
Scraper crawling

If you want to work with Tor sites, use a Tor proxy.

Code:
import requests

# 9050 is the default SOCKS port of a standalone tor daemon
session = requests.Session()
session.proxies = {
    'http': 'socks5://127.0.0.1:9050',
    'https': 'socks5://127.0.0.1:9050',
}

r = session.get('http://httpbin.org/ip')
print(r.text)
 
Using the scheme socks5 causes the DNS resolution to happen on the client, rather than on the proxy server. This is in line with curl, which uses the scheme to decide whether to do the DNS resolution on the client or proxy. If you want to resolve the domains on the proxy server, use socks5h as the scheme.
Python:
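import requests

# Tor Browser listens on 9150; a standalone tor daemon defaults to 9050.
proxies = {
    'http': 'socks5h://127.0.0.1:9150',
    'https': 'socks5h://127.0.0.1:9150',
}
url = 'http://httpbin.org/ip'
response = requests.get(url, proxies=proxies)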
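
The scheme choice described above can be wrapped in a small helper. This is only a minimal sketch, not something from the posts: `tor_proxies` and `make_tor_session` are hypothetical names, the default port 9050 assumes a standalone tor daemon (Tor Browser uses 9150), and SOCKS support in requests assumes the `requests[socks]` extra is installed.

```python
def tor_proxies(port=9050, remote_dns=True, host="127.0.0.1"):
    """Build a requests-style proxies dict for a local Tor SOCKS listener.

    remote_dns=True selects socks5h (hostnames are resolved by the proxy,
    which .onion addresses need); False selects socks5 (resolved locally).
    """
    scheme = "socks5h" if remote_dns else "socks5"
    proxy_url = f"{scheme}://{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}


def make_tor_session(port=9050):
    """Return a requests.Session whose traffic is sent through Tor."""
    # Imported here so tor_proxies() stays usable without requests installed.
    import requests  # SOCKS support needs: pip install "requests[socks]"

    session = requests.Session()
    session.proxies.update(tor_proxies(port))
    return session
```

With Tor running locally, `make_tor_session().get('http://httpbin.org/ip', timeout=30).text` should report a Tor exit address rather than your own.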
 
As others have already mentioned, simply routing requests through Tor is relatively easy. But if your goal is to access clearnet content this way, beware that many sites attempt to prevent scraping, or only present certain content through complex JS to make scraping more difficult.

But this is actually pretty easy to work around using Selenium. If this is relevant to you, let me know and I'd be happy to guide you more specifically through scraping sites with Selenium + Tor.
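
For reference, a rough sketch of what Selenium + Tor can look like with Firefox. Assumptions that are not from this thread: Selenium 4, Firefox, and a matching geckodriver are available, tor listens on 127.0.0.1:9050, and the helper names (`tor_firefox_prefs`, `make_tor_driver`, `fetch_rendered_html`) are made up for illustration.

```python
def tor_firefox_prefs(port=9050, host="127.0.0.1"):
    """Firefox preferences that send browser traffic through a SOCKS5 proxy."""
    return {
        "network.proxy.type": 1,                 # 1 = manual proxy configuration
        "network.proxy.socks": host,
        "network.proxy.socks_port": port,
        "network.proxy.socks_version": 5,
        "network.proxy.socks_remote_dns": True,  # resolve hostnames via the proxy
    }


def make_tor_driver(port=9050, headless=True):
    """Start a Firefox WebDriver configured to use the local Tor proxy."""
    # Imported here so the prefs helper works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    if headless:
        options.add_argument("-headless")
    for name, value in tor_firefox_prefs(port).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)


def fetch_rendered_html(url, port=9050):
    """Load a page in the browser and return the DOM as HTML."""
    driver = make_tor_driver(port)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors
```

`page_source` reflects the DOM at the moment it is read, so content that JS loads later may need an explicit wait before reading it.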
 
Any suggestions on how to build a multi-site scraper in Python 3 that crawls both the surface web and onion sites?
I'm using https://www.edureka.co/blog/web-scraping-with-python/
Here is an example on GitHub of a hidden service scraper. I haven't tested it, so I'm not sure if it works.
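
Independently of that example, one way to handle surface and onion sites in the same crawler is to pick the proxy per URL. The sketch below is untested against real sites and makes several assumptions: a tor daemon on 127.0.0.1:9050, `requests[socks]` installed, and illustrative names (`is_onion`, `proxies_for`, `extract_links`, `crawl`) that do not come from any existing project.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

TOR_PROXY = "socks5h://127.0.0.1:9050"  # assumed local tor daemon


def is_onion(url):
    """True if the URL's hostname ends in .onion."""
    host = urlparse(url).hostname or ""
    return host.endswith(".onion")


def proxies_for(url, clearnet_via_tor=False):
    """Return the proxies dict to pass to requests for this URL."""
    if is_onion(url) or clearnet_via_tor:
        return {"http": TOR_PROXY, "https": TOR_PROXY}
    return {}  # no explicit proxy for surface sites


class LinkParser(HTMLParser):
    """Collect absolute http(s) links from <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            link = urljoin(self.base_url, href)
            if urlparse(link).scheme in ("http", "https"):
                self.links.append(link)


def extract_links(html, base_url):
    parser = LinkParser(base_url)
    parser.feed(html)
    return parser.links


def crawl(start_url, max_pages=10):
    """Breadth-first crawl; returns {url: html} for pages that loaded."""
    import requests  # SOCKS support needs: pip install "requests[socks]"

    seen, queue, pages = set(), deque([start_url]), {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        try:
            resp = requests.get(url, proxies=proxies_for(url), timeout=30)
        except requests.RequestException:
            continue  # skip pages that fail to load
        pages[url] = resp.text
        queue.extend(extract_links(resp.text, url))
    return pages
```

Passing `clearnet_via_tor=True` to `proxies_for` would send surface traffic through Tor as well, at the cost of speed.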

 
I'll take this opportunity to ask you all: is there a way to bypass JavaScript-based client verification without using Selenium?
Ideally, the .js files would be rendered and executed within a requests-based workflow.

Thanks in advance
 
In theory, yes. But you would need to completely reverse-engineer what the JS code does, which could take a long time depending on the level of protection. I know that people have reverse-engineered Cloudflare's challenges in the past, but I think Cloudflare has changed them since.
 

