Any suggestions on how to create a multi scraper with Python 3 that can crawl both the surface web and onion sites?
I'm using https://www.edureka.co/blog/web-scraping-with-python/
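import requests  # SOCKS support needs PySocks: pip install "requests[socks]"
# Route every request in this session through the local Tor SOCKS proxy (default port 9050).
session = requests.Session()
session.proxies = {
    'http': 'socks5://127.0.0.1:9050',
    'https': 'socks5://127.0.0.1:9050'
}
r = session.get('http://httpbin.org/ip')
print(r.text)  # shows the IP address the server sees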
If you want to work with Tor sites, use a Tor proxy.
Using the scheme socks5 causes the DNS resolution to happen on the client, rather than on the proxy server. This is in line with curl, which uses the scheme to decide whether to do the DNS resolution on the client or proxy. If you want to resolve the domains on the proxy server, use socks5h as the scheme.
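import requests
# socks5h lets Tor resolve hostnames, which .onion addresses require.
# 9150 is the Tor Browser SOCKS port; the standalone tor daemon uses 9050.
proxies = {
    'http': 'socks5h://127.0.0.1:9150',
    'https': 'socks5h://127.0.0.1:9150'
}
url = 'http://httpbin.org/ip'  # placeholder, replace with the page you want to fetch
response = requests.get(url, proxies=proxies)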
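Since the original question was about crawling both the surface web and onion sites, here is a rough sketch of how one scraper might choose the proxy per URL. The names `TOR_PROXY`, `proxies_for` and `fetch` are made up for this example, and it assumes a Tor daemon listening on 127.0.0.1:9050 (Tor Browser uses 9150). Not tested against real hidden services.

```python
from urllib.parse import urlsplit

# Assumption: local Tor daemon on its default SOCKS port (Tor Browser uses 9150).
TOR_PROXY = "socks5h://127.0.0.1:9050"


def proxies_for(url, tor_proxy=TOR_PROXY):
    """Return a requests-style proxies dict for .onion hosts, else None."""
    host = (urlsplit(url).hostname or "").lower()
    if host.endswith(".onion"):
        # socks5h so that Tor, not the local resolver, handles the hostname
        return {"http": tor_proxy, "https": tor_proxy}
    return None  # surface web: no Tor proxy is set


def fetch(url, timeout=30):
    """Fetch one page, using Tor only when the host is an onion address."""
    # Imported here so proxies_for() works without third-party packages.
    import requests  # needs SOCKS support: pip install "requests[socks]"
    return requests.get(url, proxies=proxies_for(url), timeout=timeout)
```

If you would rather send everything through Tor (so surface sites do not see your real IP either), you could return the Tor proxies dict for every URL instead of `None`.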
Here is an example on GitHub of a hidden service scraper. I didn't test it, so I'm not sure if it works.
I take this opportunity to ask you all: is there a way to bypass JavaScript-based client verification without using Selenium? The .js files would need to be rendered and executed by the requests module.
Thanks in advance
In theory, yes. But you would need to completely reverse-engineer what the JS code does, which could take a long time depending on the level of protection. I know that people have reverse-engineered Cloudflare's challenges in the past, but I think the challenges have changed since then.