
TXT scrape of pastes on intelx.io

pompompurin

HDD-drive
Banned
Joined: 18.03.2021
Messages: 46
Reactions: 214
Please note that this user is banned
I don't like intelx, they keep trying to scrape my website so I scraped theirs (Context: )

This data was collected by intelx, who scrape pastebin and other *bin sites for their database search website. This isn't every single paste that's on intelx, but it's most of the ones that contain any type of info (email addresses, domains, etc.).


87,813 Files
Around 6.51 GB Unzipped





Also a quick note: the "paste_info.json", which includes information about each file, has slightly more lines than the number of files. This is because I removed a few duplicate files directly.
Enjoy <3
 
My antivirus program is crazy about these files.
This might be because of the large number of files. It can also be because some people upload web shells and the antivirus is detecting them.

Don't worry, none of these files can harm you as they're all text files. Just antiviruses being dumb. If you check any of the flagged files yourself, you'll see they're most likely PHP shells.

If you're really really worried, please give me an example of a file that was flagged and I'll look into it for you.
 
Also quickly, more information for anyone interested + a reply to intelx's claims:
They're trying to downplay this by saying this is "Only 0.001% pastes in their index" lol
How many of these pastes are nothing but code, with no actual selectors to search for? Your main source of pastes is pastebin, and that website is for posting code, not stuff like domains or emails. This is a collection of (most of) the pastes that included either email addresses or domains of interest. It's impossible to get every single paste from intelx because there is literally no way to search for pastes that contain only code (unless you have the exact pastebin URL).

This data was scraped by searching common email domains and then exporting the results. I believe this would've gotten most of the data that is of interest to people on here lol.
Here is a total count of unique emails found in the archive: 46,176,519
Top email providers used:
Code:
9,228,353  @gmail.com
6,126,320  @hotmail.com
5,062,211  @yahoo.com
1,093,255  @aol.com
  789,488  @mail.ru
  497,350  @hotmail.co.uk
  472,061  @web.de
  460,114  @yahoo.co.uk
  432,062  @orange.fr
  393,423  @hotmail.fr
 
wow seriously?
 

