This Polish national data set consists of LinkedIn profiles scraped between 2015 and 2018.
The original format was a very large number of small individual JSON files that took ages to query.
All listings were therefore first merged into a temporary JSON file and then processed with jq
into the single master CSV file provided here. This was a rather slow processing task.
Only fields with short enough text strings were retained in the CSV file; duplicate fields,
useless fields, and fields with mostly null values were dropped. The fields left in the CSV are
as follows:
fullname,
email,
twitter,
headline,
completeness, connections, recommendations,
industry, current_date, current_position, current_company,
previous_date, previous_company,
education_date, education_list,
military,
language1, language2,
skills1, skills2, skills3,
longitude, latitude,
country, country_code,
canonical,
scrapeId,
picture,
url,
id
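The merge-and-filter step described above was done with jq; the exact commands are not given here. As a rough illustration only, the same idea can be sketched in Python instead. The function name `merge_json_to_csv`, the one-record-per-file layout, and the `max_len` and `max_null_ratio` thresholds are all assumptions made for this sketch, not the actual settings used.

```python
import csv
import json
from pathlib import Path


def merge_json_to_csv(src_dir, out_csv, max_len=200, max_null_ratio=0.9):
    """Merge one-record-per-file JSON into a single CSV (hypothetical helper).

    Keeps only top-level fields whose values, as strings, never exceed
    max_len characters and whose share of null/empty values is at most
    max_null_ratio. Both thresholds are illustrative assumptions.
    """
    records = []
    for path in sorted(Path(src_dir).glob("*.json")):
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
        if isinstance(data, dict):  # skip files that are not a single object
            records.append(data)

    # Collect field names in first-seen order.
    fields = []
    for rec in records:
        for key in rec:
            if key not in fields:
                fields.append(key)

    def keep(field):
        values = [rec.get(field) for rec in records]
        nulls = sum(v is None or v == "" for v in values)
        if records and nulls / len(records) > max_null_ratio:
            return False  # mostly null, drop the column
        return all(len(str(v)) <= max_len for v in values if v is not None)

    kept = [f for f in fields if keep(f)]

    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=kept, extrasaction="ignore")
        writer.writeheader()
        for rec in records:
            writer.writerow({f: rec.get(f, "") for f in kept})
    return kept
```

Loading every record into memory keeps the sketch short; for a very large number of files a two-pass approach (first pass to choose columns, second pass to write rows) would likely be needed.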
Please note:
- The data was scraped from publicly accessible LinkedIn profile information.
- Only roughly 10% of the profiles have an email address.
- The data contains no gender information, no phone numbers, and no passwords.
- The longitude / latitude coordinates are very generalized and pretty much useless for mapping purposes.
- Most profile URLs still work; however, picture URLs no longer work correctly
and are only included to indicate that a profile picture was present.
If there's some special keyword search term that you really need that is not showing
up in the CSV file (but should normally be present in standard LinkedIn profiles),
then please let me know and I can try to search the original JSON files
and recover it.