@pratik-dani
Last active May 3, 2021 19:32
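import requests

# Browser-like User-Agent header
headers = {
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36",
}
company_link = 'https://www.linkedin.com/voyager/api/entities/companies/2652230'

with requests.Session() as s:
    # Cookies copied from a logged-in browser session
    s.cookies['li_at'] = "your li_at cookie"
    s.cookies["JSESSIONID"] = "your JSESSIONID"
    # update() keeps the session's default headers instead of replacing them
    s.headers.update(headers)
    # The csrf-token header is the JSESSIONID value with its surrounding quotes stripped
    s.headers["csrf-token"] = s.cookies["JSESSIONID"].strip('"')
    response = s.get(company_link)
    response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error body
    response_dict = response.json()

print(response_dict)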
# Output (note: the URNs in this sample refer to company 28738388, not the 2652230 used in company_link)
#{'employeeCountRange': '2-10',
# 'entityUrn': 'urn:li:fs_company:28738388',
# 'websiteUrl': 'http://www.surgestreet.com',
# 'companyType': 'Privately Held',
# 'foundedDate': {'year': 2019},
# 'entityInfo': {'objectUrn': 'urn:li:company:28738388',
# 'trackingId': 'uRi/ruAgS5WC122P11oqyQ=='},
# 'industries': ['Internet'],
# 'description': 'Surge Street is a leading outbound sales and lead generation agency based out of India. We provide companies with outsourced Sales Development Solutions and helps them grow their lead pipeline and drive predictable revenue by outsourcing majority of their sales development. \n\nWe deeply integrate good old prospecting strategies with new age technology and growth tools to deliver qualified meetings at scale. We do everything from developing your true Ideal Client Profile to delivering qualified customers and everything in between. \n\nLooking to develop sales and grow you business?\nDrop us a line to know what we can do for you!\n',
# 'basicCompanyInfo': {'headquarters': 'Gurugram',
# 'followingInfo': {'entityUrn': 'urn:li:fs_followingInfo:urn:li:company:28738388',
# 'following': False,
# 'trackingUrn': 'urn:li:company:28738388'},
# 'miniCompany': {'objectUrn': 'urn:li:company:28738388',
# 'entityUrn': 'urn:li:fs_miniCompany:28738388',
# 'name': 'Surge Street',
# 'showcase': False,
# 'active': True,
# 'logo': {'com.linkedin.common.VectorImage': {'artifacts': [{'width': 200,
# 'fileIdentifyingUrlPathSegment': '200_200/0?e=1599091200&v=beta&t=ciZdxaZB3GF29eYrk-VnSXG3RNDaAMurdczUh1-XGQU',
# 'expiresAt': 1599091200000,
# 'height': 200},
# {'width': 100,
# 'fileIdentifyingUrlPathSegment': '100_100/0?e=1599091200&v=beta&t=v8LJF7bkLpdsxmGNvEKbU4jY-q5IJNkwwD2Z57kbsoE',
# 'expiresAt': 1599091200000,
# 'height': 100},
# {'width': 400,
# 'fileIdentifyingUrlPathSegment': '400_400/0?e=1599091200&v=beta&t=B7DruRcyw50uPNw9pRyksHbq7NbAbZpx4xXL0Scz8g0',
# 'expiresAt': 1599091200000,
# 'height': 400}],
# 'rootUrl': 'https://media-exp1.licdn.com/dms/image/C510BAQH8GaOTu614IA/company-logo_'}},
# 'universalName': 'surge-street',
# 'trackingId': 'wi7G+92cSRC8r6jnDN5Cew=='}}}
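The response is deeply nested, so in practice you usually flatten it into a few fields. Below is a minimal sketch, assuming the key layout in the sample output above holds for other companies (it may not; optional fields can be missing). The helper name `summarize_company` is hypothetical, and building the logo URL by joining `rootUrl` with an artifact's `fileIdentifyingUrlPathSegment` is an assumption based on how those two fields look in the sample.

```python
from typing import Any, Dict, Optional


def summarize_company(data: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten a few fields from a Voyager company response (hypothetical helper).

    Key names follow the sample output in this gist; absent keys become None.
    """
    basic = data.get("basicCompanyInfo", {})
    mini = basic.get("miniCompany", {})

    # Assumption: logo URL = VectorImage rootUrl + artifact path segment.
    # Pick the widest artifact as the highest-resolution logo.
    logo_url: Optional[str] = None
    vector = mini.get("logo", {}).get("com.linkedin.common.VectorImage")
    if vector and vector.get("artifacts"):
        largest = max(vector["artifacts"], key=lambda a: a["width"])
        logo_url = vector["rootUrl"] + largest["fileIdentifyingUrlPathSegment"]

    return {
        "name": mini.get("name"),
        "universal_name": mini.get("universalName"),
        "website": data.get("websiteUrl"),
        "industries": data.get("industries", []),
        "employee_count_range": data.get("employeeCountRange"),
        "founded_year": data.get("foundedDate", {}).get("year"),
        "headquarters": basic.get("headquarters"),
        "logo_url": logo_url,
    }
```

Usage: call `summarize_company(response_dict)` after the request above. Using `.get()` with defaults everywhere means a company without, say, a logo or founding date yields `None` rather than a `KeyError`.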
@panamantis

Very interesting! Worked first time I tried it. Any suggestions on scaling to millions of company pages?

@pratik-dani
Author

There is no way LinkedIn will let you scrape millions of pages without pauses. Have you tried randomized pauses? A pause between each request and rotating IPs will help you.
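A minimal sketch of the randomized-pause and rotating-IP suggestion, assuming you already have a logged-in session like the one at the top and a list of proxy URLs of your own (none are given in this thread). The function name, the delay defaults, and the URL template built from the gist's `company_link` are illustrative choices, not tested limits. `get` is expected to behave like `requests.Session.get`, which accepts a `proxies` keyword; `sleep` is injectable so the loop can be tested without waiting.

```python
import itertools
import random
import time
from typing import Any, Callable, Dict, Iterable, List, Optional

# Built from the company_link URL in the gist; the company ID is substituted in.
VOYAGER_COMPANY_URL = "https://www.linkedin.com/voyager/api/entities/companies/{}"


def fetch_companies(
    company_ids: Iterable[str],
    get: Callable[..., Any],
    proxy_urls: Optional[List[str]] = None,
    min_delay: float = 2.0,   # illustrative values, not known safe limits
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Fetch each company, pausing a random interval between requests.

    If proxy_urls is given, proxies are used round-robin, one per request.
    """
    proxy_cycle = itertools.cycle(proxy_urls) if proxy_urls else None
    results: Dict[str, Any] = {}
    for i, company_id in enumerate(company_ids):
        if i > 0:
            # Random pause so requests don't arrive at a fixed rate
            sleep(random.uniform(min_delay, max_delay))
        proxies = None
        if proxy_cycle is not None:
            proxy = next(proxy_cycle)
            proxies = {"http": proxy, "https": proxy}
        results[company_id] = get(VOYAGER_COMPANY_URL.format(company_id), proxies=proxies)
    return results
```

Inside the `with` block you would call something like `fetch_companies(ids, s.get, proxy_urls=my_proxies)` and then call `.json()` on each response. Keep in mind that rotating IPs does not change the fact that every request still carries the same `li_at` account cookie.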

@jorge3018

How can I get the cookies without Selenium, using only the requests library? I just need the s.cookies['li_at'] token.

@pratik-dani
Author

pratik-dani commented May 3, 2021

@jorge3018 You can just log in using the requests library with your username and password. It is simple enough.
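The thread doesn't show the login request itself, and the form fields LinkedIn's login page expects aren't given here, so they are not guessed at. What can be sketched is the step after it: assuming the login succeeded and set the cookies on the session, `li_at` and `JSESSIONID` are readable from `s.cookies`, and the `csrf-token` header is derived the same way as in the code at the top. The helper name `voyager_auth_headers` is hypothetical; it accepts a plain dict or a requests cookie jar, since both support `.get()` and `[]`.

```python
from typing import Dict, Mapping


def voyager_auth_headers(cookies: Mapping[str, str]) -> Dict[str, str]:
    """Build the csrf-token header from session cookies (hypothetical helper).

    Raises KeyError if li_at or JSESSIONID is missing, which after a
    requests-based login usually means the login did not succeed.
    """
    missing = [name for name in ("li_at", "JSESSIONID") if not cookies.get(name)]
    if missing:
        raise KeyError("missing cookies: " + ", ".join(missing))
    # Same derivation as the gist: strip surrounding double quotes from JSESSIONID
    return {"csrf-token": cookies["JSESSIONID"].strip('"')}
```

After logging in, `li_at = s.cookies.get("li_at")` gives you the token you asked about, and `s.headers.update(voyager_auth_headers(s.cookies))` prepares the session for Voyager API calls without copying cookies from a browser.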
