@mjgoldman16
Created February 28, 2018 19:08
Generation of Glassdoor URLs
```python
# `cities` is not defined in this gist: each entry is a (city_name, city_id) pair,
# per the comments below.
start_urls = [("https://www.glassdoor.com/Job/" +
               str(city[0]) +                       # City name as it appears in the URL
               "-data-scientist-jobs-SRCH_IL.0," +  # Location span starts at character 0
               str(len(city[0])) +                  # End index of the city name
               "_IC" +
               str(city[1]) +                       # Glassdoor's unique city ID
               "_KO" +
               str(len(city[0]) + 1) +              # Start index of the job search term (after the hyphen)
               "," +
               str(len(city[0]) + 15) +             # End index of the job search term: 1 hyphen + 14 chars of "data-scientist"
               "_IP" +
               str(i) +                             # Page number to fetch
               ".htm")
              for city in cities
              for i in range(1, 31)]                # Limited to 30 pages, since there is no 31st page
```
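The comprehension hard-codes the keyword offsets (`+ 1` and `+ 15`) for "data-scientist". A minimal sketch that derives them from the keyword's length instead, assuming the URL layout inferred from the snippet (the `SRCH_IL`, `_IC`, `_KO` and `_IP` segments as used above, not a documented Glassdoor API). The function names `glassdoor_search_url` and `all_search_urls`, the `Boston-MA` slug, and the city ID `12345` are hypothetical placeholders.

```python
from typing import Iterable, List, Tuple

BASE_URL = "https://www.glassdoor.com/Job/"


def glassdoor_search_url(city_slug: str, city_id: int, keyword_slug: str, page: int) -> str:
    """Build one search-results URL following the pattern in the snippet above.

    Assumed layout (inferred from the snippet, not from Glassdoor documentation):
    <city>-<keyword>-jobs-SRCH_IL.<a>,<b>_IC<id>_KO<c>,<d>_IP<page>.htm
    where [a, b) and [c, d) are the character ranges of the city and the
    keyword inside the "<city>-<keyword>" prefix of the path.
    """
    loc_start, loc_end = 0, len(city_slug)
    kw_start = loc_end + 1                 # skip the hyphen between city and keyword
    kw_end = kw_start + len(keyword_slug)  # exclusive end index of the keyword
    return (
        f"{BASE_URL}{city_slug}-{keyword_slug}-jobs-"
        f"SRCH_IL.{loc_start},{loc_end}"
        f"_IC{city_id}"
        f"_KO{kw_start},{kw_end}"
        f"_IP{page}.htm"
    )


def all_search_urls(cities: Iterable[Tuple[str, int]], keyword_slug: str,
                    max_pages: int = 30) -> List[str]:
    # Same ordering as the original comprehension: all pages of one city, then the next city.
    return [
        glassdoor_search_url(slug, city_id, keyword_slug, page)
        for slug, city_id in cities
        for page in range(1, max_pages + 1)
    ]


if __name__ == "__main__":
    # Hypothetical slug and placeholder ID, for illustration only.
    print(glassdoor_search_url("Boston-MA", 12345, "data-scientist", 1))
    # prints https://www.glassdoor.com/Job/Boston-MA-data-scientist-jobs-SRCH_IL.0,9_IC12345_KO10,24_IP1.htm
```

For "data-scientist" (14 characters) this reproduces the original's `len + 1` and `len + 15` offsets exactly, while computing `kw_end` from `len(keyword_slug)` avoids counting characters by hand when the search term changes.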