Created
February 28, 2018 19:08
Generation of Glassdoor URL
# `cities` is expected to hold (city name, city ID) pairs and is defined elsewhere.
start_urls = [("https://www.glassdoor.com/Job/" +
               str(city[0]) +                       # The city name as it appears in the URL
               "-data-scientist-jobs-SRCH_IL.0," +  # The job search term; the city name span starts at index 0
               str(len(city[0])) +                  # End index of the city name
               "_IC" +
               str(city[1]) +                       # The unique city ID
               "_KO" +
               str(len(city[0]) + 1) +              # Start index of the job search term (just past the hyphen)
               "," +
               str(len(city[0]) + 15) +             # End index of the job search term ("data-scientist" is 14 characters, starting at len + 1)
               "_IP" +
               str(i) +                             # The page number we want to look at
               ".htm")
              for city in cities
              for i in range(1, 31)]                # Limited to 30 pages since there is no 31st page
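
The offset arithmetic above can also be factored into a small helper, which makes the `IL` and `KO` index pairs easier to check by hand. This is only a sketch: the function name `glassdoor_url`, its parameters, and the city ID in the usage note are hypothetical, and the URL pattern is simply the one used in the snippet above, which Glassdoor may not still accept.

```python
def glassdoor_url(city_name, city_id, page, keyword="data-scientist"):
    """Build a search URL following the pattern used in the snippet above.

    Hypothetical helper: the name and parameters are assumptions made
    for illustration, not part of the original gist.
    """
    loc_end = len(city_name)           # city name spans [0, loc_end)
    kw_start = loc_end + 1             # skip the hyphen after the city name
    kw_end = kw_start + len(keyword)   # keyword spans [kw_start, kw_end)
    return (
        f"https://www.glassdoor.com/Job/{city_name}-{keyword}-jobs-"
        f"SRCH_IL.0,{loc_end}_IC{city_id}_KO{kw_start},{kw_end}_IP{page}.htm"
    )


def build_start_urls(cities, max_pages=30):
    """Expand (city name, city ID) pairs into one URL per result page."""
    return [
        glassdoor_url(name, city_id, page)
        for name, city_id in cities
        for page in range(1, max_pages + 1)
    ]
```

For example, `build_start_urls([("new-york", 1132348)])` yields 30 URLs for that city, where `1132348` is a placeholder ID used only for illustration. Computing the keyword end from `len(keyword)` gives the same value as the hard-coded `+ 15` for "data-scientist" while letting other search terms reuse the helper.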