mjgoldman16 / Regular Expressions to Find Education and Skills in Descriptions
Created February 28, 2018 19:10
Regular Expressions to Find Education and Skills in Descriptions
## GATHERING INFORMATION BASED ON EDUCATION LEVEL
import re  # desc_col is assumed to be a pandas Series of job-description strings
# Raw strings keep regex escapes such as \. and \s intact.
bachelors = [re.findall(r"(?<![A-Z])B\.?S\.?c?(?![A-Z])|(?<![A-Z])B\.?A\.?(?![A-Z])|BACHELOR|UNDERGRAD.{0,40} DEGREE|ASSOCIATE'?S?.{0,20}DEGREE", i, re.IGNORECASE) for i in desc_col.values]
# [\s/-] matches whitespace, a slash, or a hyphen on either side of the abbreviation.
mba = [re.findall(r"([\s/-]MBA[\s/-]|[\s/-]MBUS[\s/-]|[\s/-]MBS[\s/-]|MASTERS? OF BUSINESS)", i, re.IGNORECASE) for i in desc_col.values]
masters = [re.findall(r"(MASTER'?S?.{0,40}DEGREE|GRADUATE.{0,40}DEGREE|(?<![A-Z])M\.?S\.?(?![A-Z]|\sDYNAMICS|,\sDSC)(?!-?~?\s?OFFICE|\sEXCEL|\sWORD|\sACCESS|-?\s?SQL)|ADVANCED?.{0,40}DEGREE)", i, re.IGNORECASE) for i in desc_col.values]
# (?:TE|L) matches DOCTORATE or DOCTORAL.
phd = [re.findall(r"(PH\.?D|ADVANCED?.{0,40}DEGREE|DOCTORA(?:TE|L)|POST-?\s?GRADUATE)", i, re.IGNORECASE) for i in desc_col.values]
## GATHERING INFORMATION BASED ON SKILLS
python = [re.findall("PYTHON",i,re.IGNORECASE) for i in desc_col.values]
# A standalone "R": the letter bounded by whitespace, punctuation, a bracket, or a backslash.
R = [re.findall(r"[\s,\.\-(\[\\]R[\s,\.\-)\]\\]", i, re.IGNORECASE) for i in desc_col.values]
SQL = [re.findall("SQL",i,re.IGNORECASE) for i in desc_col.values]
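As a quick sanity check of this approach, here is a minimal sketch that runs simplified versions of a few of these patterns over a handful of made-up descriptions. The `PATTERNS` dictionary, the `flag_mentions` helper, and the sample strings are all hypothetical and not part of the gist; a plain list stands in for `desc_col.values`.

```python
import re

# Simplified versions of some of the gist's patterns, written as raw strings.
PATTERNS = {
    "bachelors": r"(?<![A-Z])B\.?S\.?c?(?![A-Z])|(?<![A-Z])B\.?A\.?(?![A-Z])|BACHELOR",
    "phd": r"PH\.?D|DOCTORA(?:TE|L)",
    "python": r"PYTHON",
    "sql": r"SQL",
}


def flag_mentions(descriptions):
    """Return one dict per description mapping each label to True or False."""
    compiled = {
        label: re.compile(pattern, re.IGNORECASE)
        for label, pattern in PATTERNS.items()
    }
    return [
        {label: bool(rx.search(text)) for label, rx in compiled.items()}
        for text in descriptions
    ]


# Made-up job descriptions, standing in for desc_col.values.
sample = [
    "Requires a B.S. in statistics; Python and SQL experience preferred.",
    "PhD or doctoral degree in a quantitative field.",
    "Strong communication skills.",
]

flags = flag_mentions(sample)
for text, row in zip(sample, flags):
    print(text, "->", [label for label, hit in row.items() if hit])
```

Reducing each `findall` result to a boolean makes it easy to sum a column and count how many postings mention a given degree or skill.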
mjgoldman16 / Generation of Glassdoor URL
Created February 28, 2018 19:08
Generation of Glassdoor URL
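The offsets in the snippet that follows are easier to see in a worked example. This is a minimal sketch, assuming the URL encodes the character span of the city name (`IL.0,<end>`) and of the search term (`KO<start>,<end>`) within the path slug. The function name `glassdoor_url_prefix` and the city ID `1234567` are hypothetical placeholders, and the sketch stops at the `_KO` offsets because the rest of the URL is not shown here.

```python
def glassdoor_url_prefix(city_slug, city_id, keyword="data-scientist"):
    """Build the part of the search URL up to and including the _KO offsets."""
    city_end = len(city_slug)              # the city name spans indices 0..city_end
    keyword_start = city_end + 1           # skip the hyphen that follows the city name
    keyword_end = keyword_start + len(keyword)
    return (
        "https://www.glassdoor.com/Job/"
        f"{city_slug}-{keyword}-jobs-SRCH_IL.0,{city_end}"
        f"_IC{city_id}_KO{keyword_start},{keyword_end}"
    )


# "new-york" is 8 characters and "data-scientist" is 14, so the keyword spans 9..23.
example = glassdoor_url_prefix("new-york", 1234567)
print(example)
```

Deriving the end offset from `len(keyword)` rather than a hard-coded constant means the same helper would work for other search terms.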
start_urls = [("https://www.glassdoor.com/Job/" +
    str(city[0]) + # The city name as it appears in the URL
    "-data-scientist-jobs-SRCH_IL.0," + # The job search term; the city name span starts at index 0
    str(len(city[0])) + # The end index of the city name
    "_IC" +
    str(city[1]) + # The unique city ID
    "_KO" +
    str(len(city[0]) + 1) + # The start index of the job search term
    "," +
    str(len(city[0]) + 15) + # The end index of the job search term (start index + 14, the length of "data-scientist")