Created February 28, 2018 19:10
Gist: mjgoldman16/9c9a0d88eacc3b0ee86a99e3b3e1f354
Regular Expressions to Find Education and Skills in Descriptions
import re

# desc_col is assumed to be a pandas Series of job-description strings, defined elsewhere.
## GATHERING INFORMATION BASED ON EDUCATION LEVEL
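# Raw strings keep regex escapes such as \. and \s from being read as Python string escapes.
# .{0,20} allows up to 20 characters between ASSOCIATE'S and DEGREE, so "Associate's degree" matches.
bachelors = [re.findall(r"(?<![A-Z])B\.?S\.?c?(?![A-Z])|(?<![A-Z])B\.?A\.?(?![A-Z])|BACHELOR|UNDERGRAD.{0,40} DEGREE|ASSOCIATE'?S?.{0,20}DEGREE", i, re.IGNORECASE) for i in desc_col.values]
# [\s/-] matches whitespace, a slash, or a hyphen; inside a character class "|" is a literal pipe, not alternation.
mba = [re.findall(r"([\s/-]MBA[\s/-]|[\s/-]MBUS[\s/-]|[\s/-]MBS[\s/-]|MASTERS? OF BUSINESS)", i, re.IGNORECASE) for i in desc_col.values]
# The lookaheads after M.S. skip product names such as MS Office, MS Excel, and MS SQL.
masters = [re.findall(r"(MASTER'?S?.{0,40}DEGREE|GRADUATE.{0,40}DEGREE|(?<![A-Z])M\.?S\.?(?![A-Z]|\sDYNAMICS|,\sDSC)(?!-?~?\s?OFFICE|\sEXCEL|\sWORD|\sACCESS|-?\s?SQL)|ADVANCED?.{0,40}DEGREE)", i, re.IGNORECASE) for i in desc_col.values]
# (?:TE|L) matches DOCTORATE or DOCTORAL.
phd = [re.findall(r"(PH\.?D|ADVANCED?.{0,40}DEGREE|DOCTORA(?:TE|L)|POST-?\s?GRADUATE)", i, re.IGNORECASE) for i in desc_col.values]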
## GATHERING INFORMATION BASED ON SKILLS
python = [re.findall(r"PYTHON", i, re.IGNORECASE) for i in desc_col.values]
# Standalone "R": require a delimiter on both sides so the letter inside other words is ignored.
R = [re.findall(r"[\s,\.\-(\[\\]R[\s,\.\-)\]\\]", i, re.IGNORECASE) for i in desc_col.values]
SQL = [re.findall(r"SQL", i, re.IGNORECASE) for i in desc_col.values]
# The negative lookahead keeps JAVASCRIPT from counting as Java.
java = [re.findall(r"JAVA(?!SCRIPT)", i, re.IGNORECASE) for i in desc_col.values]
# Standalone "C", "C++", or "CSHARP"; findall returns the captured suffix, not the full match.
C = [re.findall(r"[\s,\.\-(\\]C([\s,\.\-)\]\\]|\+\+|SHARP)", i, re.IGNORECASE) for i in desc_col.values]
hadoop = [re.findall(r"HADOOP", i, re.IGNORECASE) for i in desc_col.values]
spark = [re.findall(r"SPARK", i, re.IGNORECASE) for i in desc_col.values]
# Excel and SAS are matched case-sensitively, so the verb "excel" and words containing "sas" are not counted.
excel = [re.findall(r"Excel[\s,\.\-)\]\\)]", i) for i in desc_col.values]
sas = [re.findall(r"SAS", i) for i in desc_col.values]
stata = [re.findall(r"STATA", i, re.IGNORECASE) for i in desc_col.values]
matlab = [re.findall(r"MATLAB", i, re.IGNORECASE) for i in desc_col.values]
# The negative lookahead keeps words such as "scalable" from counting as Scala.
scala = [re.findall(r"SCALA(?![A-Z])", i, re.IGNORECASE) for i in desc_col.values]
vba = [re.findall(r"VBA", i, re.IGNORECASE) for i in desc_col.values]
tableau = [re.findall(r"TABLEAU", i, re.IGNORECASE) for i in desc_col.values]
# [O0] accepts the letter O or the digit 0 after "H2".
h2o = [re.findall(r"H2[O0]", i, re.IGNORECASE) for i in desc_col.values]
ruby = [re.findall(r"RUBY", i, re.IGNORECASE) for i in desc_col.values]
html = [re.findall(r"HTML", i, re.IGNORECASE) for i in desc_col.values]
css = [re.findall(r"CSS", i, re.IGNORECASE) for i in desc_col.values]
javascript = [re.findall(r"JAVA-?\s?SCRIPT", i, re.IGNORECASE) for i in desc_col.values]
# Captures up to 20 characters of context on each side of HIVE; (?<!ARC) skips "archive".
hive = [re.findall(r"(.{0,20})(?<!ARC)(HIVE)(.{0,20})", i, re.IGNORECASE) for i in desc_col.values]
webscrape = [re.findall(r"SCRAPY|SELENIUM|SCRAPE|SCRAPING|WEB SCRAP", i, re.IGNORECASE) for i in desc_col.values]
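
Each list above holds one `re.findall` result per description, so a typical next step is to count how many descriptions mention each skill. The following is a minimal sketch of that step, not part of the original gist: `descriptions` is hypothetical sample data standing in for `desc_col.values`, and the names `patterns`, `mention_flags`, `flags`, and `counts` are illustrative assumptions.

```python
import re

# Hypothetical sample data standing in for desc_col.values.
descriptions = [
    "Requires a B.S. in statistics; Python and SQL experience preferred.",
    "Build scalable pipelines in Scala and Spark; Java a plus.",
    "Front-end role: JavaScript, HTML, and CSS.",
]

# A small subset of the skill patterns, compiled once and reused.
patterns = {
    "python": re.compile(r"PYTHON", re.IGNORECASE),
    "sql": re.compile(r"SQL", re.IGNORECASE),
    "java": re.compile(r"JAVA(?!SCRIPT)", re.IGNORECASE),
    "scala": re.compile(r"SCALA(?![A-Z])", re.IGNORECASE),
}


def mention_flags(texts, pattern):
    """Return one boolean per text: True if the pattern occurs anywhere in it."""
    return [bool(pattern.search(text)) for text in texts]


# One list of booleans per skill, aligned with the order of the descriptions.
flags = {name: mention_flags(descriptions, p) for name, p in patterns.items()}

# Number of descriptions that mention each skill at least once.
counts = {name: sum(values) for name, values in flags.items()}

for name, total in counts.items():
    print(f"{name}: {total} of {len(descriptions)} descriptions")
```

With the sample data, "JavaScript" does not count toward `java` and "scalable" does not count toward `scala`, because the negative lookaheads reject both. If `desc_col` is a pandas Series, each list in `flags` could be attached to the same DataFrame as an indicator column.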