@JohnDeJesus22
Last active August 21, 2019 19:50
JssuniScrap
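# `results` is assumed to be the list of BeautifulSoup tags collected earlier in the scrape
import pandas as pd

# Initialize Data Frame
df = pd.DataFrame()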
# Names
df['Name'] = [result.find('h2').text for result in results]
# Designation
df['Designation'] = [result.find('p').contents[1].strip(' ') for result in results]
# Email
df['Email'] = [result.find('p').contents[4].strip(' ') for result in results]
# Qualifications
df['Qualifications'] = [result.find('p').contents[7].strip(' ') for result in results]
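# Create function to get number of publications written
def get_no_publications(result):
    """Return the publication count for one result, or 0 if it is missing or not a number."""
    try:
        publications = int(result.find('p').contents[10].strip(' '))
    except (AttributeError, IndexError, TypeError, ValueError):
        # Entry has no publication count, or the field is not a plain number
        publications = 0
    return publications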
# Number of Publications
df['NumberOfPublications'] = [get_no_publications(result) for result in results]
# Export to csv without indices
df.to_csv('JssunPharmaceuticsDepartmentInfo.csv', index=False)
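
The fallback to 0 for a missing publication count can be checked without a live page. Below is a minimal sketch, assuming a hypothetical helper `parse_count` (not part of the original gist) that receives the raw text of the field instead of a BeautifulSoup tag; the sample strings are made up for illustration.

```python
def parse_count(text, default=0):
    """Return text as an int, or default when it is missing or not numeric.

    Hypothetical helper for illustration; the gist itself reads the value
    from result.find('p').contents[10].
    """
    try:
        return int(str(text).strip())
    except (TypeError, ValueError):
        # Covers empty strings, None, and non-numeric text such as "n/a"
        return default


# Made-up sample fields: padded number, empty, missing, non-numeric, trailing newline
samples = [" 12 ", "", None, "n/a", "7\n"]
counts = [parse_count(s) for s in samples]
print(counts)
```

Separating the string-to-int conversion from the tag lookup makes the default behaviour easy to test on its own, while the lookup errors (a missing `p` tag or too few child nodes) still need to be handled where the tag is accessed.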