@sithart
Last active March 3, 2021 06:45
Scraping bioinformatics glossary keywords from the mipt.ru glossary page
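import json

import requests
from bs4 import BeautifulSoup

# Fetch the glossary page and parse it.
res = requests.get("https://mipt.ru/dbmp/student/files/bioinformatics/books/glossary_bioinf.php")
res.raise_for_status()  # stop early on an HTTP error
soup = BeautifulSoup(res.text, "html.parser")

keys = []
values = []
# Each term is a <span class="entry">; its definition is the text of the
# element two levels up, with the term itself and newlines stripped out.
for data in soup.find_all("span", attrs={"class": "entry"}):
    key = data.get_text()
    val = data.parent.parent
    keys.append(key)
    values.append(val.get_text().replace(key, '').replace('\n', ''))

# Report how many definitions and terms were collected.
print(len(values))
print(len(keys))

# Write the term -> definition mapping as JSON.
dictionary = dict(zip(keys, values))
with open('Glossary_of_info.txt', 'w') as json_file:
    json.dump(dictionary, json_file, indent=4)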
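
The script above pairs terms and definitions with `dict(zip(keys, values))`, so a term that appears more than once on the page keeps only its last definition, and the dictionary can end up smaller than the two printed counts. A minimal sketch of that behaviour and of reading the JSON back, using only the standard library. The helper names `save_glossary` and `load_glossary`, the sample terms, and the temporary file name are all hypothetical and not part of the original script:

```python
import json
import os
import tempfile


def save_glossary(keys, values, path):
    """Pair terms with definitions, write them as JSON, and return the dict."""
    glossary = dict(zip(keys, values))  # later duplicates overwrite earlier ones
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(glossary, fh, indent=4, ensure_ascii=False)
    return glossary


def load_glossary(path):
    """Read a glossary previously written by save_glossary."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Made-up sample data standing in for the scraped lists.
sample_keys = ["allele", "codon", "allele"]
sample_values = ["first definition", "nucleotide triplet", "second definition"]

path = os.path.join(tempfile.mkdtemp(), "glossary_sample.json")
saved = save_glossary(sample_keys, sample_values, path)
loaded = load_glossary(path)

print(len(sample_keys), len(loaded))
```

Comparing `len(keys)` with `len(dictionary)` in the original script would show whether any terms were collapsed this way.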