Last active
September 7, 2020 01:00
Filtering out Dostoevsky titles
from urllib import request

import pandas as pd
from bs4 import BeautifulSoup

# Goodreads lists over 50 pages of titles under Dostoevsky.
# All relevant titles are on the first two pages.
page1_url = "https://www.goodreads.com/author/list/3137322.Fyodor_Dostoyevsky?page=1&per_page=30"
page2_url = "https://www.goodreads.com/author/list/3137322.Fyodor_Dostoyevsky?page=2&per_page=30"
page1 = request.urlopen(page1_url)
page2 = request.urlopen(page2_url)
page1_soup = BeautifulSoup(page1, "html.parser")
page2_soup = BeautifulSoup(page2, "html.parser")
# Collect the "bookTitle" elements from both pages into one Series.
# Series.append was removed in pandas 2.0, so pd.concat is used instead.
dostoyevsky_titles = pd.concat([
    pd.Series(page1_soup.find_all(class_="bookTitle")),
    pd.Series(page2_soup.find_all(class_="bookTitle")),
], ignore_index=True)
# get_contents and titles are not defined in this snippet; they must already exist.
dostoyevsky_titles = dostoyevsky_titles.apply(get_contents)
# Keep only the titles that are not in the Dostoevsky list.
titles = titles[~titles.isin(dostoyevsky_titles)]
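
The filtering logic can be checked offline, without hitting Goodreads. What follows is a minimal sketch using only the standard library: it swaps BeautifulSoup for `html.parser` and the pandas `isin` mask for a plain set lookup. `SAMPLE_HTML`, `BookTitleParser`, `extract_titles`, and `drop_titles` are hypothetical names made up for illustration, and the sample markup is only an assumption about what a `bookTitle` element looks like. The parser also assumes no void tags (such as `<br>`) appear inside a title element.

```python
from html.parser import HTMLParser

# Assumed stand-in for a listing page; not real Goodreads markup.
SAMPLE_HTML = """
<a class="bookTitle" href="#"><span>Crime and Punishment</span></a>
<a class="bookTitle" href="#"><span>The Idiot</span></a>
"""


class BookTitleParser(HTMLParser):
    """Collects the text inside every element whose class includes "bookTitle"."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._depth = 0      # greater than 0 while inside a bookTitle element
        self._chunks = []    # text pieces of the title currently being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1             # nested tag inside a title element
        elif "bookTitle" in classes:
            self._depth = 1              # entering a new title element
            self._chunks = []

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:         # title element closed: store its text
                self.titles.append("".join(self._chunks).strip())

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def extract_titles(html):
    """Return the text of each bookTitle element, in document order."""
    parser = BookTitleParser()
    parser.feed(html)
    return parser.titles


def drop_titles(titles, unwanted):
    """Return titles with every entry found in unwanted removed."""
    unwanted = set(unwanted)
    return [t for t in titles if t not in unwanted]


author_titles = extract_titles(SAMPLE_HTML)
all_titles = ["Crime and Punishment", "Ulysses", "The Idiot", "Middlemarch"]
remaining = drop_titles(all_titles, author_titles)
```

Like `isin`, the set lookup matches exact strings only, so differences in whitespace, case, or edition suffixes would let a title slip through the filter.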