Created June 3, 2020 18:28
from random import randint
from time import sleep

from bs4 import BeautifulSoup
from selenium import webdriver

# final_list (the URLs to scrape) is built in the earlier gists
data = []
for i in range(0, 10):
    url = final_list[i]
    driver2 = webdriver.Chrome()
    driver2.get(url)
    sleep(randint(10, 20))  # random pause so requests look less automated
    soup = BeautifulSoup(driver2.page_source, 'html.parser')
    # titles and rating scores, collected in one pass
    my_table2 = soup.find_all(class_=['title-2', 'rating-score body-3'])
    review = soup.find_all(class_='reviews')[-1]
    try:
        # last price element on the page
        price = soup.find_all('span', attrs={'class': 'price'})[-1]
    except IndexError:
        # no price found: keep the empty result set so the loop below is a no-op
        price = soup.find_all('span', attrs={'class': 'price'})
    for tag in my_table2:
        data.append(tag.text.strip())
    for p in price:
        data.append(p)
    for r in review:
        data.append(r)
    driver2.quit()  # close the browser before moving on to the next URL
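As a minimal, self-contained illustration of the parsing step alone, here is a sketch that uses a made-up inline HTML snippet in place of driver2.page_source, so no browser is needed. The markup and values are invented; only the class names mirror the selectors above.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for driver2.page_source
html = """
<div class="title-2">Hostel One</div>
<div class="rating-score body-3">9.2</div>
<span class="price">$25</span>
<span class="price">$30</span>
<div class="reviews">120 reviews</div>
"""

soup = BeautifulSoup(html, 'html.parser')
data = []

# Same selector as the gist: titles and rating scores in one pass
for tag in soup.find_all(class_=['title-2', 'rating-score body-3']):
    data.append(tag.text.strip())

# Last price on the page; fall back to the (possibly empty) result set
try:
    price = soup.find_all('span', attrs={'class': 'price'})[-1]
except IndexError:
    price = soup.find_all('span', attrs={'class': 'price'})

review = soup.find_all(class_='reviews')[-1]

# Iterating a Tag yields its children, here the text nodes
for p in price:
    data.append(str(p))
for r in review:
    data.append(str(r))

print(data)  # ['Hostel One', '9.2', '$30', '120 reviews']
```

Note that iterating over a single Tag (the last price element) walks its child text nodes, while iterating over a ResultSet (the empty fallback) walks tags; the gist relies on the first behavior to pull the price text out.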
data starts out as an empty list. Inside the for loop, the scraped values are appended to it by the three data.append(...) calls: one for the titles and ratings, one for the price, and one for the reviews.
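To make that accumulation pattern concrete, here is a toy sketch with placeholder values standing in for the scraped ones (the hostel names and numbers are invented):

```python
data = []  # starts out empty, exactly as in the gist

# Placeholder values standing in for one page's scraped fields
pages = [
    ('Hostel A', '9.1', '$20', '80 reviews'),
    ('Hostel B', '8.7', '$18', '45 reviews'),
]

for title, rating, price, review in pages:
    # each loop iteration appends that page's fields to the one flat list
    data.extend([title, rating, price, review])

print(data)

# The flat list can then be chunked back into one row per page
rows = [data[i:i + 4] for i in range(0, len(data), 4)]
print(rows)
```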
All the gists are run separately, one after the other, since they just showcase different ways of scraping the data. See this article for a walkthrough of the instructions, in case you need it: https://towardsdatascience.com/scraping-multiple-urls-with-python-tutorial-2b74432d085f
I hope this helped!
Hello, I want to use your code, but I'm very new to Python. I have a question: the first line is data=[], so where does that line get its data from?
Also, I see your 4 gists (scraping_hostels_1, scraping_hostels_2, scraping_hostels_3, and scraping_hostels_4): are those run in one go, or as 4 separate runs? Could you give more detail on how to execute them? Thanks!