%%time
data['Number_of_divisor'] = data.Number.apply(countDivisors)

%%time
import multiprocessing as mp

# use all cores but one, leaving one free for the rest of the system
pool = mp.Pool(processes=mp.cpu_count() - 1)
answer = pool.map(countDivisors, random_data)
pool.close()
pool.join()

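The definition of countDivisors is not shown in the snippets above. As a minimal sketch, assuming it returns the number of positive divisors of an integer, the parallel step could look like the following. The names count_divisors, worker_count, and parallel_count are hypothetical and used only for illustration. Note that mp.cpu_count() - 1 evaluates to 0 on a single-core machine, which mp.Pool rejects, so the sketch clamps the worker count to at least 1.

```python
import math
import multiprocessing as mp


def count_divisors(n):
    """Return the number of positive divisors of n (assumed behaviour of countDivisors)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    count = 0
    # Divisors come in pairs (d, n // d), so checking up to sqrt(n) is enough.
    for d in range(1, math.isqrt(n) + 1):
        if n % d == 0:
            # A perfect-square root pairs with itself and is counted once.
            count += 1 if d * d == n else 2
    return count


def worker_count():
    """Use all cores but one, and never fewer than one worker."""
    return max(1, mp.cpu_count() - 1)


def parallel_count(numbers):
    """Apply count_divisors to every number using a pool of worker processes."""
    # The context manager shuts the pool down once map() has returned.
    with mp.Pool(processes=worker_count()) as pool:
        return pool.map(count_divisors, numbers)
```

When run as a script rather than in a notebook, the call to parallel_count should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.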
"""
Web Scraping - Beautiful Soup
"""
# importing required libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd

# target URL to scrape
# find all the sections with the specified class name
cards_data = data.find_all('div', attrs={'class': 'width100 fl htlListSeo hotel-tile-srp-container hotel-tile-srp-container-template new-htl-design-tile-main-block'})
# total number of cards
print('Total Number of Cards Found : ', len(cards_data))
# source code of hotel cards
for card in cards_data:
    print(card)

# extract the hotel name and price per room
for card in cards_data:
    # get the hotel name
    hotel_name = card.find('p')
    # get the room price
    room_price = card.find('li', attrs={'class': 'htl-tile-discount-prc'})
    print(hotel_name.text, room_price.text)

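The loop above assumes every card contains both a p tag and a price element. find returns None when nothing matches, and accessing .text on None raises AttributeError. A small defensive helper is sketched below; the name text_or_default and the "N/A" placeholder are assumptions made for illustration, not part of the original code.

```python
def text_or_default(tag, default="N/A"):
    """Return the stripped text of a Beautiful Soup tag, or a default if the tag is missing."""
    if tag is None:
        # find() found no matching element in this card
        return default
    # get_text(strip=True) removes leading and trailing whitespace
    return tag.get_text(strip=True)
```

Inside the loop, this could be used as text_or_default(card.find('p')) in place of hotel_name.text, so a card with a missing field prints the placeholder instead of stopping the run.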
# create a list to store the data
scraped_data = []
for card in cards_data:
    # initialize the dictionary
    card_details = {}
    # get the hotel name
    hotel_name = card.find('p')

"""
Web Scraping - Scraping Images
"""
# importing required libraries
import requests
from bs4 import BeautifulSoup

# target URL
url = "https://www.goibibo.com/hotels/hotels-in-shimla-ct/"

# select the src attribute of each image
image_src = [x['src'] for x in images]
# keep only the jpg format images
image_src = [x for x in image_src if x.endswith('.jpg')]
for image in image_src:
    print(image)

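The endswith('.jpg') check above misses URLs that carry a query string (for example a resize parameter) or an upper-case extension. A slightly more tolerant filter is sketched below, assuming the goal is to keep JPEG images only. The helper names is_jpeg_url and jpeg_sources are hypothetical, and accepting '.jpeg' alongside '.jpg' is an assumption not made in the original.

```python
from urllib.parse import urlparse


def is_jpeg_url(url):
    """Return True if the path part of the URL ends in .jpg or .jpeg (case-insensitive)."""
    # urlparse separates the path from any query string or fragment
    path = urlparse(url).path.lower()
    return path.endswith(('.jpg', '.jpeg'))


def jpeg_sources(src_values):
    """Keep only non-empty src values that point to JPEG images."""
    return [src for src in src_values if src and is_jpeg_url(src)]
```

With Beautiful Soup, the input could be built as [x.get('src') for x in images], which yields None instead of raising KeyError for img tags that have no src attribute.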
# download each image and save it as image_<n>.jpg
for image_count, image in enumerate(image_src, start=1):
    res = requests.get(image)
    with open(f'image_{image_count}.jpg', 'wb') as f:
        f.write(res.content)

# create a sample list
my_list = list(range(1, 10000000))
# parallelize the data into 3 partitions (sc is an existing SparkContext)
rdd_0 = sc.parallelize(my_list, 3)
rdd_0