@AhmedAlhallag
Created March 1, 2023 07:19
Desktop/Desktio_2023_Post_S1/DB_Lab1/Yelp Project Scraping_Final.ipynb
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "# KH4005CEM (Follow Up Session) - Project: Crawling/Scraping Yelp for 'Dentists'"
},
{
"metadata": {},
"id": "8fc85992",
"cell_type": "markdown",
"source": "## Setup\n"
},
{
"metadata": {
"trusted": true
},
"id": "1b13f0ec",
"cell_type": "code",
"source": "from selenium import webdriver\nfrom selenium.webdriver.common.keys import Keys\nfrom selenium.webdriver.chrome.options import Options\nfrom selenium.webdriver.common.by import By\nfrom selenium.webdriver.chrome import service\n\n\nimport time",
"execution_count": 2,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"id": "59578a6b",
"cell_type": "code",
"source": "# Yelp search-results URL to scrape: dentists in San Diego, CA\nw = \"https://www.yelp.com/search?find_desc=dentists&find_loc=San+Diego%2C+CA%2C+United+States\"",
"execution_count": 3,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"id": "55b00e8e",
"cell_type": "code",
"source": "# This step assumes you have installed the ChromeDriver version that matches your Chrome browser, Chrome itself and the selenium Python library\n# (see the installation manual in Lab 1.1).\n\n# Initialize the ChromeDriver service with the path to the driver executable\n# [For Windows] (a raw string, so the backslashes are not read as escape sequences)\nwebdriver_service = service.Service(r\"C:\\webdrivers\\chromedriver.exe\")\n\n# [For macOS] (the driver binary has no .exe extension)\n# webdriver_service = service.Service(\"/usr/local/bin/chromedriver\")\n\n# Optional: to run in headless mode (the browser runs in the background without opening a visible window):\n# chrome_options = Options()\n# chrome_options.add_argument(\"--headless\")",
"execution_count": 4,
"outputs": []
},
{
"metadata": {},
"id": "b354cea3",
"cell_type": "markdown",
"source": "## Grabbing the columns (attributes) for our dentist entity\n- name\n- rating\n- location\n- speciality\n- review count\n- highlights (do it at home)\n"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# one list per column; every scraped result page appends to these\nnames, locations, ratings, reviews, specialities = [], [], [], [], []\n",
"execution_count": 41,
"outputs": []
},
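{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "def to_list(objs, alist, attr=None):\n    # Append one value per WebElement in objs to alist:\n    # the given attribute (e.g. \"aria-label\") if attr is set, otherwise the element's visible text.\n    if attr:\n        for obj in objs:\n            alist.append(obj.get_attribute(attr))\n    else:\n        for obj in objs:\n            alist.append(obj.text)\n",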
"execution_count": 40,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# [pagination] Naive solution: scrape all result pages, using fixed time.sleep pauses to let each page load.\n# The loop has no exit condition of its own: it ends only when find_element can no longer locate the\n# 'Next' link (presumably on the last page) and raises NoSuchElementException.\n\nb = webdriver.Chrome(service=webdriver_service)\n\n# Optional: pass the chrome_options object (via the 'options' keyword argument)\n# when creating the browser if you want to run in headless mode:\n# b = webdriver.Chrome(service=webdriver_service, options=chrome_options)\n\nb.get(w)\nb.maximize_window()\n\nwhile True:\n    time.sleep(2)\n\n    names_objects = b.find_elements(\"xpath\", '//div[@class=\" businessName__09f24__EYSZE display--inline-block__09f24__fEDiJ border-color--default__09f24__NPAKY\"]//a')\n    ratings_objects = b.find_elements(\"xpath\", '//div[@class=\" five-stars__09f24__mBKym five-stars--regular__09f24__DgBNj display--inline-block__09f24__fEDiJ border-color--default__09f24__NPAKY\"]')\n    reviews_count_objects = b.find_elements(\"xpath\", '//div[@class=\" display--inline-block__09f24__fEDiJ border-color--default__09f24__NPAKY\"]/div/div/div[2]/span[2]')\n    locations_objects = b.find_elements(\"xpath\", '//p[@class=\"css-dzq7l1\"]/span[2]')\n    specialities_objects = b.find_elements(\"xpath\", '//p[@class=\"css-dzq7l1\"]/span[1]')\n\n    to_list(names_objects, names)\n    to_list(ratings_objects, ratings, attr=\"aria-label\")\n    to_list(reviews_count_objects, reviews)\n    to_list(specialities_objects, specialities)\n    to_list(locations_objects, locations)\n\n    time.sleep(2)\n\n    next_btn = b.find_element(\"xpath\", '//a[@class=\"next-link navigation-button__09f24__m9qRz css-144i0wq\"]')\n    next_btn.click()\n",
"execution_count": 87,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Imports needed for the second (more robust) solution\nfrom selenium.common.exceptions import StaleElementReferenceException\nfrom selenium.webdriver.support.ui import WebDriverWait\nfrom selenium.webdriver.support import expected_conditions\n",
"execution_count": 45,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# [pagination] More robust solution: explicit waits for every locator on every result page.\n# WebDriverWait polls (every 0.5 s by default, for up to 5 s here) until the elements are present,\n# ignoring StaleElementReferenceException while polling, so no fixed time.sleep is needed.\n# Caveat: right after clicking 'Next', the previous page's elements may still be present,\n# so a wait can return them again; this may be one source of duplicate rows.\nfrom selenium.common.exceptions import TimeoutException\n\nb = webdriver.Chrome(service=webdriver_service)\nb.get(w)\nb.maximize_window()\nignored_exceptions = (StaleElementReferenceException,)\n\n\ndef find_all(xpath, timeout=5):\n    # Wait until at least one element matches xpath, then return all matches\n    # (raises TimeoutException if none appear within `timeout` seconds).\n    return WebDriverWait(b, timeout, ignored_exceptions=ignored_exceptions).until(\n        expected_conditions.presence_of_all_elements_located((By.XPATH, xpath)))\n\n\nwhile True:\n    names_objects = find_all('//div[@class=\" businessName__09f24__EYSZE display--inline-block__09f24__fEDiJ border-color--default__09f24__NPAKY\"]//a')\n    ratings_objects = find_all('//div[@class=\" five-stars__09f24__mBKym five-stars--regular__09f24__DgBNj display--inline-block__09f24__fEDiJ border-color--default__09f24__NPAKY\"]')\n    reviews_count_objects = find_all('//div[@class=\" display--inline-block__09f24__fEDiJ border-color--default__09f24__NPAKY\"]/div/div/div[2]/span[2]')\n    locations_objects = find_all('//p[@class=\"css-dzq7l1\"]/span[2]')\n    specialities_objects = find_all('//p[@class=\"css-dzq7l1\"]/span[1]')\n\n    to_list(names_objects, names)\n    to_list(ratings_objects, ratings, attr=\"aria-label\")\n    to_list(reviews_count_objects, reviews)\n    to_list(specialities_objects, specialities)\n    to_list(locations_objects, locations)\n\n    try:\n        nextbtn_xpath = '//a[@class=\"next-link navigation-button__09f24__m9qRz css-144i0wq\"]'\n        WebDriverWait(b, 5, ignored_exceptions=ignored_exceptions).until(\n            expected_conditions.presence_of_element_located((By.XPATH, nextbtn_xpath))).click()\n    except TimeoutException:\n        # no 'Next' link within 5 s: assume this was the last result page\n        print(\"Reached the end\")\n        break\n",
"execution_count": 84,
"outputs": [
{
"output_type": "stream",
"text": "Reached the end\n",
"name": "stdout"
}
]
},
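{
"metadata": {},
"id": "close-browser-md",
"cell_type": "markdown",
"source": "[Sketch] Neither pagination cell closes the browser it opens. Once scraping is finished, `b.quit()` ends the ChromeDriver session and closes all of its windows (`b.close()` would only close the current window), so repeated runs do not leave browser processes behind."
},
{
"metadata": {},
"id": "close-browser-code",
"cell_type": "code",
"source": "# end the WebDriver session opened by the pagination cell and close its browser windows\nb.quit()",
"execution_count": null,
"outputs": []
},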
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# print(names)\n# print(len(names))\n# print(ratings)\n# print(len(ratings))\n# print(reviews)\n# print(len(reviews))\n# print(specialities)\n# print(len(specialities))\n# print(locations)\n# print(len(locations))",
"execution_count": 109,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "# Data Cleaning "
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Aggregate lists (columns) into a dict"
},
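{
"metadata": {},
"id": "column-length-check-md",
"cell_type": "markdown",
"source": "[Sketch] Before aggregating, it can help to compare the lengths of the five lists. Each column is scraped with its own XPath, so a business that lacks one of the fields can leave that list shorter and shift its later values against the other columns, and `pd.DataFrame.from_dict` raises a `ValueError` if the lists differ in length. The cell below is an illustrative check (the `columns` name is ours, not part of the lab); it only reports the lengths and does not repair any misalignment."
},
{
"metadata": {},
"id": "column-length-check-code",
"cell_type": "code",
"source": "# Illustrative sanity check (not part of the original lab): report each column's length\ncolumns = {\n    \"Name\": names,\n    \"Rating\": ratings,\n    \"Reviews Count\": reviews,\n    \"Speciality\": specialities,\n    \"Location\": locations,\n}\nfor col, values in columns.items():\n    print(f\"{col}: {len(values)}\")\n\n# more than one distinct length means the lists cannot line up row by row\nif len({len(values) for values in columns.values()}) > 1:\n    print(\"Warning: the columns differ in length, so rows may be misaligned.\")",
"execution_count": null,
"outputs": []
},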
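{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# pandas needs equal-length columns, so every list is cut to the shortest one\n# (the original run hard-coded 301 here). Truncating only equalizes the lengths; it cannot\n# repair rows that may have been shifted if a business lacked one of the scraped fields.\nn = min(len(names), len(ratings), len(reviews), len(specialities), len(locations))\n\nd = {\n    \"Name\": names[:n],\n    \"Rating\": ratings[:n],\n    \"Reviews Count\": reviews[:n],\n    \"Speciality\": specialities[:n],\n    \"Location\": locations[:n],\n}",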
"execution_count": 99,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Convert the dict to a DataFrame (a pandas table)"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import pandas as pd ",
"execution_count": 90,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df = pd.DataFrame.from_dict(d)",
"execution_count": 101,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df",
"execution_count": 102,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 102,
"data": {
"text/plain": " Name Rating \\\n0 True Design Dentistry - San Diego 5 star rating \n1 K Pat Brown, DDS 5 star rating \n2 Dove Canyon Oral & Maxillofacial Surgery & Den... 5 star rating \n3 Coast Dental 4.5 star rating \n4 Soft Touch Dental: Dr. Ali Fakhimi, DMD 4 star rating \n.. ... ... \n296 Scripps Rock Dental 4.5 star rating \n297 North Park Modern Dentistry 4 star rating \n298 Blue Mountain Dentistry - Morteza Khatibzadeh,... 4.5 star rating \n299 Douglas M Grosmark, DMD 5 star rating \n300 Pravina Reddy, DMD 5 star rating \n\n Reviews Count Speciality \\\n0 109 General DentistryOrthodontistsCosmetic Dentists \n1 169 General DentistryCosmetic Dentists \n2 183 ProsthodontistsOral SurgeonsCosmetic Dentists \n3 111 Cosmetic DentistsOrthodontistsGeneral Dentistry \n4 254 General DentistryCosmetic DentistsOrthodontists \n.. ... ... \n296 136 General DentistryCosmetic DentistsEndodontists \n297 2 Oral SurgeonsGeneral DentistryCosmetic Dentists \n298 123 General DentistryPeriodontistsCosmetic Dentists \n299 103 General Dentistry \n300 14 Cosmetic DentistsGeneral DentistryEndodontists \n\n Location \n0 Clairemont \n1 \n2 \n3 \n4 Clairemont \n.. ... \n296 Scripps Ranch \n297 University Heights \n298 Grantville \n299 \n300 Sorrento Valley \n\n[301 rows x 5 columns]",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Name</th>\n <th>Rating</th>\n <th>Reviews Count</th>\n <th>Speciality</th>\n <th>Location</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>True Design Dentistry - San Diego</td>\n <td>5 star rating</td>\n <td>109</td>\n <td>General DentistryOrthodontistsCosmetic Dentists</td>\n <td>Clairemont</td>\n </tr>\n <tr>\n <th>1</th>\n <td>K Pat Brown, DDS</td>\n <td>5 star rating</td>\n <td>169</td>\n <td>General DentistryCosmetic Dentists</td>\n <td></td>\n </tr>\n <tr>\n <th>2</th>\n <td>Dove Canyon Oral &amp; Maxillofacial Surgery &amp; Den...</td>\n <td>5 star rating</td>\n <td>183</td>\n <td>ProsthodontistsOral SurgeonsCosmetic Dentists</td>\n <td></td>\n </tr>\n <tr>\n <th>3</th>\n <td>Coast Dental</td>\n <td>4.5 star rating</td>\n <td>111</td>\n <td>Cosmetic DentistsOrthodontistsGeneral Dentistry</td>\n <td></td>\n </tr>\n <tr>\n <th>4</th>\n <td>Soft Touch Dental: Dr. 
Ali Fakhimi, DMD</td>\n <td>4 star rating</td>\n <td>254</td>\n <td>General DentistryCosmetic DentistsOrthodontists</td>\n <td>Clairemont</td>\n </tr>\n <tr>\n <th>...</th>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n </tr>\n <tr>\n <th>296</th>\n <td>Scripps Rock Dental</td>\n <td>4.5 star rating</td>\n <td>136</td>\n <td>General DentistryCosmetic DentistsEndodontists</td>\n <td>Scripps Ranch</td>\n </tr>\n <tr>\n <th>297</th>\n <td>North Park Modern Dentistry</td>\n <td>4 star rating</td>\n <td>2</td>\n <td>Oral SurgeonsGeneral DentistryCosmetic Dentists</td>\n <td>University Heights</td>\n </tr>\n <tr>\n <th>298</th>\n <td>Blue Mountain Dentistry - Morteza Khatibzadeh,...</td>\n <td>4.5 star rating</td>\n <td>123</td>\n <td>General DentistryPeriodontistsCosmetic Dentists</td>\n <td>Grantville</td>\n </tr>\n <tr>\n <th>299</th>\n <td>Douglas M Grosmark, DMD</td>\n <td>5 star rating</td>\n <td>103</td>\n <td>General Dentistry</td>\n <td></td>\n </tr>\n <tr>\n <th>300</th>\n <td>Pravina Reddy, DMD</td>\n <td>5 star rating</td>\n <td>14</td>\n <td>Cosmetic DentistsGeneral DentistryEndodontists</td>\n <td>Sorrento Valley</td>\n </tr>\n </tbody>\n</table>\n<p>301 rows × 5 columns</p>\n</div>"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "### Cleaning (Processing) to be done: "
},
{
"metadata": {},
"cell_type": "markdown",
"source": "- [Self-Paced] In \"Speciality\", the categories are concatenated with no separator (e.g. \"General DentistryOrthodontistsCosmetic Dentists\"). Write an algorithm to separate them with a delimiter (or perhaps split them into columns?). Think about it.\n- [Self-Paced] We have almost certainly fetched the \"Sponsored Results\" multiple times, since the same sponsored results appear on every single result page. Write an algorithm or some processing logic to remove the redundant rows.\n- Remove \"star rating\" from every row in \"Rating\".\n- (Handling missing values with defaults) Replace missing locations in every row with the default \"San Diego\"."
},
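{
"metadata": {},
"id": "split-specialities-md",
"cell_type": "markdown",
"source": "[Sketch] One possible approach to the first self-paced task. In the table above, the categories of each business are glued together (e.g. `General DentistryOrthodontistsCosmetic Dentists`), apparently because the texts of several category elements are read as one string. Assuming every category name starts with an uppercase letter, a new category begins wherever a lowercase letter is directly followed by an uppercase one, and a zero-width regular-expression split (supported since Python 3.7) can cut the string there. `split_specialities` is a hypothetical helper, not part of the lab. The heuristic would wrongly split a name such as `McKinley`; collecting each category element separately while scraping would be more robust, but that depends on Yelp's markup, which is not shown here."
},
{
"metadata": {},
"id": "split-specialities-code",
"cell_type": "code",
"source": "import re\n\n\ndef split_specialities(text):\n    # Hypothetical helper: split concatenated categories at every lowercase->uppercase boundary.\n    # Assumption: each category name starts with an uppercase letter.\n    if not text:\n        return []\n    return re.split(r\"(?<=[a-z])(?=[A-Z])\", text)\n\n\nprint(split_specialities(\"General DentistryOrthodontistsCosmetic Dentists\"))\n# ['General Dentistry', 'Orthodontists', 'Cosmetic Dentists']\n\n# store the categories as one comma-separated string per row (re-running leaves them unchanged)\ndf[\"Speciality\"] = df[\"Speciality\"].apply(lambda s: \", \".join(split_specialities(s)))",
"execution_count": null,
"outputs": []
},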
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# remove the \" star rating\" suffix from Rating (e.g. \"4.5 star rating\" -> \"4.5\")\ndf[\"Rating\"] = df[\"Rating\"].str.replace(\" star rating\", \"\", regex=False)",
"execution_count": 103,
"outputs": []
},
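{
"metadata": {},
"id": "numeric-columns-md",
"cell_type": "markdown",
"source": "[Sketch] After the suffix is removed, `Rating` still holds strings such as `'4.5'` and `Reviews Count` holds digit strings, so sorting or averaging them would not behave numerically. A possible next step is `pd.to_numeric` with `errors='coerce'`, which turns any value it cannot parse into `NaN` so it can be inspected instead of stopping the cell. Stripping thousands separators (e.g. `1,234`) is only a precaution; the rows shown above contain none. The nullable `Int64` dtype keeps review counts as integers even if some values end up missing."
},
{
"metadata": {},
"id": "numeric-columns-code",
"cell_type": "code",
"source": "import pandas as pd\n\n# Rating: '4.5' -> 4.5 (unparseable values become NaN)\ndf[\"Rating\"] = pd.to_numeric(df[\"Rating\"], errors=\"coerce\")\n\n# Reviews Count: drop any thousands separators, parse, and keep integers with the nullable Int64 dtype\ndf[\"Reviews Count\"] = pd.to_numeric(\n    df[\"Reviews Count\"].astype(str).str.replace(\",\", \"\", regex=False), errors=\"coerce\"\n).astype(\"Int64\")\n\nprint(df.dtypes)",
"execution_count": null,
"outputs": []
},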
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df",
"execution_count": 104,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 104,
"data": {
"text/plain": " Name Rating Reviews Count \\\n0 True Design Dentistry - San Diego 5 109 \n1 K Pat Brown, DDS 5 169 \n2 Dove Canyon Oral & Maxillofacial Surgery & Den... 5 183 \n3 Coast Dental 4.5 111 \n4 Soft Touch Dental: Dr. Ali Fakhimi, DMD 4 254 \n.. ... ... ... \n296 Scripps Rock Dental 4.5 136 \n297 North Park Modern Dentistry 4 2 \n298 Blue Mountain Dentistry - Morteza Khatibzadeh,... 4.5 123 \n299 Douglas M Grosmark, DMD 5 103 \n300 Pravina Reddy, DMD 5 14 \n\n Speciality Location \n0 General DentistryOrthodontistsCosmetic Dentists Clairemont \n1 General DentistryCosmetic Dentists \n2 ProsthodontistsOral SurgeonsCosmetic Dentists \n3 Cosmetic DentistsOrthodontistsGeneral Dentistry \n4 General DentistryCosmetic DentistsOrthodontists Clairemont \n.. ... ... \n296 General DentistryCosmetic DentistsEndodontists Scripps Ranch \n297 Oral SurgeonsGeneral DentistryCosmetic Dentists University Heights \n298 General DentistryPeriodontistsCosmetic Dentists Grantville \n299 General Dentistry \n300 Cosmetic DentistsGeneral DentistryEndodontists Sorrento Valley \n\n[301 rows x 5 columns]",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Name</th>\n <th>Rating</th>\n <th>Reviews Count</th>\n <th>Speciality</th>\n <th>Location</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>True Design Dentistry - San Diego</td>\n <td>5</td>\n <td>109</td>\n <td>General DentistryOrthodontistsCosmetic Dentists</td>\n <td>Clairemont</td>\n </tr>\n <tr>\n <th>1</th>\n <td>K Pat Brown, DDS</td>\n <td>5</td>\n <td>169</td>\n <td>General DentistryCosmetic Dentists</td>\n <td></td>\n </tr>\n <tr>\n <th>2</th>\n <td>Dove Canyon Oral &amp; Maxillofacial Surgery &amp; Den...</td>\n <td>5</td>\n <td>183</td>\n <td>ProsthodontistsOral SurgeonsCosmetic Dentists</td>\n <td></td>\n </tr>\n <tr>\n <th>3</th>\n <td>Coast Dental</td>\n <td>4.5</td>\n <td>111</td>\n <td>Cosmetic DentistsOrthodontistsGeneral Dentistry</td>\n <td></td>\n </tr>\n <tr>\n <th>4</th>\n <td>Soft Touch Dental: Dr. 
Ali Fakhimi, DMD</td>\n <td>4</td>\n <td>254</td>\n <td>General DentistryCosmetic DentistsOrthodontists</td>\n <td>Clairemont</td>\n </tr>\n <tr>\n <th>...</th>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n </tr>\n <tr>\n <th>296</th>\n <td>Scripps Rock Dental</td>\n <td>4.5</td>\n <td>136</td>\n <td>General DentistryCosmetic DentistsEndodontists</td>\n <td>Scripps Ranch</td>\n </tr>\n <tr>\n <th>297</th>\n <td>North Park Modern Dentistry</td>\n <td>4</td>\n <td>2</td>\n <td>Oral SurgeonsGeneral DentistryCosmetic Dentists</td>\n <td>University Heights</td>\n </tr>\n <tr>\n <th>298</th>\n <td>Blue Mountain Dentistry - Morteza Khatibzadeh,...</td>\n <td>4.5</td>\n <td>123</td>\n <td>General DentistryPeriodontistsCosmetic Dentists</td>\n <td>Grantville</td>\n </tr>\n <tr>\n <th>299</th>\n <td>Douglas M Grosmark, DMD</td>\n <td>5</td>\n <td>103</td>\n <td>General Dentistry</td>\n <td></td>\n </tr>\n <tr>\n <th>300</th>\n <td>Pravina Reddy, DMD</td>\n <td>5</td>\n <td>14</td>\n <td>Cosmetic DentistsGeneral DentistryEndodontists</td>\n <td>Sorrento Valley</td>\n </tr>\n </tbody>\n</table>\n<p>301 rows × 5 columns</p>\n</div>"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# set a default for missing (empty) locations\ndef replace(value):\n    if not value:\n        return \"San Diego\"\n    else:\n        return value",
"execution_count": 105,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df[\"Location\"] = df[\"Location\"].apply(replace)",
"execution_count": 106,
"outputs": []
},
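{
"metadata": {},
"id": "location-default-vectorized-md",
"cell_type": "markdown",
"source": "[Sketch] The same default can also be applied without a helper function, using pandas' own methods: `fillna('')` turns missing values (`None`/`NaN`) into empty strings, and `Series.replace` then swaps every value that is exactly an empty string for `'San Diego'`. Like `replace()` above, it leaves whitespace-only strings untouched."
},
{
"metadata": {},
"id": "location-default-vectorized-code",
"cell_type": "code",
"source": "# Vectorized alternative to applying replace(): missing values and empty strings both get the default\ndf[\"Location\"] = df[\"Location\"].fillna(\"\").replace(\"\", \"San Diego\")",
"execution_count": null,
"outputs": []
},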
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df",
"execution_count": 107,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 107,
"data": {
"text/plain": " Name Rating Reviews Count \\\n0 True Design Dentistry - San Diego 5 109 \n1 K Pat Brown, DDS 5 169 \n2 Dove Canyon Oral & Maxillofacial Surgery & Den... 5 183 \n3 Coast Dental 4.5 111 \n4 Soft Touch Dental: Dr. Ali Fakhimi, DMD 4 254 \n.. ... ... ... \n296 Scripps Rock Dental 4.5 136 \n297 North Park Modern Dentistry 4 2 \n298 Blue Mountain Dentistry - Morteza Khatibzadeh,... 4.5 123 \n299 Douglas M Grosmark, DMD 5 103 \n300 Pravina Reddy, DMD 5 14 \n\n Speciality Location \n0 General DentistryOrthodontistsCosmetic Dentists Clairemont \n1 General DentistryCosmetic Dentists San Diego \n2 ProsthodontistsOral SurgeonsCosmetic Dentists San Diego \n3 Cosmetic DentistsOrthodontistsGeneral Dentistry San Diego \n4 General DentistryCosmetic DentistsOrthodontists Clairemont \n.. ... ... \n296 General DentistryCosmetic DentistsEndodontists Scripps Ranch \n297 Oral SurgeonsGeneral DentistryCosmetic Dentists University Heights \n298 General DentistryPeriodontistsCosmetic Dentists Grantville \n299 General Dentistry San Diego \n300 Cosmetic DentistsGeneral DentistryEndodontists Sorrento Valley \n\n[301 rows x 5 columns]",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Name</th>\n <th>Rating</th>\n <th>Reviews Count</th>\n <th>Speciality</th>\n <th>Location</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>True Design Dentistry - San Diego</td>\n <td>5</td>\n <td>109</td>\n <td>General DentistryOrthodontistsCosmetic Dentists</td>\n <td>Clairemont</td>\n </tr>\n <tr>\n <th>1</th>\n <td>K Pat Brown, DDS</td>\n <td>5</td>\n <td>169</td>\n <td>General DentistryCosmetic Dentists</td>\n <td>San Diego</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Dove Canyon Oral &amp; Maxillofacial Surgery &amp; Den...</td>\n <td>5</td>\n <td>183</td>\n <td>ProsthodontistsOral SurgeonsCosmetic Dentists</td>\n <td>San Diego</td>\n </tr>\n <tr>\n <th>3</th>\n <td>Coast Dental</td>\n <td>4.5</td>\n <td>111</td>\n <td>Cosmetic DentistsOrthodontistsGeneral Dentistry</td>\n <td>San Diego</td>\n </tr>\n <tr>\n <th>4</th>\n <td>Soft Touch Dental: Dr. 
Ali Fakhimi, DMD</td>\n <td>4</td>\n <td>254</td>\n <td>General DentistryCosmetic DentistsOrthodontists</td>\n <td>Clairemont</td>\n </tr>\n <tr>\n <th>...</th>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n </tr>\n <tr>\n <th>296</th>\n <td>Scripps Rock Dental</td>\n <td>4.5</td>\n <td>136</td>\n <td>General DentistryCosmetic DentistsEndodontists</td>\n <td>Scripps Ranch</td>\n </tr>\n <tr>\n <th>297</th>\n <td>North Park Modern Dentistry</td>\n <td>4</td>\n <td>2</td>\n <td>Oral SurgeonsGeneral DentistryCosmetic Dentists</td>\n <td>University Heights</td>\n </tr>\n <tr>\n <th>298</th>\n <td>Blue Mountain Dentistry - Morteza Khatibzadeh,...</td>\n <td>4.5</td>\n <td>123</td>\n <td>General DentistryPeriodontistsCosmetic Dentists</td>\n <td>Grantville</td>\n </tr>\n <tr>\n <th>299</th>\n <td>Douglas M Grosmark, DMD</td>\n <td>5</td>\n <td>103</td>\n <td>General Dentistry</td>\n <td>San Diego</td>\n </tr>\n <tr>\n <th>300</th>\n <td>Pravina Reddy, DMD</td>\n <td>5</td>\n <td>14</td>\n <td>Cosmetic DentistsGeneral DentistryEndodontists</td>\n <td>Sorrento Valley</td>\n </tr>\n </tbody>\n</table>\n<p>301 rows × 5 columns</p>\n</div>"
},
"metadata": {}
}
]
},
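{
"metadata": {},
"id": "drop-sponsored-duplicates-md",
"cell_type": "markdown",
"source": "[Sketch] One possible approach to the second self-paced task. Assuming a sponsored listing that appears on several result pages is scraped identically each time, its copies match on every column, so `drop_duplicates` without a `subset` removes them and keeps the first occurrence; `ignore_index=True` (available since pandas 1.0) renumbers the rows afterwards. Copies that differ in any field (for example a review count that changed between page loads, or rows shifted by misalignment) are not caught. Deduplicating on a subset such as `['Name', 'Location']` would catch more, at the risk of merging distinct branches of a chain that share a name."
},
{
"metadata": {},
"id": "drop-sponsored-duplicates-code",
"cell_type": "code",
"source": "# Remove rows that repeat on every column (keeps the first occurrence of each)\nrows_before = len(df)\ndf = df.drop_duplicates(ignore_index=True)\nprint(f\"Removed {rows_before - len(df)} duplicate rows; {len(df)} rows remain.\")",
"execution_count": null,
"outputs": []
},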
{
"metadata": {},
"cell_type": "markdown",
"source": "### Generate an Excel sheet"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# writing .xlsx files needs an Excel writer engine such as openpyxl to be installed\ndf.to_excel(\"dentists_yelp.xlsx\", index=False)",
"execution_count": 108,
"outputs": []
}
],
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python 3",
"language": "python"
},
"language_info": {
"name": "python",
"version": "3.8.8",
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"file_extension": ".py"
},
"gist": {
"id": "97db770df9e57628baf1dac635922e59",
"data": {
"description": "Desktop/Desktio_2023_Post_S1/DB_Lab1/Yelp Project Scraping_Final.ipynb",
"public": true
}
},
"_draft": {
"nbviewer_url": "https://gist.github.com/97db770df9e57628baf1dac635922e59"
}
},
"nbformat": 4,
"nbformat_minor": 5
}