@AhmedAlhallag
Created February 21, 2023 21:20
Desktop/Desktio_2023_Post_S1/DB_Lab1/WebScraping_InLab.ipynb
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "# Lab 1.1: Data Collection & Preparation via Web Scraping using Python & Selenium"
},
{
"metadata": {},
"id": "a8ea0491",
"cell_type": "markdown",
"source": "## Part 1: Setup"
},
{
"metadata": {
"trusted": false
},
"id": "73e5c45b",
"cell_type": "code",
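"source": "# import libraries\nfrom selenium import webdriver  # controls a real browser\nfrom selenium.webdriver.common.keys import Keys  # keyboard keys such as Keys.ENTER (not used below)\nfrom selenium.webdriver.chrome.options import Options  # Chrome start-up options (not used below)\nfrom selenium.webdriver.common.by import By  # locator strategies: By.XPATH, By.CLASS_NAME, ...\nfrom selenium.webdriver.chrome import service  # wraps the chromedriver executable\n\nimport time  # e.g. time.sleep() to give a page time to load (not used below)\n\n",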
"execution_count": 35,
"outputs": []
},
{
"metadata": {
"trusted": false
},
"id": "1e69c7af",
"cell_type": "code",
"source": "# relevant website to scrape, stored as a variable\nw = \"https://www.geekbuying.com/search?keyword=laptops\"",
"execution_count": 36,
"outputs": []
},
{
"metadata": {
"trusted": false
},
"id": "4e115668",
"cell_type": "code",
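"source": "# point Selenium at the chromedriver executable (the browser itself is launched in the next cell);\n# the chromedriver version must match the major version of your installed Chrome\n\n# For Windows machines (the raw string r\"...\" keeps the backslashes literal)\nwebdriver_service = service.Service(r\"C:\\webdrivers\\chromedriver.exe\")  # path to chromedriver\n\n# For Mac machines (no .exe extension on macOS)\n# webdriver_service = service.Service(\"/usr/local/bin/chromedriver\")  # path to chromedriver\n\n# With Selenium 4.6 or later, webdriver.Chrome() without a service usually works too:\n# Selenium Manager then locates or downloads a matching driver\n",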
"execution_count": 37,
"outputs": []
},
{
"metadata": {
"trusted": false
},
"id": "571370f2",
"cell_type": "code",
"source": "# launch Chrome through chromedriver, open the search page, and maximize the window\nbrowser = webdriver.Chrome(service=webdriver_service)\nbrowser.get(w)\nbrowser.maximize_window()",
"execution_count": 38,
"outputs": []
},
{
"metadata": {},
"id": "af0a536d",
"cell_type": "markdown",
"source": "## Part 2: XPath Locators"
},
{
"metadata": {
"trusted": false
},
"id": "cc0fa455",
"cell_type": "code",
"source": "# Price column: XPath to the price block of every laptop product:\n# //li[@class=\"searchResultItem\"]//div[@class=\"price\"]\n# find_elements returns a list of WebElement objects (an empty list if nothing matches),\n# not the prices as text; the text is extracted in the next cell\nprice_objects = browser.find_elements(By.XPATH, '//li[@class=\"searchResultItem\"]//div[@class=\"price\"]')\n",
"execution_count": 39,
"outputs": []
},
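{
"metadata": {},
"id": "xpath-sketch-md",
"cell_type": "markdown",
"source": "A minimal offline sketch (not part of the original lab) of what the XPath above matches. It runs Python's built-in `xml.etree.ElementTree`, which supports a subset of XPath, on a small **hypothetical** HTML snippet that only mirrors the structure the XPath expects; it is not the real geekbuying.com markup. Note that `[@class=\"price\"]` compares the *whole* attribute value, so a `div` with `class=\"price sale\"` is skipped (full XPath offers `contains(@class, \"price\")` for that case)."
},
{
"metadata": {
"trusted": false
},
"id": "xpath-sketch-code",
"cell_type": "code",
"source": "# Sketch: evaluate the lab's XPath on hypothetical markup (no browser needed)\nimport xml.etree.ElementTree as ET\n\n# Hypothetical markup mirroring //li[@class=\"searchResultItem\"]//div[@class=\"price\"]\nsample_html = \"\"\"<ul>\n  <li class=\"searchResultItem\"><div class=\"price\">$99.99</div></li>\n  <li class=\"searchResultItem\"><div class=\"price\">$299.99</div></li>\n  <li class=\"searchResultItem\"><div class=\"price sale\">$19.99</div></li>\n  <li class=\"ad\"><div class=\"price\">$5.00</div></li>\n</ul>\"\"\"\n\nroot = ET.fromstring(sample_html)\n\n# ElementTree paths are relative to the root element, so //li becomes .//li\nmatches = root.findall(\".//li[@class='searchResultItem']//div[@class='price']\")\nxpath_prices = [div.text for div in matches]\nprint(xpath_prices)  # ['$99.99', '$299.99']: the 'price sale' div and the ad card are skipped\n",
"execution_count": null,
"outputs": []
},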
{
"metadata": {
"trusted": false
},
"id": "f4784fe6",
"cell_type": "code",
"source": "# Processing logic: keep only the first line of each price block's text\n# (the current price after discount) and store it in a list\n\nprice_list = []\nfor price in price_objects:\n    price_list.append(price.text.split('\\n')[0])",
"execution_count": 40,
"outputs": []
},
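{
"metadata": {},
"id": "first-line-sketch-md",
"cell_type": "markdown",
"source": "A minimal sketch of the same first-line rule as a reusable function, tried on **hypothetical** `.text` values (the real page's price blocks may be laid out differently). It also guards against an empty string, which `WebElement.text` can return, for example when the element is not visible. `current_price` and `sample_texts` are illustrative names, not part of the lab."
},
{
"metadata": {
"trusted": false
},
"id": "first-line-sketch-code",
"cell_type": "code",
"source": "# Sketch: keep only the first line of a price block's text\ndef current_price(block_text):\n    # Assumption: the current (discounted) price is the first line of the block\n    first_line = block_text.strip().split('\\n')[0]\n    return first_line or None  # None marks a block with no visible text\n\n# Hypothetical .text values of three price blocks\nsample_texts = ['$99.99\\n$149.99\\n33% OFF', '$299.99', '']\n\n# A list comprehension does the same job as the for loop above\nsample_prices = [current_price(t) for t in sample_texts]\nprint(sample_prices)  # ['$99.99', '$299.99', None]\n",
"execution_count": null,
"outputs": []
},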
{
"metadata": {
"trusted": false
},
"id": "92539d39",
"cell_type": "code",
"source": "price_list",
"execution_count": 41,
"outputs": [
{
"data": {
"text/plain": "['$99.99',\n '$299.99',\n '$319.00',\n '$29.99',\n '$19.99',\n '$349.99',\n '$1199.00',\n '$1142.99',\n '$1059.99',\n '$1259.00',\n '$1259.00',\n '$799.00',\n '$1099.00',\n '$949.00',\n '$1299.00',\n '$1199.00',\n '$73.99',\n '$14.99',\n '$1099.00',\n '$1199.00',\n '$99.99',\n '$1459.00',\n '$1459.00',\n '$1459.00',\n '$1459.00',\n '$1459.00',\n '$1459.00',\n '$1259.00',\n '$1259.00',\n '$1259.00',\n '$1259.00',\n '$1259.00',\n '$1259.00',\n '$1259.00',\n '$1259.00',\n '$1259.00',\n '$1259.00',\n '$1142.99',\n '$484.99',\n '$139.99']"
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {},
"id": "64c6c8cd",
"cell_type": "markdown",
"source": "## Part 3: Class Name Locators"
},
{
"metadata": {
"trusted": false
},
"id": "1226d2a2",
"cell_type": "code",
"source": "# Laptops' names column:\n# this time we use the class-name locator instead of XPath\n# (both work here; By.CLASS_NAME matches every element on the page whose class list includes \"name\",\n# so it relies on that class being used only for product titles)\nnames_objects = browser.find_elements(By.CLASS_NAME, \"name\")",
"execution_count": 42,
"outputs": []
},
{
"metadata": {
"trusted": false
},
"id": "9af7e19b",
"cell_type": "code",
"source": "# Processing logic: extract each name as text and store it in a list\nnames_list = []\nfor name in names_objects:\n    names_list.append(name.text)",
"execution_count": 45,
"outputs": []
},
{
"metadata": {
"trusted": false
},
"id": "8332dfbd",
"cell_type": "code",
"source": "names_list",
"execution_count": 46,
"outputs": [
{
"data": {
"text/plain": "['Game Monitor Second Screen for Laptop PC Phone Xbox',\n 'ALLDOCUBE GTBook 15 Laptop, 15.6 inch FHD IPS 1920x1080, Intel Celeron N5100 Quad Core 2.8GHz, 12GB DDR4 256GB SSD, 2.0MP Camera BT5.0 2.4/5GHz Dual WiFi, MicroSD 3.5mm Audio USB3.0 Mini HDMI Type-C',\n \"BMAX X15 Plus 15.6'' 1080P Laptop Intel Jasper Lake N5095 4 Cores 4 Threads, 12GB DDR4 512GB SSD Windows 11 5G WiFi Grey - EU\",\n 'Type-C Hub 10 in 1 USB C to 4K HDMI+RJ45+PD 100W Charge+USB3.0+VGA+SD/TF card reader Dock for MacBook Windows',\n 'One-Netbook T1 2 in 1 Laptop Protective Bag',\n 'ALLDOCUBE GTBook 15 Laptop, 15.6 inch FHD IPS 1920x1080, Intel Celeron N5100 Quad Core 2.8GHz, 12GB DDR4 512GB SSD, 2.0MP Camera BT5.0 2.4/5GHz Dual WiFi, MicroSD 3.5mm Audio USB3.0 Mini HDMI Type-C',\n \"One Netbook 4S Platinum 2 in 1 Laptop Intel Core i7-1250U Processor, 16GB LPDDR5 1TB ROM 10.1'' 2.5K LTPS Full Display - EU Plug\",\n 'GPD WIN Max 2 Smallest Handheld Gaming Laptop 10.1 Inch Touch Screen CPU AMD 6800U Mini PC RAM 16GB SSD 1TB - EU Plug',\n 'GPD Pocket 3 Laptop Mini Tablet PC 8 Inch 1920 x 1200 Resolution IPS Touchscreen Intel Core i7-1195G7 16GB RAM 1TB SSD Windows 10 Home 38.5Wh Battery - EU Plug',\n \"ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K IPS Screen AMD Ryzen 7 6800U 32GB LPDDR5 1TB M.2 17100mAh Battery Windows White - JP\",\n \"ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K IPS Screen AMD Ryzen 7 6800U 32GB LPDDR5 1TB M.2 17100mAh Battery Windows White - EU\",\n \"One Netbook T1 2 in 1 Laptop Intel Pentium Gold Processor 8505 16GB DDR5 512GB ROM 13'' 2K Ultra-IPS Screen Windows 11 - Star Grey\",\n \"One Netbook T1 2 in 1 Laptop Intel Core i5-1240P 16GB DDR5 2TB ROM 13'' 2K Ultra-IPS Screen Windows 11 WiFi 6 - Platinum Grey\",\n \"One Netbook T1 2 in 1 Laptop Intel Core i5-1240P 16GB DDR5 1 TB ROM 13'' 2K Ultra-IPS Screen Windows 11 WiFi 6 - Platinum Grey\",\n \"One Netbook T1 2 in 1 Laptop Intel Core i7-1260P 16GB DDR5 2TB ROM 13'' 2K Ultra-IPS Screen Windows 11 WiFi 6 - 
Platinum Grey\",\n \"One Netbook T1 2 in 1 Laptop Intel Core i7-1260P 16GB DDR5 1TB ROM 13'' 2K Ultra-IPS Screen Windows 11 WiFi 6 - Platinum Grey\",\n 'Logitech G502 HERO Wired Gaming Mouse 16000DPI With 16.8 millon Backlight For PC / - Black',\n 'P1 Webcam 1080P with Microphone Auto Focus Light Correction For Windows PC Mac Desktop - Black',\n \"ONE Netbook OneXPlayer Mini Pro Game Console 7'' IPS Display AMD RYZEN 7 6800U 16GB LPDDR5 1TB M.2 12450mAh Battery Windows11 - EU\",\n \"One Netbook 4S Platinum 2 in 1 Laptop Intel Core i7-1250U Processor, 16GB LPDDR5 1TB ROM 10.1'' 2.5K LTPS Full Display - US Plug\",\n 'Second Screen for Laptop PC Phone Xbox-US Plug',\n \"ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K IPS Screen AMD Ryzen 7 6800U 32GB LPDDR5 2TB M.2 17100mAh Battery Windows White - JP\",\n \"ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K IPS Screen AMD Ryzen 7 6800U 32GB LPDDR5 2TB M.2 17100mAh Battery Windows Black - JP\",\n \"ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K IPS Screen AMD Ryzen 7 6800U 32GB LPDDR5 2TB M.2 17100mAh Battery Windows White - US\",\n \"ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K IPS Screen AMD Ryzen 7 6800U 32GB LPDDR5 2TB M.2 17100mAh Battery Windows Black - US\",\n \"ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K IPS Screen AMD Ryzen 7 6800U 32GB LPDDR5 2TB M.2 17100mAh Battery Windows White - EU\",\n \"ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K IPS Screen AMD Ryzen 7 6800U 32GB LPDDR5 2TB M.2 17100mAh Battery Windows Black - EU\",\n \"ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K IPS Screen AMD Ryzen 7 6800U 32GB LPDDR5 1TB M.2 17100mAh Battery Windows Black - JP\",\n \"ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K IPS Screen AMD Ryzen 7 6800U 32GB LPDDR5 1TB M.2 17100mAh Battery Windows White - US\",\n \"ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K IPS Screen AMD Ryzen 7 6800U 32GB LPDDR5 1TB M.2 17100mAh Battery Windows Black - US\",\n \"ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K IPS Screen AMD Ryzen 7 
6800U 32GB LPDDR5 1TB M.2 17100mAh Battery Windows Black - EU\",\n \"ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K IPS Screen AMD Ryzen 7 6800U 16GB LPDDR5 2TB M.2 17100mAh Battery Windows White - JP\",\n \"ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K IPS Screen AMD Ryzen 7 6800U 16GB LPDDR5 2TB M.2 17100mAh Battery Windows Black - JP\",\n \"ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K IPS Screen AMD Ryzen 7 6800U 16GB LPDDR5 2TB M.2 17100mAh Battery Windows White - US\",\n \"ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K IPS Screen AMD Ryzen 7 6800U 16GB LPDDR5 2TB M.2 17100mAh Battery Windows Black - US\",\n \"ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K IPS Screen AMD Ryzen 7 6800U 16GB LPDDR5 2TB M.2 17100mAh Battery Windows White - EU\",\n \"ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K IPS Screen AMD Ryzen 7 6800U 16GB LPDDR5 2TB M.2 17100mAh Battery Windows Black - EU\",\n 'GPD WIN Max 2 Smallest Handheld Gaming Laptop 10.1 Inch Touch Screen CPU AMD 6800U Mini PC RAM 16GB SSD 1TB - US Plug',\n 'GPD MicroPC Pocket Laptop Mini PC 6 Inch Screen Intel Celeron N4120 8GB RAM 256GB SSD Windows 10 Pro Home 2x3100mAh Battery - US Plug',\n 'Kingston A400 SSD 960GB SATA 3 2.5 Inch Solid State Drive Phison S11 Support Windows System 500MB/s Read Speed For Desktop - Dark Gray']"
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {},
"id": "a5175fca",
"cell_type": "markdown",
"source": "### Activity 1: Extract the \"ratings\" and \"favourites\" data in exactly the same way (using either XPath or attribute locators such as By.CLASS_NAME)"
},
{
"metadata": {},
"id": "bbf8b614",
"cell_type": "markdown",
"source": "## Part 4: Data Transformation & Cleaning"
},
{
"metadata": {
"trusted": false
},
"id": "bd847cd2",
"cell_type": "code",
"source": "import pandas as pd\n# pandas is a Python library whose DataFrame is the closest data structure to a \"table\" (named columns, one row per record)",
"execution_count": 47,
"outputs": []
},
{
"metadata": {
"trusted": false
},
"id": "aa8086c1",
"cell_type": "code",
"source": "# put both extracted lists into a dictionary: each key becomes a column name\n# and each list becomes that column's values\nd = {\n    \"Names\": names_list,\n    \"Prices\": price_list\n}",
"execution_count": 48,
"outputs": []
},
{
"metadata": {
"trusted": false
},
"id": "6971c2d7",
"cell_type": "code",
"source": "# convert Dict into a table (a DataFrame)\ndf = pd.DataFrame.from_dict(d)",
"execution_count": 49,
"outputs": []
},
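{
"metadata": {},
"id": "length-check-sketch-md",
"cell_type": "markdown",
"source": "Building the table only works when every list has the same length: `pd.DataFrame.from_dict` raises a `ValueError` otherwise, which can happen when names and prices are located independently and one product is missing a price. A minimal sketch with **hypothetical** data; `lists_line_up` is an illustrative helper, not a pandas function."
},
{
"metadata": {
"trusted": false
},
"id": "length-check-sketch-code",
"cell_type": "code",
"source": "# Sketch: what happens when the scraped columns do not line up\nimport pandas as pd\n\n# Hypothetical columns where one product had no price block\nnames_demo = ['Laptop A', 'Laptop B', 'Laptop C']\nprices_demo = ['$99.99', '$299.99']\n\ntry:\n    pd.DataFrame.from_dict({'Names': names_demo, 'Prices': prices_demo})\n    table_built = True\nexcept ValueError as err:\n    table_built = False\n    print('Cannot build the table:', err)\n\ndef lists_line_up(*columns):\n    # True when every column has the same number of items\n    return len({len(col) for col in columns}) == 1\n\nprint(lists_line_up(names_demo, prices_demo))  # False\n",
"execution_count": null,
"outputs": []
},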
{
"metadata": {
"trusted": false
},
"id": "63fd4a2a",
"cell_type": "code",
"source": "df\n# notice how the DataFrame looks exactly like a table!\n# notice also that DataFrames come with an extra index column (0, 1, 2, ...) by default",
"execution_count": 51,
"outputs": [
{
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Names</th>\n <th>Prices</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>Game Monitor Second Screen for Laptop PC Phone...</td>\n <td>$99.99</td>\n </tr>\n <tr>\n <th>1</th>\n <td>ALLDOCUBE GTBook 15 Laptop, 15.6 inch FHD IPS ...</td>\n <td>$299.99</td>\n </tr>\n <tr>\n <th>2</th>\n <td>BMAX X15 Plus 15.6'' 1080P Laptop Intel Jasper...</td>\n <td>$319.00</td>\n </tr>\n <tr>\n <th>3</th>\n <td>Type-C Hub 10 in 1 USB C to 4K HDMI+RJ45+PD 10...</td>\n <td>$29.99</td>\n </tr>\n <tr>\n <th>4</th>\n <td>One-Netbook T1 2 in 1 Laptop Protective Bag</td>\n <td>$19.99</td>\n </tr>\n <tr>\n <th>5</th>\n <td>ALLDOCUBE GTBook 15 Laptop, 15.6 inch FHD IPS ...</td>\n <td>$349.99</td>\n </tr>\n <tr>\n <th>6</th>\n <td>One Netbook 4S Platinum 2 in 1 Laptop Intel Co...</td>\n <td>$1199.00</td>\n </tr>\n <tr>\n <th>7</th>\n <td>GPD WIN Max 2 Smallest Handheld Gaming Laptop ...</td>\n <td>$1142.99</td>\n </tr>\n <tr>\n <th>8</th>\n <td>GPD Pocket 3 Laptop Mini Tablet PC 8 Inch 1920...</td>\n <td>$1059.99</td>\n </tr>\n <tr>\n <th>9</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>$1259.00</td>\n </tr>\n <tr>\n <th>10</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>$1259.00</td>\n </tr>\n <tr>\n <th>11</th>\n <td>One Netbook T1 2 in 1 Laptop Intel Pentium Gol...</td>\n <td>$799.00</td>\n </tr>\n <tr>\n <th>12</th>\n <td>One Netbook T1 2 in 1 Laptop Intel Core i5-124...</td>\n <td>$1099.00</td>\n </tr>\n <tr>\n <th>13</th>\n <td>One Netbook T1 2 in 1 Laptop Intel Core i5-124...</td>\n <td>$949.00</td>\n </tr>\n <tr>\n <th>14</th>\n <td>One Netbook T1 2 in 1 Laptop Intel Core i7-126...</td>\n 
<td>$1299.00</td>\n </tr>\n <tr>\n <th>15</th>\n <td>One Netbook T1 2 in 1 Laptop Intel Core i7-126...</td>\n <td>$1199.00</td>\n </tr>\n <tr>\n <th>16</th>\n <td>Logitech G502 HERO Wired Gaming Mouse 16000DPI...</td>\n <td>$73.99</td>\n </tr>\n <tr>\n <th>17</th>\n <td>P1 Webcam 1080P with Microphone Auto Focus Lig...</td>\n <td>$14.99</td>\n </tr>\n <tr>\n <th>18</th>\n <td>ONE Netbook OneXPlayer Mini Pro Game Console 7...</td>\n <td>$1099.00</td>\n </tr>\n <tr>\n <th>19</th>\n <td>One Netbook 4S Platinum 2 in 1 Laptop Intel Co...</td>\n <td>$1199.00</td>\n </tr>\n <tr>\n <th>20</th>\n <td>Second Screen for Laptop PC Phone Xbox-US Plug</td>\n <td>$99.99</td>\n </tr>\n <tr>\n <th>21</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>$1459.00</td>\n </tr>\n <tr>\n <th>22</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>$1459.00</td>\n </tr>\n <tr>\n <th>23</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>$1459.00</td>\n </tr>\n <tr>\n <th>24</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>$1459.00</td>\n </tr>\n <tr>\n <th>25</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>$1459.00</td>\n </tr>\n <tr>\n <th>26</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>$1459.00</td>\n </tr>\n <tr>\n <th>27</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>$1259.00</td>\n </tr>\n <tr>\n <th>28</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>$1259.00</td>\n </tr>\n <tr>\n <th>29</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>$1259.00</td>\n </tr>\n <tr>\n <th>30</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>$1259.00</td>\n </tr>\n <tr>\n <th>31</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>$1259.00</td>\n </tr>\n <tr>\n <th>32</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>$1259.00</td>\n </tr>\n 
<tr>\n <th>33</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>$1259.00</td>\n </tr>\n <tr>\n <th>34</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>$1259.00</td>\n </tr>\n <tr>\n <th>35</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>$1259.00</td>\n </tr>\n <tr>\n <th>36</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>$1259.00</td>\n </tr>\n <tr>\n <th>37</th>\n <td>GPD WIN Max 2 Smallest Handheld Gaming Laptop ...</td>\n <td>$1142.99</td>\n </tr>\n <tr>\n <th>38</th>\n <td>GPD MicroPC Pocket Laptop Mini PC 6 Inch Scree...</td>\n <td>$484.99</td>\n </tr>\n <tr>\n <th>39</th>\n <td>Kingston A400 SSD 960GB SATA 3 2.5 Inch Solid ...</td>\n <td>$139.99</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " Names Prices\n0 Game Monitor Second Screen for Laptop PC Phone... $99.99\n1 ALLDOCUBE GTBook 15 Laptop, 15.6 inch FHD IPS ... $299.99\n2 BMAX X15 Plus 15.6'' 1080P Laptop Intel Jasper... $319.00\n3 Type-C Hub 10 in 1 USB C to 4K HDMI+RJ45+PD 10... $29.99\n4 One-Netbook T1 2 in 1 Laptop Protective Bag $19.99\n5 ALLDOCUBE GTBook 15 Laptop, 15.6 inch FHD IPS ... $349.99\n6 One Netbook 4S Platinum 2 in 1 Laptop Intel Co... $1199.00\n7 GPD WIN Max 2 Smallest Handheld Gaming Laptop ... $1142.99\n8 GPD Pocket 3 Laptop Mini Tablet PC 8 Inch 1920... $1059.99\n9 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... $1259.00\n10 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... $1259.00\n11 One Netbook T1 2 in 1 Laptop Intel Pentium Gol... $799.00\n12 One Netbook T1 2 in 1 Laptop Intel Core i5-124... $1099.00\n13 One Netbook T1 2 in 1 Laptop Intel Core i5-124... $949.00\n14 One Netbook T1 2 in 1 Laptop Intel Core i7-126... $1299.00\n15 One Netbook T1 2 in 1 Laptop Intel Core i7-126... $1199.00\n16 Logitech G502 HERO Wired Gaming Mouse 16000DPI... $73.99\n17 P1 Webcam 1080P with Microphone Auto Focus Lig... $14.99\n18 ONE Netbook OneXPlayer Mini Pro Game Console 7... $1099.00\n19 One Netbook 4S Platinum 2 in 1 Laptop Intel Co... $1199.00\n20 Second Screen for Laptop PC Phone Xbox-US Plug $99.99\n21 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... $1459.00\n22 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... $1459.00\n23 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... $1459.00\n24 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... $1459.00\n25 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... $1459.00\n26 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... $1459.00\n27 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... $1259.00\n28 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... $1259.00\n29 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... $1259.00\n30 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... $1259.00\n31 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... 
$1259.00\n32 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... $1259.00\n33 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... $1259.00\n34 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... $1259.00\n35 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... $1259.00\n36 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... $1259.00\n37 GPD WIN Max 2 Smallest Handheld Gaming Laptop ... $1142.99\n38 GPD MicroPC Pocket Laptop Mini PC 6 Inch Scree... $484.99\n39 Kingston A400 SSD 960GB SATA 3 2.5 Inch Solid ... $139.99"
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {},
"id": "dacf0b29",
"cell_type": "markdown",
"source": "- Great! Now we can store this table in an Excel sheet and later import it into the \"laptop\" table of our database.\n- But first, the Prices column needs a little cleaning: notice the dollar sign \"$\"?\n- The database's laptop table will probably have a \"prices\" column that only accepts decimal values, so our Prices column must contain only numbers to be compatible with it."
},
{
"metadata": {
"trusted": false
},
"id": "7c79d2e7",
"cell_type": "code",
"source": "# .apply() calls the one-line (lambda) function on every value of the Prices column,\n# replacing \"$\" with an empty string\ndf[\"Prices\"] = df[\"Prices\"].apply(lambda s: s.replace(\"$\", \"\"))",
"execution_count": 52,
"outputs": []
},
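{
"metadata": {},
"id": "to-numeric-sketch-md",
"cell_type": "markdown",
"source": "Removing the `$` still leaves text such as `\"299.99\"`, while a decimal database column needs real numbers. A minimal sketch of the conversion on a separate **hypothetical** column, so the lab's `df` is left untouched; the thousands separator in `\"$1,259.00\"` is an assumption (the scraped prices above have none). It relies on `Series.str.replace(..., regex=False)` and `pd.to_numeric`. On the lab's own table the equivalent step would be `df[\"Prices\"] = pd.to_numeric(df[\"Prices\"])`; run it after the optional cell below, since that cell calls the string method `.replace` and fails on numbers."
},
{
"metadata": {
"trusted": false
},
"id": "to-numeric-sketch-code",
"cell_type": "code",
"source": "# Sketch: turn price text into numbers on a separate, hypothetical column\nimport pandas as pd\n\nraw_prices = pd.Series(['$99.99', '$1,259.00', '$19.99'])\n\n# regex=False treats '$' as a literal character (in a regular expression '$' means end of string)\ncleaned = raw_prices.str.replace('$', '', regex=False).str.replace(',', '', regex=False)\n\n# errors='coerce' turns any value that is still not a number into NaN instead of raising\nnumeric_prices = pd.to_numeric(cleaned, errors='coerce')\nprint(numeric_prices.dtype)  # float64\nprint(numeric_prices.tolist())\n",
"execution_count": null,
"outputs": []
},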
{
"metadata": {
"trusted": false
},
"id": "66ea4aca",
"cell_type": "code",
"source": "# check the new table's Prices column\ndf",
"execution_count": 53,
"outputs": [
{
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Names</th>\n <th>Prices</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>Game Monitor Second Screen for Laptop PC Phone...</td>\n <td>99.99</td>\n </tr>\n <tr>\n <th>1</th>\n <td>ALLDOCUBE GTBook 15 Laptop, 15.6 inch FHD IPS ...</td>\n <td>299.99</td>\n </tr>\n <tr>\n <th>2</th>\n <td>BMAX X15 Plus 15.6'' 1080P Laptop Intel Jasper...</td>\n <td>319.00</td>\n </tr>\n <tr>\n <th>3</th>\n <td>Type-C Hub 10 in 1 USB C to 4K HDMI+RJ45+PD 10...</td>\n <td>29.99</td>\n </tr>\n <tr>\n <th>4</th>\n <td>One-Netbook T1 2 in 1 Laptop Protective Bag</td>\n <td>19.99</td>\n </tr>\n <tr>\n <th>5</th>\n <td>ALLDOCUBE GTBook 15 Laptop, 15.6 inch FHD IPS ...</td>\n <td>349.99</td>\n </tr>\n <tr>\n <th>6</th>\n <td>One Netbook 4S Platinum 2 in 1 Laptop Intel Co...</td>\n <td>1199.00</td>\n </tr>\n <tr>\n <th>7</th>\n <td>GPD WIN Max 2 Smallest Handheld Gaming Laptop ...</td>\n <td>1142.99</td>\n </tr>\n <tr>\n <th>8</th>\n <td>GPD Pocket 3 Laptop Mini Tablet PC 8 Inch 1920...</td>\n <td>1059.99</td>\n </tr>\n <tr>\n <th>9</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>1259.00</td>\n </tr>\n <tr>\n <th>10</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>1259.00</td>\n </tr>\n <tr>\n <th>11</th>\n <td>One Netbook T1 2 in 1 Laptop Intel Pentium Gol...</td>\n <td>799.00</td>\n </tr>\n <tr>\n <th>12</th>\n <td>One Netbook T1 2 in 1 Laptop Intel Core i5-124...</td>\n <td>1099.00</td>\n </tr>\n <tr>\n <th>13</th>\n <td>One Netbook T1 2 in 1 Laptop Intel Core i5-124...</td>\n <td>949.00</td>\n </tr>\n <tr>\n <th>14</th>\n <td>One Netbook T1 2 in 1 Laptop Intel Core i7-126...</td>\n <td>1299.00</td>\n </tr>\n 
<tr>\n <th>15</th>\n <td>One Netbook T1 2 in 1 Laptop Intel Core i7-126...</td>\n <td>1199.00</td>\n </tr>\n <tr>\n <th>16</th>\n <td>Logitech G502 HERO Wired Gaming Mouse 16000DPI...</td>\n <td>73.99</td>\n </tr>\n <tr>\n <th>17</th>\n <td>P1 Webcam 1080P with Microphone Auto Focus Lig...</td>\n <td>14.99</td>\n </tr>\n <tr>\n <th>18</th>\n <td>ONE Netbook OneXPlayer Mini Pro Game Console 7...</td>\n <td>1099.00</td>\n </tr>\n <tr>\n <th>19</th>\n <td>One Netbook 4S Platinum 2 in 1 Laptop Intel Co...</td>\n <td>1199.00</td>\n </tr>\n <tr>\n <th>20</th>\n <td>Second Screen for Laptop PC Phone Xbox-US Plug</td>\n <td>99.99</td>\n </tr>\n <tr>\n <th>21</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>1459.00</td>\n </tr>\n <tr>\n <th>22</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>1459.00</td>\n </tr>\n <tr>\n <th>23</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>1459.00</td>\n </tr>\n <tr>\n <th>24</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>1459.00</td>\n </tr>\n <tr>\n <th>25</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>1459.00</td>\n </tr>\n <tr>\n <th>26</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>1459.00</td>\n </tr>\n <tr>\n <th>27</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>1259.00</td>\n </tr>\n <tr>\n <th>28</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>1259.00</td>\n </tr>\n <tr>\n <th>29</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>1259.00</td>\n </tr>\n <tr>\n <th>30</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>1259.00</td>\n </tr>\n <tr>\n <th>31</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>1259.00</td>\n </tr>\n <tr>\n <th>32</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>1259.00</td>\n </tr>\n <tr>\n <th>33</th>\n <td>ONE Netbook 
OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>1259.00</td>\n </tr>\n <tr>\n <th>34</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>1259.00</td>\n </tr>\n <tr>\n <th>35</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>1259.00</td>\n </tr>\n <tr>\n <th>36</th>\n <td>ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K...</td>\n <td>1259.00</td>\n </tr>\n <tr>\n <th>37</th>\n <td>GPD WIN Max 2 Smallest Handheld Gaming Laptop ...</td>\n <td>1142.99</td>\n </tr>\n <tr>\n <th>38</th>\n <td>GPD MicroPC Pocket Laptop Mini PC 6 Inch Scree...</td>\n <td>484.99</td>\n </tr>\n <tr>\n <th>39</th>\n <td>Kingston A400 SSD 960GB SATA 3 2.5 Inch Solid ...</td>\n <td>139.99</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " Names Prices\n0 Game Monitor Second Screen for Laptop PC Phone... 99.99\n1 ALLDOCUBE GTBook 15 Laptop, 15.6 inch FHD IPS ... 299.99\n2 BMAX X15 Plus 15.6'' 1080P Laptop Intel Jasper... 319.00\n3 Type-C Hub 10 in 1 USB C to 4K HDMI+RJ45+PD 10... 29.99\n4 One-Netbook T1 2 in 1 Laptop Protective Bag 19.99\n5 ALLDOCUBE GTBook 15 Laptop, 15.6 inch FHD IPS ... 349.99\n6 One Netbook 4S Platinum 2 in 1 Laptop Intel Co... 1199.00\n7 GPD WIN Max 2 Smallest Handheld Gaming Laptop ... 1142.99\n8 GPD Pocket 3 Laptop Mini Tablet PC 8 Inch 1920... 1059.99\n9 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... 1259.00\n10 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... 1259.00\n11 One Netbook T1 2 in 1 Laptop Intel Pentium Gol... 799.00\n12 One Netbook T1 2 in 1 Laptop Intel Core i5-124... 1099.00\n13 One Netbook T1 2 in 1 Laptop Intel Core i5-124... 949.00\n14 One Netbook T1 2 in 1 Laptop Intel Core i7-126... 1299.00\n15 One Netbook T1 2 in 1 Laptop Intel Core i7-126... 1199.00\n16 Logitech G502 HERO Wired Gaming Mouse 16000DPI... 73.99\n17 P1 Webcam 1080P with Microphone Auto Focus Lig... 14.99\n18 ONE Netbook OneXPlayer Mini Pro Game Console 7... 1099.00\n19 One Netbook 4S Platinum 2 in 1 Laptop Intel Co... 1199.00\n20 Second Screen for Laptop PC Phone Xbox-US Plug 99.99\n21 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... 1459.00\n22 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... 1459.00\n23 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... 1459.00\n24 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... 1459.00\n25 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... 1459.00\n26 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... 1459.00\n27 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... 1259.00\n28 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... 1259.00\n29 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... 1259.00\n30 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... 1259.00\n31 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... 
1259.00\n32 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... 1259.00\n33 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... 1259.00\n34 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... 1259.00\n35 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... 1259.00\n36 ONE Netbook OneXPlayer 2 Game Console 8.4'' 2K... 1259.00\n37 GPD WIN Max 2 Smallest Handheld Gaming Laptop ... 1142.99\n38 GPD MicroPC Pocket Laptop Mini PC 6 Inch Scree... 484.99\n39 Kingston A400 SSD 960GB SATA 3 2.5 Inch Solid ... 139.99"
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {
"trusted": false
},
"id": "fb9923fa",
"cell_type": "code",
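"source": "# optional\n# if lambdas are confusing, this named function does exactly what the lambda above does\n# (running both is harmless: once the \"$\" is gone there is nothing left to replace)\ndef remove_dollar(s):\n    return s.replace(\"$\", \"\")\n\ndf[\"Prices\"] = df[\"Prices\"].apply(remove_dollar)\n",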
"execution_count": null,
"outputs": []
},
{
"metadata": {},
"id": "ea2a7e43",
"cell_type": "markdown",
"source": "## Part 5: Storing the Cleaned Data in an Excel Sheet"
},
{
"metadata": {
"trusted": false
},
"id": "2da12245",
"cell_type": "code",
"source": "# index=False leaves out the default index column that the DataFrame adds\n# (writing .xlsx files needs an Excel writer package such as openpyxl to be installed)\ndf.to_excel(\"laptops.xlsx\", index=False)",
"execution_count": 54,
"outputs": []
}
],
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python 3",
"language": "python"
},
"language_info": {
"name": "python",
"version": "3.8.8",
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"file_extension": ".py"
},
"gist": {
"id": "",
"data": {
"description": "Desktop/Desktio_2023_Post_S1/DB_Lab1/WebScraping_InLab.ipynb",
"public": true
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}
@ImCosmoboy
Thank You so much, this is amazing. I needed this so much