@sahilsunny
Last active January 28, 2019 09:20
{
"cells": [{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"In this guided DataQuest project, we'll be wowrking with a dataset from eBay Kleinanzeigen (a classifieds section of the German eBay website) of used cars.\n",
"\n",
"The original data was scraped and uploaded to Kaggle. DataQuest made a few modifications to the dataset though, which include:\n",
"\n",
"- Sampling 50,000 data points from the full dataset, to ensure the code runs quickly in hosted environment\n",
"- Making the data dirtier in order to more closely resemble what you would expect from a scraped dataset\n",
" - The original uploaded to Kaggle had been cleaned\n",
" \n",
"Below is the data dictionary from the data:\n",
"\n",
"- dateCrawled: when the ad was first crawled; all field-values are taken from this data\n",
"- name: name of the car\n",
"- seller: whether the seller is private or a dealer\n",
"- offerType: the type of listing\n",
"- price: price on the add to sell the car\n",
"- abtest: whether the listing is included in an A/B test\n",
"- vehicleType: the vehicle type\n",
"- yearOfRegistration: year in which the car was first registered\n",
"- gearbox: transmission type\n",
"- powerPS: power of the car in PS\n",
"- model: car model name\n",
"- kilometer: number of kilometers the car has driven\n",
"- monthOfRegistration: month in which the car was first registered\n",
"- fuelType: type of fuel the car uses\n",
"- brand: brand of the car\n",
"- notRepairedDamage: if the car has damage which is not yet repaired\n",
"- dateCreated: data on which the eBay listing was created\n",
"- nrOfPictures: number of pictures in the ad\n",
"- postalCode: postal code for the location of the vehicle\n",
"- lastSeenOnline: when the crawler saw this ad last online\n",
"\n",
"The primary aim of this project is to clean the data and then analyze the used car listings to see what insights we might be able to gather from it. "
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# import pandas and numpy libraries\n",
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# read csv file into autos using read_csv\n",
"autos = pd.read_csv('autos.csv', encoding='Latin-1')\n",
"\n",
"# create a test df for future use\n",
"autos_test = autos"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>dateCrawled</th>\n",
" <th>name</th>\n",
" <th>seller</th>\n",
" <th>offerType</th>\n",
" <th>price</th>\n",
" <th>abtest</th>\n",
" <th>vehicleType</th>\n",
" <th>yearOfRegistration</th>\n",
" <th>gearbox</th>\n",
" <th>powerPS</th>\n",
" <th>model</th>\n",
" <th>odometer</th>\n",
" <th>monthOfRegistration</th>\n",
" <th>fuelType</th>\n",
" <th>brand</th>\n",
" <th>notRepairedDamage</th>\n",
" <th>dateCreated</th>\n",
" <th>nrOfPictures</th>\n",
" <th>postalCode</th>\n",
" <th>lastSeen</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2016-03-26 17:47:46</td>\n",
" <td>Peugeot_807_160_NAVTECH_ON_BOARD</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$5,000</td>\n",
" <td>control</td>\n",
" <td>bus</td>\n",
" <td>2004</td>\n",
" <td>manuell</td>\n",
" <td>158</td>\n",
" <td>andere</td>\n",
" <td>150,000km</td>\n",
" <td>3</td>\n",
" <td>lpg</td>\n",
" <td>peugeot</td>\n",
" <td>nein</td>\n",
" <td>2016-03-26 00:00:00</td>\n",
" <td>0</td>\n",
" <td>79588</td>\n",
" <td>2016-04-06 06:45:54</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2016-04-04 13:38:56</td>\n",
" <td>BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$8,500</td>\n",
" <td>control</td>\n",
" <td>limousine</td>\n",
" <td>1997</td>\n",
" <td>automatik</td>\n",
" <td>286</td>\n",
" <td>7er</td>\n",
" <td>150,000km</td>\n",
" <td>6</td>\n",
" <td>benzin</td>\n",
" <td>bmw</td>\n",
" <td>nein</td>\n",
" <td>2016-04-04 00:00:00</td>\n",
" <td>0</td>\n",
" <td>71034</td>\n",
" <td>2016-04-06 14:45:08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2016-03-26 18:57:24</td>\n",
" <td>Volkswagen_Golf_1.6_United</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$8,990</td>\n",
" <td>test</td>\n",
" <td>limousine</td>\n",
" <td>2009</td>\n",
" <td>manuell</td>\n",
" <td>102</td>\n",
" <td>golf</td>\n",
" <td>70,000km</td>\n",
" <td>7</td>\n",
" <td>benzin</td>\n",
" <td>volkswagen</td>\n",
" <td>nein</td>\n",
" <td>2016-03-26 00:00:00</td>\n",
" <td>0</td>\n",
" <td>35394</td>\n",
" <td>2016-04-06 20:15:37</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2016-03-12 16:58:10</td>\n",
" <td>Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan...</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$4,350</td>\n",
" <td>control</td>\n",
" <td>kleinwagen</td>\n",
" <td>2007</td>\n",
" <td>automatik</td>\n",
" <td>71</td>\n",
" <td>fortwo</td>\n",
" <td>70,000km</td>\n",
" <td>6</td>\n",
" <td>benzin</td>\n",
" <td>smart</td>\n",
" <td>nein</td>\n",
" <td>2016-03-12 00:00:00</td>\n",
" <td>0</td>\n",
" <td>33729</td>\n",
" <td>2016-03-15 03:16:28</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2016-04-01 14:38:50</td>\n",
" <td>Ford_Focus_1_6_Benzin_T√úV_neu_ist_sehr_gepfleg...</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$1,350</td>\n",
" <td>test</td>\n",
" <td>kombi</td>\n",
" <td>2003</td>\n",
" <td>manuell</td>\n",
" <td>0</td>\n",
" <td>focus</td>\n",
" <td>150,000km</td>\n",
" <td>7</td>\n",
" <td>benzin</td>\n",
" <td>ford</td>\n",
" <td>nein</td>\n",
" <td>2016-04-01 00:00:00</td>\n",
" <td>0</td>\n",
" <td>39218</td>\n",
" <td>2016-04-01 14:38:50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>2016-03-21 13:47:45</td>\n",
" <td>Chrysler_Grand_Voyager_2.8_CRD_Aut.Limited_Sto...</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$7,900</td>\n",
" <td>test</td>\n",
" <td>bus</td>\n",
" <td>2006</td>\n",
" <td>automatik</td>\n",
" <td>150</td>\n",
" <td>voyager</td>\n",
" <td>150,000km</td>\n",
" <td>4</td>\n",
" <td>diesel</td>\n",
" <td>chrysler</td>\n",
" <td>NaN</td>\n",
" <td>2016-03-21 00:00:00</td>\n",
" <td>0</td>\n",
" <td>22962</td>\n",
" <td>2016-04-06 09:45:21</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>2016-03-20 17:55:21</td>\n",
" <td>VW_Golf_III_GT_Special_Electronic_Green_Metall...</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$300</td>\n",
" <td>test</td>\n",
" <td>limousine</td>\n",
" <td>1995</td>\n",
" <td>manuell</td>\n",
" <td>90</td>\n",
" <td>golf</td>\n",
" <td>150,000km</td>\n",
" <td>8</td>\n",
" <td>benzin</td>\n",
" <td>volkswagen</td>\n",
" <td>NaN</td>\n",
" <td>2016-03-20 00:00:00</td>\n",
" <td>0</td>\n",
" <td>31535</td>\n",
" <td>2016-03-23 02:48:59</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>2016-03-16 18:55:19</td>\n",
" <td>Golf_IV_1.9_TDI_90PS</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$1,990</td>\n",
" <td>control</td>\n",
" <td>limousine</td>\n",
" <td>1998</td>\n",
" <td>manuell</td>\n",
" <td>90</td>\n",
" <td>golf</td>\n",
" <td>150,000km</td>\n",
" <td>12</td>\n",
" <td>diesel</td>\n",
" <td>volkswagen</td>\n",
" <td>nein</td>\n",
" <td>2016-03-16 00:00:00</td>\n",
" <td>0</td>\n",
" <td>53474</td>\n",
" <td>2016-04-07 03:17:32</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>2016-03-22 16:51:34</td>\n",
" <td>Seat_Arosa</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$250</td>\n",
" <td>test</td>\n",
" <td>NaN</td>\n",
" <td>2000</td>\n",
" <td>manuell</td>\n",
" <td>0</td>\n",
" <td>arosa</td>\n",
" <td>150,000km</td>\n",
" <td>10</td>\n",
" <td>NaN</td>\n",
" <td>seat</td>\n",
" <td>nein</td>\n",
" <td>2016-03-22 00:00:00</td>\n",
" <td>0</td>\n",
" <td>7426</td>\n",
" <td>2016-03-26 18:18:10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>2016-03-16 13:47:02</td>\n",
" <td>Renault_Megane_Scenic_1.6e_RT_Klimaanlage</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$590</td>\n",
" <td>control</td>\n",
" <td>bus</td>\n",
" <td>1997</td>\n",
" <td>manuell</td>\n",
" <td>90</td>\n",
" <td>megane</td>\n",
" <td>150,000km</td>\n",
" <td>7</td>\n",
" <td>benzin</td>\n",
" <td>renault</td>\n",
" <td>nein</td>\n",
" <td>2016-03-16 00:00:00</td>\n",
" <td>0</td>\n",
" <td>15749</td>\n",
" <td>2016-04-06 10:46:35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>2016-03-15 01:41:36</td>\n",
" <td>VW_Golf_Tuning_in_siber/grau</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$999</td>\n",
" <td>test</td>\n",
" <td>NaN</td>\n",
" <td>2017</td>\n",
" <td>manuell</td>\n",
" <td>90</td>\n",
" <td>NaN</td>\n",
" <td>150,000km</td>\n",
" <td>4</td>\n",
" <td>benzin</td>\n",
" <td>volkswagen</td>\n",
" <td>nein</td>\n",
" <td>2016-03-14 00:00:00</td>\n",
" <td>0</td>\n",
" <td>86157</td>\n",
" <td>2016-04-07 03:16:21</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>2016-03-16 18:45:34</td>\n",
" <td>Mercedes_A140_Motorschaden</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$350</td>\n",
" <td>control</td>\n",
" <td>NaN</td>\n",
" <td>2000</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>150,000km</td>\n",
" <td>0</td>\n",
" <td>benzin</td>\n",
" <td>mercedes_benz</td>\n",
" <td>NaN</td>\n",
" <td>2016-03-16 00:00:00</td>\n",
" <td>0</td>\n",
" <td>17498</td>\n",
" <td>2016-03-16 18:45:34</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>2016-03-31 19:48:22</td>\n",
" <td>Smart_smart_fortwo_coupe_softouch_pure_MHD_Pan...</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$5,299</td>\n",
" <td>control</td>\n",
" <td>kleinwagen</td>\n",
" <td>2010</td>\n",
" <td>automatik</td>\n",
" <td>71</td>\n",
" <td>fortwo</td>\n",
" <td>50,000km</td>\n",
" <td>9</td>\n",
" <td>benzin</td>\n",
" <td>smart</td>\n",
" <td>nein</td>\n",
" <td>2016-03-31 00:00:00</td>\n",
" <td>0</td>\n",
" <td>34590</td>\n",
" <td>2016-04-06 14:17:52</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>2016-03-23 10:48:32</td>\n",
" <td>Audi_A3_1.6_tuning</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$1,350</td>\n",
" <td>control</td>\n",
" <td>limousine</td>\n",
" <td>1999</td>\n",
" <td>manuell</td>\n",
" <td>101</td>\n",
" <td>a3</td>\n",
" <td>150,000km</td>\n",
" <td>11</td>\n",
" <td>benzin</td>\n",
" <td>audi</td>\n",
" <td>nein</td>\n",
" <td>2016-03-23 00:00:00</td>\n",
" <td>0</td>\n",
" <td>12043</td>\n",
" <td>2016-04-01 14:17:13</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>2016-03-23 11:50:46</td>\n",
" <td>Renault_Clio_3__Dynamique_1.2__16_V;_viele_Ver...</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$3,999</td>\n",
" <td>test</td>\n",
" <td>kleinwagen</td>\n",
" <td>2007</td>\n",
" <td>manuell</td>\n",
" <td>75</td>\n",
" <td>clio</td>\n",
" <td>150,000km</td>\n",
" <td>9</td>\n",
" <td>benzin</td>\n",
" <td>renault</td>\n",
" <td>NaN</td>\n",
" <td>2016-03-23 00:00:00</td>\n",
" <td>0</td>\n",
" <td>81737</td>\n",
" <td>2016-04-01 15:46:47</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>2016-04-01 12:06:20</td>\n",
" <td>Corvette_C3_Coupe_T_Top_Crossfire_Injection</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$18,900</td>\n",
" <td>test</td>\n",
" <td>coupe</td>\n",
" <td>1982</td>\n",
" <td>automatik</td>\n",
" <td>203</td>\n",
" <td>NaN</td>\n",
" <td>80,000km</td>\n",
" <td>6</td>\n",
" <td>benzin</td>\n",
" <td>sonstige_autos</td>\n",
" <td>nein</td>\n",
" <td>2016-04-01 00:00:00</td>\n",
" <td>0</td>\n",
" <td>61276</td>\n",
" <td>2016-04-02 21:10:48</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>2016-03-16 14:59:02</td>\n",
" <td>Opel_Vectra_B_Kombi</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$350</td>\n",
" <td>test</td>\n",
" <td>kombi</td>\n",
" <td>1999</td>\n",
" <td>manuell</td>\n",
" <td>101</td>\n",
" <td>vectra</td>\n",
" <td>150,000km</td>\n",
" <td>5</td>\n",
" <td>benzin</td>\n",
" <td>opel</td>\n",
" <td>nein</td>\n",
" <td>2016-03-16 00:00:00</td>\n",
" <td>0</td>\n",
" <td>57299</td>\n",
" <td>2016-03-18 05:29:37</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>2016-03-29 11:46:22</td>\n",
" <td>Volkswagen_Scirocco_2_G60</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$5,500</td>\n",
" <td>test</td>\n",
" <td>coupe</td>\n",
" <td>1990</td>\n",
" <td>manuell</td>\n",
" <td>205</td>\n",
" <td>scirocco</td>\n",
" <td>150,000km</td>\n",
" <td>6</td>\n",
" <td>benzin</td>\n",
" <td>volkswagen</td>\n",
" <td>nein</td>\n",
" <td>2016-03-29 00:00:00</td>\n",
" <td>0</td>\n",
" <td>74821</td>\n",
" <td>2016-04-05 20:46:26</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>2016-03-26 19:57:44</td>\n",
" <td>Verkaufen_mein_bmw_e36_320_i_touring</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$300</td>\n",
" <td>control</td>\n",
" <td>bus</td>\n",
" <td>1995</td>\n",
" <td>manuell</td>\n",
" <td>150</td>\n",
" <td>3er</td>\n",
" <td>150,000km</td>\n",
" <td>0</td>\n",
" <td>benzin</td>\n",
" <td>bmw</td>\n",
" <td>NaN</td>\n",
" <td>2016-03-26 00:00:00</td>\n",
" <td>0</td>\n",
" <td>54329</td>\n",
" <td>2016-04-02 12:16:41</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>2016-03-17 13:36:21</td>\n",
" <td>mazda_tribute_2.0_mit_gas_und_tuev_neu_2018</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$4,150</td>\n",
" <td>control</td>\n",
" <td>suv</td>\n",
" <td>2004</td>\n",
" <td>manuell</td>\n",
" <td>124</td>\n",
" <td>andere</td>\n",
" <td>150,000km</td>\n",
" <td>2</td>\n",
" <td>lpg</td>\n",
" <td>mazda</td>\n",
" <td>nein</td>\n",
" <td>2016-03-17 00:00:00</td>\n",
" <td>0</td>\n",
" <td>40878</td>\n",
" <td>2016-03-17 14:45:58</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>2016-03-05 19:57:31</td>\n",
" <td>Audi_A4_Avant_1.9_TDI_*6_Gang*AHK*Klimatronik*...</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$3,500</td>\n",
" <td>test</td>\n",
" <td>kombi</td>\n",
" <td>2003</td>\n",
" <td>manuell</td>\n",
" <td>131</td>\n",
" <td>a4</td>\n",
" <td>150,000km</td>\n",
" <td>5</td>\n",
" <td>diesel</td>\n",
" <td>audi</td>\n",
" <td>NaN</td>\n",
" <td>2016-03-05 00:00:00</td>\n",
" <td>0</td>\n",
" <td>53913</td>\n",
" <td>2016-03-07 05:46:46</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>2016-03-06 19:07:10</td>\n",
" <td>Porsche_911_Carrera_4S_Cabrio</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$41,500</td>\n",
" <td>test</td>\n",
" <td>cabrio</td>\n",
" <td>2004</td>\n",
" <td>manuell</td>\n",
" <td>320</td>\n",
" <td>911</td>\n",
" <td>150,000km</td>\n",
" <td>4</td>\n",
" <td>benzin</td>\n",
" <td>porsche</td>\n",
" <td>nein</td>\n",
" <td>2016-03-06 00:00:00</td>\n",
" <td>0</td>\n",
" <td>65428</td>\n",
" <td>2016-04-05 23:46:19</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>2016-03-28 20:50:54</td>\n",
" <td>MINI_Cooper_S_Cabrio</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$25,450</td>\n",
" <td>control</td>\n",
" <td>cabrio</td>\n",
" <td>2015</td>\n",
" <td>manuell</td>\n",
" <td>184</td>\n",
" <td>cooper</td>\n",
" <td>10,000km</td>\n",
" <td>1</td>\n",
" <td>benzin</td>\n",
" <td>mini</td>\n",
" <td>nein</td>\n",
" <td>2016-03-28 00:00:00</td>\n",
" <td>0</td>\n",
" <td>44789</td>\n",
" <td>2016-04-01 06:45:30</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>2016-03-10 19:55:34</td>\n",
" <td>Peugeot_Boxer_2_2_HDi_120_Ps_9_Sitzer_inkl_Klima</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$7,999</td>\n",
" <td>control</td>\n",
" <td>bus</td>\n",
" <td>2010</td>\n",
" <td>manuell</td>\n",
" <td>120</td>\n",
" <td>NaN</td>\n",
" <td>150,000km</td>\n",
" <td>2</td>\n",
" <td>diesel</td>\n",
" <td>peugeot</td>\n",
" <td>nein</td>\n",
" <td>2016-03-10 00:00:00</td>\n",
" <td>0</td>\n",
" <td>30900</td>\n",
" <td>2016-03-17 08:45:17</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>2016-04-03 11:57:02</td>\n",
" <td>BMW_535i_xDrive_Sport_Aut.</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$48,500</td>\n",
" <td>control</td>\n",
" <td>limousine</td>\n",
" <td>2014</td>\n",
" <td>automatik</td>\n",
" <td>306</td>\n",
" <td>5er</td>\n",
" <td>30,000km</td>\n",
" <td>12</td>\n",
" <td>benzin</td>\n",
" <td>bmw</td>\n",
" <td>nein</td>\n",
" <td>2016-04-03 00:00:00</td>\n",
" <td>0</td>\n",
" <td>22547</td>\n",
" <td>2016-04-07 13:16:50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>2016-03-21 21:56:18</td>\n",
" <td>Ford_escort_kombi_an_bastler_mit_ghia_ausstattung</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$90</td>\n",
" <td>control</td>\n",
" <td>kombi</td>\n",
" <td>1996</td>\n",
" <td>manuell</td>\n",
" <td>116</td>\n",
" <td>NaN</td>\n",
" <td>150,000km</td>\n",
" <td>4</td>\n",
" <td>benzin</td>\n",
" <td>ford</td>\n",
" <td>ja</td>\n",
" <td>2016-03-21 00:00:00</td>\n",
" <td>0</td>\n",
" <td>27574</td>\n",
" <td>2016-04-01 05:16:49</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>2016-04-03 22:46:28</td>\n",
" <td>Volkswagen_Polo_Fox</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$777</td>\n",
" <td>control</td>\n",
" <td>kleinwagen</td>\n",
" <td>1992</td>\n",
" <td>manuell</td>\n",
" <td>54</td>\n",
" <td>polo</td>\n",
" <td>125,000km</td>\n",
" <td>2</td>\n",
" <td>benzin</td>\n",
" <td>volkswagen</td>\n",
" <td>nein</td>\n",
" <td>2016-04-03 00:00:00</td>\n",
" <td>0</td>\n",
" <td>38110</td>\n",
" <td>2016-04-05 23:46:48</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>2016-03-27 18:45:01</td>\n",
" <td>Hat_einer_Ahnung_mit_Ford_Galaxy_HILFE</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$0</td>\n",
" <td>control</td>\n",
" <td>NaN</td>\n",
" <td>2005</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>150,000km</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>ford</td>\n",
" <td>NaN</td>\n",
" <td>2016-03-27 00:00:00</td>\n",
" <td>0</td>\n",
" <td>66701</td>\n",
" <td>2016-03-27 18:45:01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>2016-03-19 21:56:19</td>\n",
" <td>MINI_Cooper_D</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$5,250</td>\n",
" <td>control</td>\n",
" <td>kleinwagen</td>\n",
" <td>2007</td>\n",
" <td>manuell</td>\n",
" <td>110</td>\n",
" <td>cooper</td>\n",
" <td>150,000km</td>\n",
" <td>7</td>\n",
" <td>diesel</td>\n",
" <td>mini</td>\n",
" <td>ja</td>\n",
" <td>2016-03-19 00:00:00</td>\n",
" <td>0</td>\n",
" <td>15745</td>\n",
" <td>2016-04-07 14:58:48</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>2016-04-02 12:45:44</td>\n",
" <td>Mercedes_Benz_E_320_T_CDI_Avantgarde_DPF7_Sitz...</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$4,999</td>\n",
" <td>test</td>\n",
" <td>kombi</td>\n",
" <td>2004</td>\n",
" <td>automatik</td>\n",
" <td>204</td>\n",
" <td>e_klasse</td>\n",
" <td>150,000km</td>\n",
" <td>10</td>\n",
" <td>diesel</td>\n",
" <td>mercedes_benz</td>\n",
" <td>nein</td>\n",
" <td>2016-04-02 00:00:00</td>\n",
" <td>0</td>\n",
" <td>47638</td>\n",
" <td>2016-04-02 12:45:44</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49970</th>\n",
" <td>2016-03-21 22:47:37</td>\n",
" <td>c4_Grand_Picasso_mit_Automatik_Leder_Navi_Temp...</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$15,800</td>\n",
" <td>control</td>\n",
" <td>bus</td>\n",
" <td>2010</td>\n",
" <td>automatik</td>\n",
" <td>136</td>\n",
" <td>c4</td>\n",
" <td>60,000km</td>\n",
" <td>4</td>\n",
" <td>diesel</td>\n",
" <td>citroen</td>\n",
" <td>nein</td>\n",
" <td>2016-03-21 00:00:00</td>\n",
" <td>0</td>\n",
" <td>14947</td>\n",
" <td>2016-04-07 04:17:34</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49971</th>\n",
" <td>2016-03-29 14:54:12</td>\n",
" <td>W.Lupo_1.0</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$950</td>\n",
" <td>test</td>\n",
" <td>kleinwagen</td>\n",
" <td>2001</td>\n",
" <td>manuell</td>\n",
" <td>50</td>\n",
" <td>lupo</td>\n",
" <td>150,000km</td>\n",
" <td>4</td>\n",
" <td>benzin</td>\n",
" <td>volkswagen</td>\n",
" <td>nein</td>\n",
" <td>2016-03-29 00:00:00</td>\n",
" <td>0</td>\n",
" <td>65197</td>\n",
" <td>2016-03-29 20:41:51</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49972</th>\n",
" <td>2016-03-26 22:25:23</td>\n",
" <td>Mercedes_Benz_Vito_115_CDI_Extralang_Aut.</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$3,300</td>\n",
" <td>control</td>\n",
" <td>bus</td>\n",
" <td>2004</td>\n",
" <td>automatik</td>\n",
" <td>150</td>\n",
" <td>vito</td>\n",
" <td>150,000km</td>\n",
" <td>10</td>\n",
" <td>diesel</td>\n",
" <td>mercedes_benz</td>\n",
" <td>ja</td>\n",
" <td>2016-03-26 00:00:00</td>\n",
" <td>0</td>\n",
" <td>65326</td>\n",
" <td>2016-03-28 11:28:18</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49973</th>\n",
" <td>2016-03-27 05:32:39</td>\n",
" <td>Mercedes_Benz_SLK_200_Kompressor</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$6,000</td>\n",
" <td>control</td>\n",
" <td>cabrio</td>\n",
" <td>2004</td>\n",
" <td>manuell</td>\n",
" <td>163</td>\n",
" <td>slk</td>\n",
" <td>150,000km</td>\n",
" <td>11</td>\n",
" <td>benzin</td>\n",
" <td>mercedes_benz</td>\n",
" <td>nein</td>\n",
" <td>2016-03-27 00:00:00</td>\n",
" <td>0</td>\n",
" <td>53567</td>\n",
" <td>2016-03-27 08:25:24</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49974</th>\n",
" <td>2016-03-20 10:52:31</td>\n",
" <td>Golf_1_Cabrio_Tuev_Neu_viele_Extras_alles_eing...</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$0</td>\n",
" <td>control</td>\n",
" <td>cabrio</td>\n",
" <td>1983</td>\n",
" <td>manuell</td>\n",
" <td>70</td>\n",
" <td>golf</td>\n",
" <td>150,000km</td>\n",
" <td>2</td>\n",
" <td>benzin</td>\n",
" <td>volkswagen</td>\n",
" <td>nein</td>\n",
" <td>2016-03-20 00:00:00</td>\n",
" <td>0</td>\n",
" <td>8209</td>\n",
" <td>2016-03-27 19:48:16</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49975</th>\n",
" <td>2016-03-27 20:51:39</td>\n",
" <td>Honda_Jazz_1.3_DSi_i_VTEC_IMA_CVT_Comfort</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$9,700</td>\n",
" <td>control</td>\n",
" <td>kleinwagen</td>\n",
" <td>2012</td>\n",
" <td>automatik</td>\n",
" <td>88</td>\n",
" <td>jazz</td>\n",
" <td>100,000km</td>\n",
" <td>11</td>\n",
" <td>hybrid</td>\n",
" <td>honda</td>\n",
" <td>nein</td>\n",
" <td>2016-03-27 00:00:00</td>\n",
" <td>0</td>\n",
" <td>84385</td>\n",
" <td>2016-04-05 19:45:34</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49976</th>\n",
" <td>2016-03-19 18:56:05</td>\n",
" <td>Audi_80_Avant_2.6_E__Vollausstattung!!_Einziga...</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$5,900</td>\n",
" <td>test</td>\n",
" <td>kombi</td>\n",
" <td>1992</td>\n",
" <td>automatik</td>\n",
" <td>150</td>\n",
" <td>80</td>\n",
" <td>150,000km</td>\n",
" <td>12</td>\n",
" <td>benzin</td>\n",
" <td>audi</td>\n",
" <td>nein</td>\n",
" <td>2016-03-19 00:00:00</td>\n",
" <td>0</td>\n",
" <td>36100</td>\n",
" <td>2016-04-07 06:16:44</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49977</th>\n",
" <td>2016-03-31 18:37:18</td>\n",
" <td>Mercedes_Benz_C200_Cdi_W203</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$5,500</td>\n",
" <td>control</td>\n",
" <td>limousine</td>\n",
" <td>2003</td>\n",
" <td>manuell</td>\n",
" <td>116</td>\n",
" <td>c_klasse</td>\n",
" <td>150,000km</td>\n",
" <td>2</td>\n",
" <td>diesel</td>\n",
" <td>mercedes_benz</td>\n",
" <td>nein</td>\n",
" <td>2016-03-31 00:00:00</td>\n",
" <td>0</td>\n",
" <td>33739</td>\n",
" <td>2016-04-06 12:16:11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49978</th>\n",
" <td>2016-04-04 10:37:14</td>\n",
" <td>Mercedes_Benz_E_200_Classic</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$900</td>\n",
" <td>control</td>\n",
" <td>limousine</td>\n",
" <td>1996</td>\n",
" <td>automatik</td>\n",
" <td>136</td>\n",
" <td>e_klasse</td>\n",
" <td>150,000km</td>\n",
" <td>9</td>\n",
" <td>benzin</td>\n",
" <td>mercedes_benz</td>\n",
" <td>ja</td>\n",
" <td>2016-04-04 00:00:00</td>\n",
" <td>0</td>\n",
" <td>24405</td>\n",
" <td>2016-04-06 12:44:20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49979</th>\n",
" <td>2016-03-20 18:38:40</td>\n",
" <td>Volkswagen_Polo_1.6_TDI_Style</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$11,000</td>\n",
" <td>test</td>\n",
" <td>kleinwagen</td>\n",
" <td>2011</td>\n",
" <td>manuell</td>\n",
" <td>90</td>\n",
" <td>polo</td>\n",
" <td>70,000km</td>\n",
" <td>11</td>\n",
" <td>diesel</td>\n",
" <td>volkswagen</td>\n",
" <td>nein</td>\n",
" <td>2016-03-20 00:00:00</td>\n",
" <td>0</td>\n",
" <td>48455</td>\n",
" <td>2016-04-07 01:45:12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49980</th>\n",
" <td>2016-03-12 10:55:54</td>\n",
" <td>Ford_Escort_Turnier_16V</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$400</td>\n",
" <td>control</td>\n",
" <td>kombi</td>\n",
" <td>1995</td>\n",
" <td>manuell</td>\n",
" <td>105</td>\n",
" <td>escort</td>\n",
" <td>125,000km</td>\n",
" <td>3</td>\n",
" <td>benzin</td>\n",
" <td>ford</td>\n",
" <td>NaN</td>\n",
" <td>2016-03-12 00:00:00</td>\n",
" <td>0</td>\n",
" <td>56218</td>\n",
" <td>2016-04-06 17:16:49</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49981</th>\n",
" <td>2016-03-15 09:38:21</td>\n",
" <td>Opel_Astra_Kombi_mit_Anhaengerkupplung</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$2,000</td>\n",
" <td>control</td>\n",
" <td>kombi</td>\n",
" <td>1998</td>\n",
" <td>manuell</td>\n",
" <td>115</td>\n",
" <td>astra</td>\n",
" <td>150,000km</td>\n",
" <td>12</td>\n",
" <td>benzin</td>\n",
" <td>opel</td>\n",
" <td>nein</td>\n",
" <td>2016-03-15 00:00:00</td>\n",
" <td>0</td>\n",
" <td>86859</td>\n",
" <td>2016-04-05 17:21:46</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49982</th>\n",
" <td>2016-03-29 18:51:08</td>\n",
" <td>Skoda_Fabia_4_Tuerer_Bj:2004__85.000Tkm</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$1,950</td>\n",
" <td>control</td>\n",
" <td>kleinwagen</td>\n",
" <td>2004</td>\n",
" <td>manuell</td>\n",
" <td>0</td>\n",
" <td>fabia</td>\n",
" <td>90,000km</td>\n",
" <td>7</td>\n",
" <td>benzin</td>\n",
" <td>skoda</td>\n",
" <td>NaN</td>\n",
" <td>2016-03-29 00:00:00</td>\n",
" <td>0</td>\n",
" <td>45884</td>\n",
" <td>2016-03-29 18:51:08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49983</th>\n",
" <td>2016-03-06 12:43:04</td>\n",
" <td>Ford_focus_99</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$600</td>\n",
" <td>test</td>\n",
" <td>kleinwagen</td>\n",
" <td>1999</td>\n",
" <td>manuell</td>\n",
" <td>101</td>\n",
" <td>focus</td>\n",
" <td>150,000km</td>\n",
" <td>4</td>\n",
" <td>benzin</td>\n",
" <td>ford</td>\n",
" <td>NaN</td>\n",
" <td>2016-03-06 00:00:00</td>\n",
" <td>0</td>\n",
" <td>52477</td>\n",
" <td>2016-03-09 06:16:08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49984</th>\n",
" <td>2016-03-31 22:48:48</td>\n",
" <td>Student_sucht_ein__Anfaengerauto___ab_2000_BJ_...</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$0</td>\n",
" <td>test</td>\n",
" <td>NaN</td>\n",
" <td>2000</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>150,000km</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>sonstige_autos</td>\n",
" <td>NaN</td>\n",
" <td>2016-03-31 00:00:00</td>\n",
" <td>0</td>\n",
" <td>12103</td>\n",
" <td>2016-04-02 19:44:53</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49985</th>\n",
" <td>2016-04-02 16:38:23</td>\n",
" <td>Verkaufe_meinen_vw_vento!</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$1,000</td>\n",
" <td>control</td>\n",
" <td>NaN</td>\n",
" <td>1995</td>\n",
" <td>automatik</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>150,000km</td>\n",
" <td>0</td>\n",
" <td>benzin</td>\n",
" <td>volkswagen</td>\n",
" <td>NaN</td>\n",
" <td>2016-04-02 00:00:00</td>\n",
" <td>0</td>\n",
" <td>30900</td>\n",
" <td>2016-04-06 15:17:52</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49986</th>\n",
" <td>2016-04-04 20:46:02</td>\n",
" <td>Chrysler_300C_3.0_CRD_DPF_Automatik_Voll_Ausst...</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$15,900</td>\n",
" <td>control</td>\n",
" <td>limousine</td>\n",
" <td>2010</td>\n",
" <td>automatik</td>\n",
" <td>218</td>\n",
" <td>300c</td>\n",
" <td>125,000km</td>\n",
" <td>11</td>\n",
" <td>diesel</td>\n",
" <td>chrysler</td>\n",
" <td>nein</td>\n",
" <td>2016-04-04 00:00:00</td>\n",
" <td>0</td>\n",
" <td>73527</td>\n",
" <td>2016-04-06 23:16:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49987</th>\n",
" <td>2016-03-22 20:47:27</td>\n",
" <td>Audi_A3_Limousine_2.0_TDI_DPF_Ambition__NAVI__...</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$21,990</td>\n",
" <td>control</td>\n",
" <td>limousine</td>\n",
" <td>2013</td>\n",
" <td>manuell</td>\n",
" <td>150</td>\n",
" <td>a3</td>\n",
" <td>50,000km</td>\n",
" <td>11</td>\n",
" <td>diesel</td>\n",
" <td>audi</td>\n",
" <td>nein</td>\n",
" <td>2016-03-22 00:00:00</td>\n",
" <td>0</td>\n",
" <td>94362</td>\n",
" <td>2016-03-26 22:46:06</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49988</th>\n",
" <td>2016-03-28 19:49:51</td>\n",
" <td>BMW_330_Ci</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$9,550</td>\n",
" <td>control</td>\n",
" <td>coupe</td>\n",
" <td>2001</td>\n",
" <td>manuell</td>\n",
" <td>231</td>\n",
" <td>3er</td>\n",
" <td>150,000km</td>\n",
" <td>10</td>\n",
" <td>benzin</td>\n",
" <td>bmw</td>\n",
" <td>nein</td>\n",
" <td>2016-03-28 00:00:00</td>\n",
" <td>0</td>\n",
" <td>83646</td>\n",
" <td>2016-04-07 02:17:40</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49989</th>\n",
" <td>2016-03-11 19:50:37</td>\n",
" <td>VW_Polo_zum_Ausschlachten_oder_Wiederaufbau</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$150</td>\n",
" <td>test</td>\n",
" <td>kleinwagen</td>\n",
" <td>1997</td>\n",
" <td>manuell</td>\n",
" <td>0</td>\n",
" <td>polo</td>\n",
" <td>150,000km</td>\n",
" <td>5</td>\n",
" <td>benzin</td>\n",
" <td>volkswagen</td>\n",
" <td>ja</td>\n",
" <td>2016-03-11 00:00:00</td>\n",
" <td>0</td>\n",
" <td>21244</td>\n",
" <td>2016-03-12 10:17:55</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49990</th>\n",
" <td>2016-03-21 19:54:19</td>\n",
" <td>Mercedes_Benz_A_200__BlueEFFICIENCY__Urban</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$17,500</td>\n",
" <td>test</td>\n",
" <td>limousine</td>\n",
" <td>2012</td>\n",
" <td>manuell</td>\n",
" <td>156</td>\n",
" <td>a_klasse</td>\n",
" <td>30,000km</td>\n",
" <td>12</td>\n",
" <td>benzin</td>\n",
" <td>mercedes_benz</td>\n",
" <td>nein</td>\n",
" <td>2016-03-21 00:00:00</td>\n",
" <td>0</td>\n",
" <td>58239</td>\n",
" <td>2016-04-06 22:46:57</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49991</th>\n",
" <td>2016-03-06 15:25:19</td>\n",
" <td>Kleinwagen</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$500</td>\n",
" <td>control</td>\n",
" <td>NaN</td>\n",
" <td>2016</td>\n",
" <td>manuell</td>\n",
" <td>0</td>\n",
" <td>twingo</td>\n",
" <td>150,000km</td>\n",
" <td>0</td>\n",
" <td>benzin</td>\n",
" <td>renault</td>\n",
" <td>NaN</td>\n",
" <td>2016-03-06 00:00:00</td>\n",
" <td>0</td>\n",
" <td>61350</td>\n",
" <td>2016-03-06 18:24:19</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49992</th>\n",
" <td>2016-03-10 19:37:38</td>\n",
" <td>Fiat_Grande_Punto_1.4_T_Jet_16V_Sport</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$4,800</td>\n",
" <td>control</td>\n",
" <td>kleinwagen</td>\n",
" <td>2009</td>\n",
" <td>manuell</td>\n",
" <td>120</td>\n",
" <td>andere</td>\n",
" <td>125,000km</td>\n",
" <td>9</td>\n",
" <td>lpg</td>\n",
" <td>fiat</td>\n",
" <td>nein</td>\n",
" <td>2016-03-10 00:00:00</td>\n",
" <td>0</td>\n",
" <td>68642</td>\n",
" <td>2016-03-13 01:44:51</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49993</th>\n",
" <td>2016-03-15 18:47:35</td>\n",
" <td>Audi_A3__1_8l__Silber;_schoenes_Fahrzeug</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$1,650</td>\n",
" <td>control</td>\n",
" <td>kleinwagen</td>\n",
" <td>1997</td>\n",
" <td>manuell</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>150,000km</td>\n",
" <td>7</td>\n",
" <td>benzin</td>\n",
" <td>audi</td>\n",
" <td>NaN</td>\n",
" <td>2016-03-15 00:00:00</td>\n",
" <td>0</td>\n",
" <td>65203</td>\n",
" <td>2016-04-06 19:46:53</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49994</th>\n",
" <td>2016-03-22 17:36:42</td>\n",
" <td>Audi_A6__S6__Avant_4.2_quattro_eventuell_Tausc...</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$5,000</td>\n",
" <td>control</td>\n",
" <td>kombi</td>\n",
" <td>2001</td>\n",
" <td>automatik</td>\n",
" <td>299</td>\n",
" <td>a6</td>\n",
" <td>150,000km</td>\n",
" <td>1</td>\n",
" <td>benzin</td>\n",
" <td>audi</td>\n",
" <td>nein</td>\n",
" <td>2016-03-22 00:00:00</td>\n",
" <td>0</td>\n",
" <td>46537</td>\n",
" <td>2016-04-06 08:16:39</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49995</th>\n",
" <td>2016-03-27 14:38:19</td>\n",
" <td>Audi_Q5_3.0_TDI_qu._S_tr.__Navi__Panorama__Xenon</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$24,900</td>\n",
" <td>control</td>\n",
" <td>limousine</td>\n",
" <td>2011</td>\n",
" <td>automatik</td>\n",
" <td>239</td>\n",
" <td>q5</td>\n",
" <td>100,000km</td>\n",
" <td>1</td>\n",
" <td>diesel</td>\n",
" <td>audi</td>\n",
" <td>nein</td>\n",
" <td>2016-03-27 00:00:00</td>\n",
" <td>0</td>\n",
" <td>82131</td>\n",
" <td>2016-04-01 13:47:40</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49996</th>\n",
" <td>2016-03-28 10:50:25</td>\n",
" <td>Opel_Astra_F_Cabrio_Bertone_Edition___T√úV_neu+...</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$1,980</td>\n",
" <td>control</td>\n",
" <td>cabrio</td>\n",
" <td>1996</td>\n",
" <td>manuell</td>\n",
" <td>75</td>\n",
" <td>astra</td>\n",
" <td>150,000km</td>\n",
" <td>5</td>\n",
" <td>benzin</td>\n",
" <td>opel</td>\n",
" <td>nein</td>\n",
" <td>2016-03-28 00:00:00</td>\n",
" <td>0</td>\n",
" <td>44807</td>\n",
" <td>2016-04-02 14:18:02</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49997</th>\n",
" <td>2016-04-02 14:44:48</td>\n",
" <td>Fiat_500_C_1.2_Dualogic_Lounge</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$13,200</td>\n",
" <td>test</td>\n",
" <td>cabrio</td>\n",
" <td>2014</td>\n",
" <td>automatik</td>\n",
" <td>69</td>\n",
" <td>500</td>\n",
" <td>5,000km</td>\n",
" <td>11</td>\n",
" <td>benzin</td>\n",
" <td>fiat</td>\n",
" <td>nein</td>\n",
" <td>2016-04-02 00:00:00</td>\n",
" <td>0</td>\n",
" <td>73430</td>\n",
" <td>2016-04-04 11:47:27</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49998</th>\n",
" <td>2016-03-08 19:25:42</td>\n",
" <td>Audi_A3_2.0_TDI_Sportback_Ambition</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$22,900</td>\n",
" <td>control</td>\n",
" <td>kombi</td>\n",
" <td>2013</td>\n",
" <td>manuell</td>\n",
" <td>150</td>\n",
" <td>a3</td>\n",
" <td>40,000km</td>\n",
" <td>11</td>\n",
" <td>diesel</td>\n",
" <td>audi</td>\n",
" <td>nein</td>\n",
" <td>2016-03-08 00:00:00</td>\n",
" <td>0</td>\n",
" <td>35683</td>\n",
" <td>2016-04-05 16:45:07</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49999</th>\n",
" <td>2016-03-14 00:42:12</td>\n",
" <td>Opel_Vectra_1.6_16V</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$1,250</td>\n",
" <td>control</td>\n",
" <td>limousine</td>\n",
" <td>1996</td>\n",
" <td>manuell</td>\n",
" <td>101</td>\n",
" <td>vectra</td>\n",
" <td>150,000km</td>\n",
" <td>1</td>\n",
" <td>benzin</td>\n",
" <td>opel</td>\n",
" <td>nein</td>\n",
" <td>2016-03-13 00:00:00</td>\n",
" <td>0</td>\n",
" <td>45897</td>\n",
" <td>2016-04-06 21:18:48</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>50000 rows √ó 20 columns</p>\n",
"</div>"
],
"text/plain": [
" dateCrawled name \\\n",
"0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD \n",
"1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik \n",
"2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United \n",
"3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... \n",
"4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_T√úV_neu_ist_sehr_gepfleg... \n",
"5 2016-03-21 13:47:45 Chrysler_Grand_Voyager_2.8_CRD_Aut.Limited_Sto... \n",
"6 2016-03-20 17:55:21 VW_Golf_III_GT_Special_Electronic_Green_Metall... \n",
"7 2016-03-16 18:55:19 Golf_IV_1.9_TDI_90PS \n",
"8 2016-03-22 16:51:34 Seat_Arosa \n",
"9 2016-03-16 13:47:02 Renault_Megane_Scenic_1.6e_RT_Klimaanlage \n",
"10 2016-03-15 01:41:36 VW_Golf_Tuning_in_siber/grau \n",
"11 2016-03-16 18:45:34 Mercedes_A140_Motorschaden \n",
"12 2016-03-31 19:48:22 Smart_smart_fortwo_coupe_softouch_pure_MHD_Pan... \n",
"13 2016-03-23 10:48:32 Audi_A3_1.6_tuning \n",
"14 2016-03-23 11:50:46 Renault_Clio_3__Dynamique_1.2__16_V;_viele_Ver... \n",
"15 2016-04-01 12:06:20 Corvette_C3_Coupe_T_Top_Crossfire_Injection \n",
"16 2016-03-16 14:59:02 Opel_Vectra_B_Kombi \n",
"17 2016-03-29 11:46:22 Volkswagen_Scirocco_2_G60 \n",
"18 2016-03-26 19:57:44 Verkaufen_mein_bmw_e36_320_i_touring \n",
"19 2016-03-17 13:36:21 mazda_tribute_2.0_mit_gas_und_tuev_neu_2018 \n",
"20 2016-03-05 19:57:31 Audi_A4_Avant_1.9_TDI_*6_Gang*AHK*Klimatronik*... \n",
"21 2016-03-06 19:07:10 Porsche_911_Carrera_4S_Cabrio \n",
"22 2016-03-28 20:50:54 MINI_Cooper_S_Cabrio \n",
"23 2016-03-10 19:55:34 Peugeot_Boxer_2_2_HDi_120_Ps_9_Sitzer_inkl_Klima \n",
"24 2016-04-03 11:57:02 BMW_535i_xDrive_Sport_Aut. \n",
"25 2016-03-21 21:56:18 Ford_escort_kombi_an_bastler_mit_ghia_ausstattung \n",
"26 2016-04-03 22:46:28 Volkswagen_Polo_Fox \n",
"27 2016-03-27 18:45:01 Hat_einer_Ahnung_mit_Ford_Galaxy_HILFE \n",
"28 2016-03-19 21:56:19 MINI_Cooper_D \n",
"29 2016-04-02 12:45:44 Mercedes_Benz_E_320_T_CDI_Avantgarde_DPF7_Sitz... \n",
"... ... ... \n",
"49970 2016-03-21 22:47:37 c4_Grand_Picasso_mit_Automatik_Leder_Navi_Temp... \n",
"49971 2016-03-29 14:54:12 W.Lupo_1.0 \n",
"49972 2016-03-26 22:25:23 Mercedes_Benz_Vito_115_CDI_Extralang_Aut. \n",
"49973 2016-03-27 05:32:39 Mercedes_Benz_SLK_200_Kompressor \n",
"49974 2016-03-20 10:52:31 Golf_1_Cabrio_Tuev_Neu_viele_Extras_alles_eing... \n",
"49975 2016-03-27 20:51:39 Honda_Jazz_1.3_DSi_i_VTEC_IMA_CVT_Comfort \n",
"49976 2016-03-19 18:56:05 Audi_80_Avant_2.6_E__Vollausstattung!!_Einziga... \n",
"49977 2016-03-31 18:37:18 Mercedes_Benz_C200_Cdi_W203 \n",
"49978 2016-04-04 10:37:14 Mercedes_Benz_E_200_Classic \n",
"49979 2016-03-20 18:38:40 Volkswagen_Polo_1.6_TDI_Style \n",
"49980 2016-03-12 10:55:54 Ford_Escort_Turnier_16V \n",
"49981 2016-03-15 09:38:21 Opel_Astra_Kombi_mit_Anhaengerkupplung \n",
"49982 2016-03-29 18:51:08 Skoda_Fabia_4_Tuerer_Bj:2004__85.000Tkm \n",
"49983 2016-03-06 12:43:04 Ford_focus_99 \n",
"49984 2016-03-31 22:48:48 Student_sucht_ein__Anfaengerauto___ab_2000_BJ_... \n",
"49985 2016-04-02 16:38:23 Verkaufe_meinen_vw_vento! \n",
"49986 2016-04-04 20:46:02 Chrysler_300C_3.0_CRD_DPF_Automatik_Voll_Ausst... \n",
"49987 2016-03-22 20:47:27 Audi_A3_Limousine_2.0_TDI_DPF_Ambition__NAVI__... \n",
"49988 2016-03-28 19:49:51 BMW_330_Ci \n",
"49989 2016-03-11 19:50:37 VW_Polo_zum_Ausschlachten_oder_Wiederaufbau \n",
"49990 2016-03-21 19:54:19 Mercedes_Benz_A_200__BlueEFFICIENCY__Urban \n",
"49991 2016-03-06 15:25:19 Kleinwagen \n",
"49992 2016-03-10 19:37:38 Fiat_Grande_Punto_1.4_T_Jet_16V_Sport \n",
"49993 2016-03-15 18:47:35 Audi_A3__1_8l__Silber;_schoenes_Fahrzeug \n",
"49994 2016-03-22 17:36:42 Audi_A6__S6__Avant_4.2_quattro_eventuell_Tausc... \n",
"49995 2016-03-27 14:38:19 Audi_Q5_3.0_TDI_qu._S_tr.__Navi__Panorama__Xenon \n",
"49996 2016-03-28 10:50:25 Opel_Astra_F_Cabrio_Bertone_Edition___T√úV_neu+... \n",
"49997 2016-04-02 14:44:48 Fiat_500_C_1.2_Dualogic_Lounge \n",
"49998 2016-03-08 19:25:42 Audi_A3_2.0_TDI_Sportback_Ambition \n",
"49999 2016-03-14 00:42:12 Opel_Vectra_1.6_16V \n",
"\n",
" seller offerType price abtest vehicleType yearOfRegistration \\\n",
"0 privat Angebot $5,000 control bus 2004 \n",
"1 privat Angebot $8,500 control limousine 1997 \n",
"2 privat Angebot $8,990 test limousine 2009 \n",
"3 privat Angebot $4,350 control kleinwagen 2007 \n",
"4 privat Angebot $1,350 test kombi 2003 \n",
"5 privat Angebot $7,900 test bus 2006 \n",
"6 privat Angebot $300 test limousine 1995 \n",
"7 privat Angebot $1,990 control limousine 1998 \n",
"8 privat Angebot $250 test NaN 2000 \n",
"9 privat Angebot $590 control bus 1997 \n",
"10 privat Angebot $999 test NaN 2017 \n",
"11 privat Angebot $350 control NaN 2000 \n",
"12 privat Angebot $5,299 control kleinwagen 2010 \n",
"13 privat Angebot $1,350 control limousine 1999 \n",
"14 privat Angebot $3,999 test kleinwagen 2007 \n",
"15 privat Angebot $18,900 test coupe 1982 \n",
"16 privat Angebot $350 test kombi 1999 \n",
"17 privat Angebot $5,500 test coupe 1990 \n",
"18 privat Angebot $300 control bus 1995 \n",
"19 privat Angebot $4,150 control suv 2004 \n",
"20 privat Angebot $3,500 test kombi 2003 \n",
"21 privat Angebot $41,500 test cabrio 2004 \n",
"22 privat Angebot $25,450 control cabrio 2015 \n",
"23 privat Angebot $7,999 control bus 2010 \n",
"24 privat Angebot $48,500 control limousine 2014 \n",
"25 privat Angebot $90 control kombi 1996 \n",
"26 privat Angebot $777 control kleinwagen 1992 \n",
"27 privat Angebot $0 control NaN 2005 \n",
"28 privat Angebot $5,250 control kleinwagen 2007 \n",
"29 privat Angebot $4,999 test kombi 2004 \n",
"... ... ... ... ... ... ... \n",
"49970 privat Angebot $15,800 control bus 2010 \n",
"49971 privat Angebot $950 test kleinwagen 2001 \n",
"49972 privat Angebot $3,300 control bus 2004 \n",
"49973 privat Angebot $6,000 control cabrio 2004 \n",
"49974 privat Angebot $0 control cabrio 1983 \n",
"49975 privat Angebot $9,700 control kleinwagen 2012 \n",
"49976 privat Angebot $5,900 test kombi 1992 \n",
"49977 privat Angebot $5,500 control limousine 2003 \n",
"49978 privat Angebot $900 control limousine 1996 \n",
"49979 privat Angebot $11,000 test kleinwagen 2011 \n",
"49980 privat Angebot $400 control kombi 1995 \n",
"49981 privat Angebot $2,000 control kombi 1998 \n",
"49982 privat Angebot $1,950 control kleinwagen 2004 \n",
"49983 privat Angebot $600 test kleinwagen 1999 \n",
"49984 privat Angebot $0 test NaN 2000 \n",
"49985 privat Angebot $1,000 control NaN 1995 \n",
"49986 privat Angebot $15,900 control limousine 2010 \n",
"49987 privat Angebot $21,990 control limousine 2013 \n",
"49988 privat Angebot $9,550 control coupe 2001 \n",
"49989 privat Angebot $150 test kleinwagen 1997 \n",
"49990 privat Angebot $17,500 test limousine 2012 \n",
"49991 privat Angebot $500 control NaN 2016 \n",
"49992 privat Angebot $4,800 control kleinwagen 2009 \n",
"49993 privat Angebot $1,650 control kleinwagen 1997 \n",
"49994 privat Angebot $5,000 control kombi 2001 \n",
"49995 privat Angebot $24,900 control limousine 2011 \n",
"49996 privat Angebot $1,980 control cabrio 1996 \n",
"49997 privat Angebot $13,200 test cabrio 2014 \n",
"49998 privat Angebot $22,900 control kombi 2013 \n",
"49999 privat Angebot $1,250 control limousine 1996 \n",
"\n",
" gearbox powerPS model odometer monthOfRegistration fuelType \\\n",
"0 manuell 158 andere 150,000km 3 lpg \n",
"1 automatik 286 7er 150,000km 6 benzin \n",
"2 manuell 102 golf 70,000km 7 benzin \n",
"3 automatik 71 fortwo 70,000km 6 benzin \n",
"4 manuell 0 focus 150,000km 7 benzin \n",
"5 automatik 150 voyager 150,000km 4 diesel \n",
"6 manuell 90 golf 150,000km 8 benzin \n",
"7 manuell 90 golf 150,000km 12 diesel \n",
"8 manuell 0 arosa 150,000km 10 NaN \n",
"9 manuell 90 megane 150,000km 7 benzin \n",
"10 manuell 90 NaN 150,000km 4 benzin \n",
"11 NaN 0 NaN 150,000km 0 benzin \n",
"12 automatik 71 fortwo 50,000km 9 benzin \n",
"13 manuell 101 a3 150,000km 11 benzin \n",
"14 manuell 75 clio 150,000km 9 benzin \n",
"15 automatik 203 NaN 80,000km 6 benzin \n",
"16 manuell 101 vectra 150,000km 5 benzin \n",
"17 manuell 205 scirocco 150,000km 6 benzin \n",
"18 manuell 150 3er 150,000km 0 benzin \n",
"19 manuell 124 andere 150,000km 2 lpg \n",
"20 manuell 131 a4 150,000km 5 diesel \n",
"21 manuell 320 911 150,000km 4 benzin \n",
"22 manuell 184 cooper 10,000km 1 benzin \n",
"23 manuell 120 NaN 150,000km 2 diesel \n",
"24 automatik 306 5er 30,000km 12 benzin \n",
"25 manuell 116 NaN 150,000km 4 benzin \n",
"26 manuell 54 polo 125,000km 2 benzin \n",
"27 NaN 0 NaN 150,000km 0 NaN \n",
"28 manuell 110 cooper 150,000km 7 diesel \n",
"29 automatik 204 e_klasse 150,000km 10 diesel \n",
"... ... ... ... ... ... ... \n",
"49970 automatik 136 c4 60,000km 4 diesel \n",
"49971 manuell 50 lupo 150,000km 4 benzin \n",
"49972 automatik 150 vito 150,000km 10 diesel \n",
"49973 manuell 163 slk 150,000km 11 benzin \n",
"49974 manuell 70 golf 150,000km 2 benzin \n",
"49975 automatik 88 jazz 100,000km 11 hybrid \n",
"49976 automatik 150 80 150,000km 12 benzin \n",
"49977 manuell 116 c_klasse 150,000km 2 diesel \n",
"49978 automatik 136 e_klasse 150,000km 9 benzin \n",
"49979 manuell 90 polo 70,000km 11 diesel \n",
"49980 manuell 105 escort 125,000km 3 benzin \n",
"49981 manuell 115 astra 150,000km 12 benzin \n",
"49982 manuell 0 fabia 90,000km 7 benzin \n",
"49983 manuell 101 focus 150,000km 4 benzin \n",
"49984 NaN 0 NaN 150,000km 0 NaN \n",
"49985 automatik 0 NaN 150,000km 0 benzin \n",
"49986 automatik 218 300c 125,000km 11 diesel \n",
"49987 manuell 150 a3 50,000km 11 diesel \n",
"49988 manuell 231 3er 150,000km 10 benzin \n",
"49989 manuell 0 polo 150,000km 5 benzin \n",
"49990 manuell 156 a_klasse 30,000km 12 benzin \n",
"49991 manuell 0 twingo 150,000km 0 benzin \n",
"49992 manuell 120 andere 125,000km 9 lpg \n",
"49993 manuell 0 NaN 150,000km 7 benzin \n",
"49994 automatik 299 a6 150,000km 1 benzin \n",
"49995 automatik 239 q5 100,000km 1 diesel \n",
"49996 manuell 75 astra 150,000km 5 benzin \n",
"49997 automatik 69 500 5,000km 11 benzin \n",
"49998 manuell 150 a3 40,000km 11 diesel \n",
"49999 manuell 101 vectra 150,000km 1 benzin \n",
"\n",
" brand notRepairedDamage dateCreated nrOfPictures \\\n",
"0 peugeot nein 2016-03-26 00:00:00 0 \n",
"1 bmw nein 2016-04-04 00:00:00 0 \n",
"2 volkswagen nein 2016-03-26 00:00:00 0 \n",
"3 smart nein 2016-03-12 00:00:00 0 \n",
"4 ford nein 2016-04-01 00:00:00 0 \n",
"5 chrysler NaN 2016-03-21 00:00:00 0 \n",
"6 volkswagen NaN 2016-03-20 00:00:00 0 \n",
"7 volkswagen nein 2016-03-16 00:00:00 0 \n",
"8 seat nein 2016-03-22 00:00:00 0 \n",
"9 renault nein 2016-03-16 00:00:00 0 \n",
"10 volkswagen nein 2016-03-14 00:00:00 0 \n",
"11 mercedes_benz NaN 2016-03-16 00:00:00 0 \n",
"12 smart nein 2016-03-31 00:00:00 0 \n",
"13 audi nein 2016-03-23 00:00:00 0 \n",
"14 renault NaN 2016-03-23 00:00:00 0 \n",
"15 sonstige_autos nein 2016-04-01 00:00:00 0 \n",
"16 opel nein 2016-03-16 00:00:00 0 \n",
"17 volkswagen nein 2016-03-29 00:00:00 0 \n",
"18 bmw NaN 2016-03-26 00:00:00 0 \n",
"19 mazda nein 2016-03-17 00:00:00 0 \n",
"20 audi NaN 2016-03-05 00:00:00 0 \n",
"21 porsche nein 2016-03-06 00:00:00 0 \n",
"22 mini nein 2016-03-28 00:00:00 0 \n",
"23 peugeot nein 2016-03-10 00:00:00 0 \n",
"24 bmw nein 2016-04-03 00:00:00 0 \n",
"25 ford ja 2016-03-21 00:00:00 0 \n",
"26 volkswagen nein 2016-04-03 00:00:00 0 \n",
"27 ford NaN 2016-03-27 00:00:00 0 \n",
"28 mini ja 2016-03-19 00:00:00 0 \n",
"29 mercedes_benz nein 2016-04-02 00:00:00 0 \n",
"... ... ... ... ... \n",
"49970 citroen nein 2016-03-21 00:00:00 0 \n",
"49971 volkswagen nein 2016-03-29 00:00:00 0 \n",
"49972 mercedes_benz ja 2016-03-26 00:00:00 0 \n",
"49973 mercedes_benz nein 2016-03-27 00:00:00 0 \n",
"49974 volkswagen nein 2016-03-20 00:00:00 0 \n",
"49975 honda nein 2016-03-27 00:00:00 0 \n",
"49976 audi nein 2016-03-19 00:00:00 0 \n",
"49977 mercedes_benz nein 2016-03-31 00:00:00 0 \n",
"49978 mercedes_benz ja 2016-04-04 00:00:00 0 \n",
"49979 volkswagen nein 2016-03-20 00:00:00 0 \n",
"49980 ford NaN 2016-03-12 00:00:00 0 \n",
"49981 opel nein 2016-03-15 00:00:00 0 \n",
"49982 skoda NaN 2016-03-29 00:00:00 0 \n",
"49983 ford NaN 2016-03-06 00:00:00 0 \n",
"49984 sonstige_autos NaN 2016-03-31 00:00:00 0 \n",
"49985 volkswagen NaN 2016-04-02 00:00:00 0 \n",
"49986 chrysler nein 2016-04-04 00:00:00 0 \n",
"49987 audi nein 2016-03-22 00:00:00 0 \n",
"49988 bmw nein 2016-03-28 00:00:00 0 \n",
"49989 volkswagen ja 2016-03-11 00:00:00 0 \n",
"49990 mercedes_benz nein 2016-03-21 00:00:00 0 \n",
"49991 renault NaN 2016-03-06 00:00:00 0 \n",
"49992 fiat nein 2016-03-10 00:00:00 0 \n",
"49993 audi NaN 2016-03-15 00:00:00 0 \n",
"49994 audi nein 2016-03-22 00:00:00 0 \n",
"49995 audi nein 2016-03-27 00:00:00 0 \n",
"49996 opel nein 2016-03-28 00:00:00 0 \n",
"49997 fiat nein 2016-04-02 00:00:00 0 \n",
"49998 audi nein 2016-03-08 00:00:00 0 \n",
"49999 opel nein 2016-03-13 00:00:00 0 \n",
"\n",
" postalCode lastSeen \n",
"0 79588 2016-04-06 06:45:54 \n",
"1 71034 2016-04-06 14:45:08 \n",
"2 35394 2016-04-06 20:15:37 \n",
"3 33729 2016-03-15 03:16:28 \n",
"4 39218 2016-04-01 14:38:50 \n",
"5 22962 2016-04-06 09:45:21 \n",
"6 31535 2016-03-23 02:48:59 \n",
"7 53474 2016-04-07 03:17:32 \n",
"8 7426 2016-03-26 18:18:10 \n",
"9 15749 2016-04-06 10:46:35 \n",
"10 86157 2016-04-07 03:16:21 \n",
"11 17498 2016-03-16 18:45:34 \n",
"12 34590 2016-04-06 14:17:52 \n",
"13 12043 2016-04-01 14:17:13 \n",
"14 81737 2016-04-01 15:46:47 \n",
"15 61276 2016-04-02 21:10:48 \n",
"16 57299 2016-03-18 05:29:37 \n",
"17 74821 2016-04-05 20:46:26 \n",
"18 54329 2016-04-02 12:16:41 \n",
"19 40878 2016-03-17 14:45:58 \n",
"20 53913 2016-03-07 05:46:46 \n",
"21 65428 2016-04-05 23:46:19 \n",
"22 44789 2016-04-01 06:45:30 \n",
"23 30900 2016-03-17 08:45:17 \n",
"24 22547 2016-04-07 13:16:50 \n",
"25 27574 2016-04-01 05:16:49 \n",
"26 38110 2016-04-05 23:46:48 \n",
"27 66701 2016-03-27 18:45:01 \n",
"28 15745 2016-04-07 14:58:48 \n",
"29 47638 2016-04-02 12:45:44 \n",
"... ... ... \n",
"49970 14947 2016-04-07 04:17:34 \n",
"49971 65197 2016-03-29 20:41:51 \n",
"49972 65326 2016-03-28 11:28:18 \n",
"49973 53567 2016-03-27 08:25:24 \n",
"49974 8209 2016-03-27 19:48:16 \n",
"49975 84385 2016-04-05 19:45:34 \n",
"49976 36100 2016-04-07 06:16:44 \n",
"49977 33739 2016-04-06 12:16:11 \n",
"49978 24405 2016-04-06 12:44:20 \n",
"49979 48455 2016-04-07 01:45:12 \n",
"49980 56218 2016-04-06 17:16:49 \n",
"49981 86859 2016-04-05 17:21:46 \n",
"49982 45884 2016-03-29 18:51:08 \n",
"49983 52477 2016-03-09 06:16:08 \n",
"49984 12103 2016-04-02 19:44:53 \n",
"49985 30900 2016-04-06 15:17:52 \n",
"49986 73527 2016-04-06 23:16:00 \n",
"49987 94362 2016-03-26 22:46:06 \n",
"49988 83646 2016-04-07 02:17:40 \n",
"49989 21244 2016-03-12 10:17:55 \n",
"49990 58239 2016-04-06 22:46:57 \n",
"49991 61350 2016-03-06 18:24:19 \n",
"49992 68642 2016-03-13 01:44:51 \n",
"49993 65203 2016-04-06 19:46:53 \n",
"49994 46537 2016-04-06 08:16:39 \n",
"49995 82131 2016-04-01 13:47:40 \n",
"49996 44807 2016-04-02 14:18:02 \n",
"49997 73430 2016-04-04 11:47:27 \n",
"49998 35683 2016-04-05 16:45:07 \n",
"49999 45897 2016-04-06 21:18:48 \n",
"\n",
"[50000 rows x 20 columns]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"autos"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 50000 entries, 0 to 49999\n",
"Data columns (total 20 columns):\n",
"dateCrawled 50000 non-null object\n",
"name 50000 non-null object\n",
"seller 50000 non-null object\n",
"offerType 50000 non-null object\n",
"price 50000 non-null object\n",
"abtest 50000 non-null object\n",
"vehicleType 44905 non-null object\n",
"yearOfRegistration 50000 non-null int64\n",
"gearbox 47320 non-null object\n",
"powerPS 50000 non-null int64\n",
"model 47242 non-null object\n",
"odometer 50000 non-null object\n",
"monthOfRegistration 50000 non-null int64\n",
"fuelType 45518 non-null object\n",
"brand 50000 non-null object\n",
"notRepairedDamage 40171 non-null object\n",
"dateCreated 50000 non-null object\n",
"nrOfPictures 50000 non-null int64\n",
"postalCode 50000 non-null int64\n",
"lastSeen 50000 non-null object\n",
"dtypes: int64(5), object(15)\n",
"memory usage: 7.6+ MB\n"
]
}],
"source": [
"# gather information from autos\n",
"autos.info()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First quick observation from the info() method is the missing values in the vehicleType, gearbox, model, fuelType and notRepairedDamage columns. We'll have to investigate a little further to see what exactly is the cause of these missing values. \n",
"\n",
"Next observation is it looks like all the columns of either the object (i.e. string) or integer type. However, there are a few columns that are one type but should be another type (ex. the price column, which is an object type but should be numeric). \n",
"\n",
"In summary observations:\n",
"- The dataset contains 20 columns, most of which are strings\n",
"- Some columns have null values, but none have more than ~20% null values\n",
"- column names use camelcase instead of Python's preferred snakecase, which means we can't just replace spaces with underscores"
]
},
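{
"cell_type": "markdown",
"metadata": {},
"source": [
"The share of missing values per column can be computed directly rather than read off the info() output. The cell below is a minimal sketch and not part of the original project: `null_share` is a hypothetical helper name, and the small `demo` frame holds made-up values so the cell runs without the CSV. With the real data, `null_share(autos)` would be the call to make."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"def null_share(df):\n",
"    # percentage of missing values per column, largest first\n",
"    return (df.isnull().mean() * 100).sort_values(ascending=False)\n",
"\n",
"# made-up stand-in rows (assumption); pass autos instead for the real check\n",
"demo = pd.DataFrame({\n",
"    'vehicleType': ['bus', None, 'kombi', 'coupe'],\n",
"    'gearbox': ['manuell', 'automatik', None, None],\n",
"    'price': ['$5,000', '$8,500', '$300', '$0'],\n",
"})\n",
"shares = null_share(demo)\n",
"print(shares)"
]
},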
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>dateCrawled</th>\n",
" <th>name</th>\n",
" <th>seller</th>\n",
" <th>offerType</th>\n",
" <th>price</th>\n",
" <th>abtest</th>\n",
" <th>vehicleType</th>\n",
" <th>yearOfRegistration</th>\n",
" <th>gearbox</th>\n",
" <th>powerPS</th>\n",
" <th>model</th>\n",
" <th>odometer</th>\n",
" <th>monthOfRegistration</th>\n",
" <th>fuelType</th>\n",
" <th>brand</th>\n",
" <th>notRepairedDamage</th>\n",
" <th>dateCreated</th>\n",
" <th>nrOfPictures</th>\n",
" <th>postalCode</th>\n",
" <th>lastSeen</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2016-03-26 17:47:46</td>\n",
" <td>Peugeot_807_160_NAVTECH_ON_BOARD</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$5,000</td>\n",
" <td>control</td>\n",
" <td>bus</td>\n",
" <td>2004</td>\n",
" <td>manuell</td>\n",
" <td>158</td>\n",
" <td>andere</td>\n",
" <td>150,000km</td>\n",
" <td>3</td>\n",
" <td>lpg</td>\n",
" <td>peugeot</td>\n",
" <td>nein</td>\n",
" <td>2016-03-26 00:00:00</td>\n",
" <td>0</td>\n",
" <td>79588</td>\n",
" <td>2016-04-06 06:45:54</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2016-04-04 13:38:56</td>\n",
" <td>BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$8,500</td>\n",
" <td>control</td>\n",
" <td>limousine</td>\n",
" <td>1997</td>\n",
" <td>automatik</td>\n",
" <td>286</td>\n",
" <td>7er</td>\n",
" <td>150,000km</td>\n",
" <td>6</td>\n",
" <td>benzin</td>\n",
" <td>bmw</td>\n",
" <td>nein</td>\n",
" <td>2016-04-04 00:00:00</td>\n",
" <td>0</td>\n",
" <td>71034</td>\n",
" <td>2016-04-06 14:45:08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2016-03-26 18:57:24</td>\n",
" <td>Volkswagen_Golf_1.6_United</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$8,990</td>\n",
" <td>test</td>\n",
" <td>limousine</td>\n",
" <td>2009</td>\n",
" <td>manuell</td>\n",
" <td>102</td>\n",
" <td>golf</td>\n",
" <td>70,000km</td>\n",
" <td>7</td>\n",
" <td>benzin</td>\n",
" <td>volkswagen</td>\n",
" <td>nein</td>\n",
" <td>2016-03-26 00:00:00</td>\n",
" <td>0</td>\n",
" <td>35394</td>\n",
" <td>2016-04-06 20:15:37</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2016-03-12 16:58:10</td>\n",
" <td>Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan...</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$4,350</td>\n",
" <td>control</td>\n",
" <td>kleinwagen</td>\n",
" <td>2007</td>\n",
" <td>automatik</td>\n",
" <td>71</td>\n",
" <td>fortwo</td>\n",
" <td>70,000km</td>\n",
" <td>6</td>\n",
" <td>benzin</td>\n",
" <td>smart</td>\n",
" <td>nein</td>\n",
" <td>2016-03-12 00:00:00</td>\n",
" <td>0</td>\n",
" <td>33729</td>\n",
" <td>2016-03-15 03:16:28</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2016-04-01 14:38:50</td>\n",
" <td>Ford_Focus_1_6_Benzin_T√úV_neu_ist_sehr_gepfleg...</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$1,350</td>\n",
" <td>test</td>\n",
" <td>kombi</td>\n",
" <td>2003</td>\n",
" <td>manuell</td>\n",
" <td>0</td>\n",
" <td>focus</td>\n",
" <td>150,000km</td>\n",
" <td>7</td>\n",
" <td>benzin</td>\n",
" <td>ford</td>\n",
" <td>nein</td>\n",
" <td>2016-04-01 00:00:00</td>\n",
" <td>0</td>\n",
" <td>39218</td>\n",
" <td>2016-04-01 14:38:50</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" dateCrawled name \\\n",
"0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD \n",
"1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik \n",
"2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United \n",
"3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... \n",
"4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_T√úV_neu_ist_sehr_gepfleg... \n",
"\n",
" seller offerType price abtest vehicleType yearOfRegistration \\\n",
"0 privat Angebot $5,000 control bus 2004 \n",
"1 privat Angebot $8,500 control limousine 1997 \n",
"2 privat Angebot $8,990 test limousine 2009 \n",
"3 privat Angebot $4,350 control kleinwagen 2007 \n",
"4 privat Angebot $1,350 test kombi 2003 \n",
"\n",
" gearbox powerPS model odometer monthOfRegistration fuelType \\\n",
"0 manuell 158 andere 150,000km 3 lpg \n",
"1 automatik 286 7er 150,000km 6 benzin \n",
"2 manuell 102 golf 70,000km 7 benzin \n",
"3 automatik 71 fortwo 70,000km 6 benzin \n",
"4 manuell 0 focus 150,000km 7 benzin \n",
"\n",
" brand notRepairedDamage dateCreated nrOfPictures \\\n",
"0 peugeot nein 2016-03-26 00:00:00 0 \n",
"1 bmw nein 2016-04-04 00:00:00 0 \n",
"2 volkswagen nein 2016-03-26 00:00:00 0 \n",
"3 smart nein 2016-03-12 00:00:00 0 \n",
"4 ford nein 2016-04-01 00:00:00 0 \n",
"\n",
" postalCode lastSeen \n",
"0 79588 2016-04-06 06:45:54 \n",
"1 71034 2016-04-06 14:45:08 \n",
"2 35394 2016-04-06 20:15:37 \n",
"3 33729 2016-03-15 03:16:28 \n",
"4 39218 2016-04-01 14:38:50 "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"autos.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Upon further inspection using the head() method, we can also see that some of the text is in German and this will have to be addressed in the clean-up. \n",
"\n",
"To begin, we'll convert the column names from camelcase to snakecase and reword some of the column names based on the data dictionary to be more descriptive. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cleaning Column Names"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/plain": [
"Index(['dateCrawled', 'name', 'seller', 'offerType', 'price', 'abtest',\n",
" 'vehicleType', 'yearOfRegistration', 'gearbox', 'powerPS', 'model',\n",
" 'odometer', 'monthOfRegistration', 'fuelType', 'brand',\n",
" 'notRepairedDamage', 'dateCreated', 'nrOfPictures', 'postalCode',\n",
" 'lastSeen'],\n",
" dtype='object')"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"# use columns attribute to print an array of the existing column names\n",
"autos.columns"
]
},
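{
"cell_type": "markdown",
"metadata": {},
"source": [
"The camelCase to snake_case step can also be done mechanically. The cell below is a minimal sketch, not the approach this notebook takes: `camel_to_snake` is a hypothetical helper built on the standard-library `re` module, and it only splits at a lowercase-or-digit to uppercase boundary. It does not reword names (it yields `year_of_registration`, not `registration_year`) and it leaves `abtest` unchanged, which is why the hand-written list in the following cell is still needed for the more descriptive names."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import re\n",
"\n",
"def camel_to_snake(name):\n",
"    # put '_' at each lowercase/digit -> uppercase boundary, then lowercase everything\n",
"    return re.sub(r'(?<=[a-z0-9])(?=[A-Z])', '_', name).lower()\n",
"\n",
"# a few of the original column names, taken from autos.columns above\n",
"original = ['dateCrawled', 'yearOfRegistration', 'powerPS', 'abtest', 'nrOfPictures']\n",
"converted = [camel_to_snake(c) for c in original]\n",
"print(converted)"
]
},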
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>date_crawled</th>\n",
" <th>name</th>\n",
" <th>seller</th>\n",
" <th>offer_type</th>\n",
" <th>price</th>\n",
" <th>ab_test</th>\n",
" <th>vehicle_type</th>\n",
" <th>registration_year</th>\n",
" <th>gearbox</th>\n",
" <th>power_ps</th>\n",
" <th>model</th>\n",
" <th>odometer</th>\n",
" <th>registration_month</th>\n",
" <th>fuel_type</th>\n",
" <th>brand</th>\n",
" <th>unrepaired_damage</th>\n",
" <th>ad_created</th>\n",
" <th>num_pictures</th>\n",
" <th>postal_code</th>\n",
" <th>last_seen</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2016-03-26 17:47:46</td>\n",
" <td>Peugeot_807_160_NAVTECH_ON_BOARD</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$5,000</td>\n",
" <td>control</td>\n",
" <td>bus</td>\n",
" <td>2004</td>\n",
" <td>manuell</td>\n",
" <td>158</td>\n",
" <td>andere</td>\n",
" <td>150,000km</td>\n",
" <td>3</td>\n",
" <td>lpg</td>\n",
" <td>peugeot</td>\n",
" <td>nein</td>\n",
" <td>2016-03-26 00:00:00</td>\n",
" <td>0</td>\n",
" <td>79588</td>\n",
" <td>2016-04-06 06:45:54</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2016-04-04 13:38:56</td>\n",
" <td>BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$8,500</td>\n",
" <td>control</td>\n",
" <td>limousine</td>\n",
" <td>1997</td>\n",
" <td>automatik</td>\n",
" <td>286</td>\n",
" <td>7er</td>\n",
" <td>150,000km</td>\n",
" <td>6</td>\n",
" <td>benzin</td>\n",
" <td>bmw</td>\n",
" <td>nein</td>\n",
" <td>2016-04-04 00:00:00</td>\n",
" <td>0</td>\n",
" <td>71034</td>\n",
" <td>2016-04-06 14:45:08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2016-03-26 18:57:24</td>\n",
" <td>Volkswagen_Golf_1.6_United</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$8,990</td>\n",
" <td>test</td>\n",
" <td>limousine</td>\n",
" <td>2009</td>\n",
" <td>manuell</td>\n",
" <td>102</td>\n",
" <td>golf</td>\n",
" <td>70,000km</td>\n",
" <td>7</td>\n",
" <td>benzin</td>\n",
" <td>volkswagen</td>\n",
" <td>nein</td>\n",
" <td>2016-03-26 00:00:00</td>\n",
" <td>0</td>\n",
" <td>35394</td>\n",
" <td>2016-04-06 20:15:37</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2016-03-12 16:58:10</td>\n",
" <td>Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan...</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$4,350</td>\n",
" <td>control</td>\n",
" <td>kleinwagen</td>\n",
" <td>2007</td>\n",
" <td>automatik</td>\n",
" <td>71</td>\n",
" <td>fortwo</td>\n",
" <td>70,000km</td>\n",
" <td>6</td>\n",
" <td>benzin</td>\n",
" <td>smart</td>\n",
" <td>nein</td>\n",
" <td>2016-03-12 00:00:00</td>\n",
" <td>0</td>\n",
" <td>33729</td>\n",
" <td>2016-03-15 03:16:28</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2016-04-01 14:38:50</td>\n",
" <td>Ford_Focus_1_6_Benzin_T√úV_neu_ist_sehr_gepfleg...</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$1,350</td>\n",
" <td>test</td>\n",
" <td>kombi</td>\n",
" <td>2003</td>\n",
" <td>manuell</td>\n",
" <td>0</td>\n",
" <td>focus</td>\n",
" <td>150,000km</td>\n",
" <td>7</td>\n",
" <td>benzin</td>\n",
" <td>ford</td>\n",
" <td>nein</td>\n",
" <td>2016-04-01 00:00:00</td>\n",
" <td>0</td>\n",
" <td>39218</td>\n",
" <td>2016-04-01 14:38:50</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" date_crawled name \\\n",
"0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD \n",
"1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik \n",
"2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United \n",
"3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... \n",
"4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_T√úV_neu_ist_sehr_gepfleg... \n",
"\n",
" seller offer_type price ab_test vehicle_type registration_year \\\n",
"0 privat Angebot $5,000 control bus 2004 \n",
"1 privat Angebot $8,500 control limousine 1997 \n",
"2 privat Angebot $8,990 test limousine 2009 \n",
"3 privat Angebot $4,350 control kleinwagen 2007 \n",
"4 privat Angebot $1,350 test kombi 2003 \n",
"\n",
" gearbox power_ps model odometer registration_month fuel_type \\\n",
"0 manuell 158 andere 150,000km 3 lpg \n",
"1 automatik 286 7er 150,000km 6 benzin \n",
"2 manuell 102 golf 70,000km 7 benzin \n",
"3 automatik 71 fortwo 70,000km 6 benzin \n",
"4 manuell 0 focus 150,000km 7 benzin \n",
"\n",
" brand unrepaired_damage ad_created num_pictures \\\n",
"0 peugeot nein 2016-03-26 00:00:00 0 \n",
"1 bmw nein 2016-04-04 00:00:00 0 \n",
"2 volkswagen nein 2016-03-26 00:00:00 0 \n",
"3 smart nein 2016-03-12 00:00:00 0 \n",
"4 ford nein 2016-04-01 00:00:00 0 \n",
"\n",
" postal_code last_seen \n",
"0 79588 2016-04-06 06:45:54 \n",
"1 71034 2016-04-06 14:45:08 \n",
"2 35394 2016-04-06 20:15:37 \n",
"3 33729 2016-03-15 03:16:28 \n",
"4 39218 2016-04-01 14:38:50 "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"# list of new names to update columns\n",
"cols = ['date_crawled', 'name', 'seller', 'offer_type', 'price', 'ab_test',\n",
" 'vehicle_type', 'registration_year', 'gearbox', 'power_ps', 'model',\n",
" 'odometer', 'registration_month', 'fuel_type', 'brand',\n",
" 'unrepaired_damage', 'ad_created', 'num_pictures', 'postal_code',\n",
" 'last_seen']\n",
"\n",
"autos.columns = cols\n",
"autos_test.columns = cols\n",
"autos.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Above, you can see that we have changed a few of the column names. The following is a list of each original column name and its replacement (not all column names were changed, just the ones listed below):\n",
"\n",
"- dateCrawled = date_crawled\n",
"- offerType = offer_type\n",
"- abtest = ab_test\n",
"- vehicleType = vehicle_type\n",
"- yearOfRegistration = registration_year\n",
"- powerPS = power_ps\n",
"- kilometer = odometer\n",
"- monthOfRegistration = registration_month\n",
"- fuelType = fuel_type\n",
"- notRepairedDamage = unrepaired_damage\n",
"- dateCreated = ad_created\n",
"- nrOfPictures = num_pictures\n",
"- postalCode = postal_code\n",
"- lastSeen = last_seen\n",
"\n",
"With these changes, we've updated the column names from camelCase (words joined without spaces, with each later word capitalized) to snake_case (lowercase words separated by underscores)."
]
},
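{
"cell_type": "markdown",
"metadata": {},
"source": [
"The renaming above was done by hand. As a rough sketch only, a similar conversion can be automated with a small helper; `camel_to_snake` is a hypothetical name, not something used elsewhere in this notebook. Note that it would not reproduce the hand-picked names (for example, it turns `yearOfRegistration` into `year_of_registration` rather than `registration_year`, and it leaves `abtest` unchanged).\n",
"\n",
"```python\n",
"import re\n",
"\n",
"def camel_to_snake(name):\n",
"    # hypothetical helper: put an underscore between a lowercase letter or digit\n",
"    # and the uppercase letter that follows it, then lowercase everything\n",
"    spaced = re.sub('([a-z0-9])([A-Z])', lambda m: m.group(1) + '_' + m.group(2), name)\n",
"    return spaced.lower()\n",
"\n",
"original = ['dateCrawled', 'offerType', 'powerPS', 'postalCode', 'lastSeen']\n",
"converted = [camel_to_snake(col) for col in original]\n",
"print(converted)\n",
"```\n",
"\n",
"A helper like this is handy when many columns follow the same pattern; names that should change wording as well as case (such as `notRepairedDamage` to `unrepaired_damage`) still need a manual mapping."
]
},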
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initial Exploration and Cleaning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we're going to do some data exploration to determine what other cleaning tasks need to be done. We'll start by looking for: text columns where all or almost all values are the same (these can often be dropped as they don't have useful information for analysis); and examples of numeric data stored as text which can be cleaned and converted."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>date_crawled</th>\n",
" <th>name</th>\n",
" <th>seller</th>\n",
" <th>offer_type</th>\n",
" <th>price</th>\n",
" <th>ab_test</th>\n",
" <th>vehicle_type</th>\n",
" <th>registration_year</th>\n",
" <th>gearbox</th>\n",
" <th>power_ps</th>\n",
" <th>model</th>\n",
" <th>odometer</th>\n",
" <th>registration_month</th>\n",
" <th>fuel_type</th>\n",
" <th>brand</th>\n",
" <th>unrepaired_damage</th>\n",
" <th>ad_created</th>\n",
" <th>num_pictures</th>\n",
" <th>postal_code</th>\n",
" <th>last_seen</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>50000</td>\n",
" <td>50000</td>\n",
" <td>50000</td>\n",
" <td>50000</td>\n",
" <td>50000</td>\n",
" <td>50000</td>\n",
" <td>44905</td>\n",
" <td>50000.000000</td>\n",
" <td>47320</td>\n",
" <td>50000.000000</td>\n",
" <td>47242</td>\n",
" <td>50000</td>\n",
" <td>50000.000000</td>\n",
" <td>45518</td>\n",
" <td>50000</td>\n",
" <td>40171</td>\n",
" <td>50000</td>\n",
" <td>50000.0</td>\n",
" <td>50000.000000</td>\n",
" <td>50000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>unique</th>\n",
" <td>48213</td>\n",
" <td>38754</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>2357</td>\n",
" <td>2</td>\n",
" <td>8</td>\n",
" <td>NaN</td>\n",
" <td>2</td>\n",
" <td>NaN</td>\n",
" <td>245</td>\n",
" <td>13</td>\n",
" <td>NaN</td>\n",
" <td>7</td>\n",
" <td>40</td>\n",
" <td>2</td>\n",
" <td>76</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>39481</td>\n",
" </tr>\n",
" <tr>\n",
" <th>top</th>\n",
" <td>2016-03-11 22:38:16</td>\n",
" <td>Ford_Fiesta</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>$0</td>\n",
" <td>test</td>\n",
" <td>limousine</td>\n",
" <td>NaN</td>\n",
" <td>manuell</td>\n",
" <td>NaN</td>\n",
" <td>golf</td>\n",
" <td>150,000km</td>\n",
" <td>NaN</td>\n",
" <td>benzin</td>\n",
" <td>volkswagen</td>\n",
" <td>nein</td>\n",
" <td>2016-04-03 00:00:00</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2016-04-07 06:17:27</td>\n",
" </tr>\n",
" <tr>\n",
" <th>freq</th>\n",
" <td>3</td>\n",
" <td>78</td>\n",
" <td>49999</td>\n",
" <td>49999</td>\n",
" <td>1421</td>\n",
" <td>25756</td>\n",
" <td>12859</td>\n",
" <td>NaN</td>\n",
" <td>36993</td>\n",
" <td>NaN</td>\n",
" <td>4024</td>\n",
" <td>32424</td>\n",
" <td>NaN</td>\n",
" <td>30107</td>\n",
" <td>10687</td>\n",
" <td>35232</td>\n",
" <td>1946</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2005.073280</td>\n",
" <td>NaN</td>\n",
" <td>116.355920</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>5.723360</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>50813.627300</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>105.712813</td>\n",
" <td>NaN</td>\n",
" <td>209.216627</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>3.711984</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>25779.747957</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1000.000000</td>\n",
" <td>NaN</td>\n",
" <td>0.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>1067.000000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1999.000000</td>\n",
" <td>NaN</td>\n",
" <td>70.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>3.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>30451.000000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2003.000000</td>\n",
" <td>NaN</td>\n",
" <td>105.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>6.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>49577.000000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2008.000000</td>\n",
" <td>NaN</td>\n",
" <td>150.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>9.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>71540.000000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>9999.000000</td>\n",
" <td>NaN</td>\n",
" <td>17700.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>12.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>99998.000000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" date_crawled name seller offer_type price ab_test \\\n",
"count 50000 50000 50000 50000 50000 50000 \n",
"unique 48213 38754 2 2 2357 2 \n",
"top 2016-03-11 22:38:16 Ford_Fiesta privat Angebot $0 test \n",
"freq 3 78 49999 49999 1421 25756 \n",
"mean NaN NaN NaN NaN NaN NaN \n",
"std NaN NaN NaN NaN NaN NaN \n",
"min NaN NaN NaN NaN NaN NaN \n",
"25% NaN NaN NaN NaN NaN NaN \n",
"50% NaN NaN NaN NaN NaN NaN \n",
"75% NaN NaN NaN NaN NaN NaN \n",
"max NaN NaN NaN NaN NaN NaN \n",
"\n",
" vehicle_type registration_year gearbox power_ps model \\\n",
"count 44905 50000.000000 47320 50000.000000 47242 \n",
"unique 8 NaN 2 NaN 245 \n",
"top limousine NaN manuell NaN golf \n",
"freq 12859 NaN 36993 NaN 4024 \n",
"mean NaN 2005.073280 NaN 116.355920 NaN \n",
"std NaN 105.712813 NaN 209.216627 NaN \n",
"min NaN 1000.000000 NaN 0.000000 NaN \n",
"25% NaN 1999.000000 NaN 70.000000 NaN \n",
"50% NaN 2003.000000 NaN 105.000000 NaN \n",
"75% NaN 2008.000000 NaN 150.000000 NaN \n",
"max NaN 9999.000000 NaN 17700.000000 NaN \n",
"\n",
" odometer registration_month fuel_type brand unrepaired_damage \\\n",
"count 50000 50000.000000 45518 50000 40171 \n",
"unique 13 NaN 7 40 2 \n",
"top 150,000km NaN benzin volkswagen nein \n",
"freq 32424 NaN 30107 10687 35232 \n",
"mean NaN 5.723360 NaN NaN NaN \n",
"std NaN 3.711984 NaN NaN NaN \n",
"min NaN 0.000000 NaN NaN NaN \n",
"25% NaN 3.000000 NaN NaN NaN \n",
"50% NaN 6.000000 NaN NaN NaN \n",
"75% NaN 9.000000 NaN NaN NaN \n",
"max NaN 12.000000 NaN NaN NaN \n",
"\n",
" ad_created num_pictures postal_code last_seen \n",
"count 50000 50000.0 50000.000000 50000 \n",
"unique 76 NaN NaN 39481 \n",
"top 2016-04-03 00:00:00 NaN NaN 2016-04-07 06:17:27 \n",
"freq 1946 NaN NaN 8 \n",
"mean NaN 0.0 50813.627300 NaN \n",
"std NaN 0.0 25779.747957 NaN \n",
"min NaN 0.0 1067.000000 NaN \n",
"25% NaN 0.0 30451.000000 NaN \n",
"50% NaN 0.0 49577.000000 NaN \n",
"75% NaN 0.0 71540.000000 NaN \n",
"max NaN 0.0 99998.000000 NaN "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"# use describe() to look at descriptive stats for all columns (including categorical and numeric columns)\n",
"autos.describe(include='all')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- the seller column has 49999 'privat' (i.e. private) entries and 1 'gewerblich' (i.e. commercial) entry\n",
"- offer_type has 49999 'Angebot' (i.e. offer) entries and 1 'Gesuch' (i.e. request, a wanted ad) entry\n",
"- the price column needs to be converted from text to numeric data\n",
"- registration_year looks off; it is a numeric type with values that range from a minimum of 1000 to a maximum of 9999\n",
"  - As far as I know, cars weren't around a thousand years ago...and we're not living almost 8,000 years in the future\n",
"- there look to be some potentially interesting entries in the power_ps column (i.e. max = 17,700)\n",
"- the odometer column looks to be another one that needs to be converted from text to numeric data\n",
"- registration_month looks to be a numeric type; I don't know if this is the appropriate type (nothing necessarily wrong with it, but maybe it should be categorical? TBD...); its minimum of 0 is also not a valid month\n",
"- num_pictures is 0 for every row, so it doesn't hold any information of interest\n",
"- postal_code is displayed as a float here (describe() reports every numeric summary as a float), when an int type would possibly be more appropriate"
]
},
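{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first two observations above (columns where almost every value is the same) can also be checked programmatically. The following is a minimal sketch on a made-up toy DataFrame; the `toy` data, the `top_share` name and the 0.75 threshold are assumptions chosen only for illustration, not values taken from the real dataset.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# toy stand-in for a few text columns of the real data (values are made up)\n",
"toy = pd.DataFrame({\n",
"    'seller': ['privat', 'privat', 'privat', 'gewerblich'],\n",
"    'gearbox': ['manuell', 'automatik', 'manuell', 'automatik'],\n",
"})\n",
"\n",
"# share of rows taken by the most frequent value in each column\n",
"# (value_counts sorts by frequency, so the first entry is the top value)\n",
"top_share = {col: toy[col].value_counts(normalize=True).iloc[0] for col in toy.columns}\n",
"\n",
"# flag columns dominated by a single value; the threshold is an arbitrary choice\n",
"nearly_constant = [col for col, share in top_share.items() if share >= 0.75]\n",
"print(nearly_constant)\n",
"```\n",
"\n",
"On the real data a much stricter threshold (for example 0.99) would probably be more sensible, since seller and offer_type each have a single value in all but one of the 50,000 rows."
]
},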
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/plain": [
"dtype('float64')"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"# for the price and odometer columns: remove any non-numeric characters and convert the column to a numeric dtype;\n",
"# odometer is then renamed to odometer_km\n",
"# regex=False makes sure '$' is treated as a literal character and not as a regex anchor\n",
"autos['price'] = (autos['price']\n",
" .str.strip()\n",
" .str.replace('$','', regex=False)\n",
" .str.replace(',','', regex=False)\n",
" .astype(float)\n",
" )\n",
"\n",
"autos['price'].dtype"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/plain": [
"0 5000.0\n",
"1 8500.0\n",
"2 8990.0\n",
"3 4350.0\n",
"4 1350.0\n",
"Name: price, dtype: float64"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"autos['price'].head(5)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/plain": [
"dtype('float64')"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"autos['odometer'] = (autos['odometer']\n",
" .str.strip()\n",
" .str.replace('km','')\n",
" .str.replace(',','')\n",
" .astype(float)\n",
" )\n",
"\n",
"autos['odometer'].dtype"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/plain": [
"0 150000.0\n",
"1 150000.0\n",
"2 70000.0\n",
"3 70000.0\n",
"4 150000.0\n",
"Name: odometer, dtype: float64"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"autos['odometer'].head(5)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"autos.rename({'odometer': 'odometer_km'}, axis=1, inplace=True)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/plain": [
"Index(['date_crawled', 'name', 'seller', 'offer_type', 'price', 'ab_test',\n",
" 'vehicle_type', 'registration_year', 'gearbox', 'power_ps', 'model',\n",
" 'odometer_km', 'registration_month', 'fuel_type', 'brand',\n",
" 'unrepaired_damage', 'ad_created', 'num_pictures', 'postal_code',\n",
" 'last_seen'],\n",
" dtype='object')"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"autos.columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exploring the Odometer and Price Columns"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/plain": [
"(13,)"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"# how many unique values in odometer_km column\n",
"autos['odometer_km'].unique().shape"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/plain": [
"count 50000.000000\n",
"mean 125732.700000\n",
"std 40042.211706\n",
"min 5000.000000\n",
"25% 125000.000000\n",
"50% 150000.000000\n",
"75% 150000.000000\n",
"max 150000.000000\n",
"Name: odometer_km, dtype: float64"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"# view min/max/median/mean/etc. of odometer_km column\n",
"autos['odometer_km'].describe()"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/plain": [
"150000.0 32424\n",
"125000.0 5170\n",
"100000.0 2169\n",
"90000.0 1757\n",
"80000.0 1436\n",
"70000.0 1230\n",
"60000.0 1164\n",
"50000.0 1027\n",
"5000.0 967\n",
"40000.0 819\n",
"30000.0 789\n",
"20000.0 784\n",
"10000.0 264\n",
"Name: odometer_km, dtype: int64"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"# use value_counts to see frequency of values in odometer_km column\n",
"autos['odometer_km'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/plain": [
"(2357,)"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"# how many unique values in price column\n",
"autos['price'].unique().shape"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/plain": [
"count 5.000000e+04\n",
"mean 9.840044e+03\n",
"std 4.811044e+05\n",
"min 0.000000e+00\n",
"25% 1.100000e+03\n",
"50% 2.950000e+03\n",
"75% 7.200000e+03\n",
"max 1.000000e+08\n",
"Name: price, dtype: float64"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"# view min/max/median/mean/etc. of price column\n",
"autos['price'].describe()"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/plain": [
"99999999.0 1\n",
"27322222.0 1\n",
"12345678.0 3\n",
"11111111.0 2\n",
"10000000.0 1\n",
"3890000.0 1\n",
"1300000.0 1\n",
"1234566.0 1\n",
"999999.0 2\n",
"999990.0 1\n",
"350000.0 1\n",
"345000.0 1\n",
"299000.0 1\n",
"295000.0 1\n",
"265000.0 1\n",
"Name: price, dtype: int64"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"# use value_counts to see frequency of values in price column\n",
"autos['price'].value_counts().sort_index(ascending=False).head(15)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>date_crawled</th>\n",
" <th>name</th>\n",
" <th>seller</th>\n",
" <th>offer_type</th>\n",
" <th>price</th>\n",
" <th>ab_test</th>\n",
" <th>vehicle_type</th>\n",
" <th>registration_year</th>\n",
" <th>gearbox</th>\n",
" <th>power_ps</th>\n",
" <th>model</th>\n",
" <th>odometer_km</th>\n",
" <th>registration_month</th>\n",
" <th>fuel_type</th>\n",
" <th>brand</th>\n",
" <th>unrepaired_damage</th>\n",
" <th>ad_created</th>\n",
" <th>num_pictures</th>\n",
" <th>postal_code</th>\n",
" <th>last_seen</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>514</th>\n",
" <td>2016-03-17 09:53:08</td>\n",
" <td>Ford_Focus_Turnier_1.6_16V_Style</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>999999.0</td>\n",
" <td>test</td>\n",
" <td>kombi</td>\n",
" <td>2009</td>\n",
" <td>manuell</td>\n",
" <td>101</td>\n",
" <td>focus</td>\n",
" <td>125000.0</td>\n",
" <td>4</td>\n",
" <td>benzin</td>\n",
" <td>ford</td>\n",
" <td>nein</td>\n",
" <td>2016-03-17 00:00:00</td>\n",
" <td>0</td>\n",
" <td>12205</td>\n",
" <td>2016-04-06 07:17:35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1878</th>\n",
" <td>2016-03-12 16:58:37</td>\n",
" <td>Porsche_911_Turbo</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>129000.0</td>\n",
" <td>control</td>\n",
" <td>coupe</td>\n",
" <td>1995</td>\n",
" <td>manuell</td>\n",
" <td>408</td>\n",
" <td>911</td>\n",
" <td>125000.0</td>\n",
" <td>9</td>\n",
" <td>benzin</td>\n",
" <td>porsche</td>\n",
" <td>nein</td>\n",
" <td>2016-03-12 00:00:00</td>\n",
" <td>0</td>\n",
" <td>70180</td>\n",
" <td>2016-04-05 04:49:19</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2454</th>\n",
" <td>2016-03-21 22:51:29</td>\n",
" <td>Porsche_911_GT3</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>137999.0</td>\n",
" <td>control</td>\n",
" <td>coupe</td>\n",
" <td>2010</td>\n",
" <td>manuell</td>\n",
" <td>435</td>\n",
" <td>911</td>\n",
" <td>20000.0</td>\n",
" <td>7</td>\n",
" <td>benzin</td>\n",
" <td>porsche</td>\n",
" <td>nein</td>\n",
" <td>2016-03-21 00:00:00</td>\n",
" <td>0</td>\n",
" <td>80636</td>\n",
" <td>2016-04-07 05:45:39</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2751</th>\n",
" <td>2016-03-15 10:52:35</td>\n",
" <td>Porsche_911___993_4S</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>120000.0</td>\n",
" <td>control</td>\n",
" <td>coupe</td>\n",
" <td>1998</td>\n",
" <td>manuell</td>\n",
" <td>286</td>\n",
" <td>911</td>\n",
" <td>125000.0</td>\n",
" <td>3</td>\n",
" <td>benzin</td>\n",
" <td>porsche</td>\n",
" <td>nein</td>\n",
" <td>2016-03-15 00:00:00</td>\n",
" <td>0</td>\n",
" <td>25488</td>\n",
" <td>2016-04-05 19:47:31</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2897</th>\n",
" <td>2016-03-12 21:50:57</td>\n",
" <td>Escort_MK_1_Hundeknochen_zum_umbauen_auf_RS_2000</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>11111111.0</td>\n",
" <td>test</td>\n",
" <td>limousine</td>\n",
" <td>1973</td>\n",
" <td>manuell</td>\n",
" <td>48</td>\n",
" <td>escort</td>\n",
" <td>50000.0</td>\n",
" <td>3</td>\n",
" <td>benzin</td>\n",
" <td>ford</td>\n",
" <td>nein</td>\n",
" <td>2016-03-12 00:00:00</td>\n",
" <td>0</td>\n",
" <td>94469</td>\n",
" <td>2016-03-12 22:45:27</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7402</th>\n",
" <td>2016-03-22 19:48:09</td>\n",
" <td>Porsche_911_Carrera_4S_Cabrio_PDK__BOSE__NEU__...</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>115000.0</td>\n",
" <td>test</td>\n",
" <td>cabrio</td>\n",
" <td>2016</td>\n",
" <td>automatik</td>\n",
" <td>400</td>\n",
" <td>911</td>\n",
" <td>5000.0</td>\n",
" <td>3</td>\n",
" <td>benzin</td>\n",
" <td>porsche</td>\n",
" <td>nein</td>\n",
" <td>2016-03-22 00:00:00</td>\n",
" <td>0</td>\n",
" <td>51379</td>\n",
" <td>2016-03-26 21:46:46</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7814</th>\n",
" <td>2016-04-04 11:53:31</td>\n",
" <td>Ferrari_F40</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>1300000.0</td>\n",
" <td>control</td>\n",
" <td>coupe</td>\n",
" <td>1992</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>50000.0</td>\n",
" <td>12</td>\n",
" <td>NaN</td>\n",
" <td>sonstige_autos</td>\n",
" <td>nein</td>\n",
" <td>2016-04-04 00:00:00</td>\n",
" <td>0</td>\n",
" <td>60598</td>\n",
" <td>2016-04-05 11:34:11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8232</th>\n",
" <td>2016-04-01 21:50:47</td>\n",
" <td>Porsche_993_S_Schalter_BRD_neuwertig</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>128000.0</td>\n",
" <td>test</td>\n",
" <td>coupe</td>\n",
" <td>1997</td>\n",
" <td>manuell</td>\n",
" <td>286</td>\n",
" <td>911</td>\n",
" <td>100000.0</td>\n",
" <td>4</td>\n",
" <td>benzin</td>\n",
" <td>porsche</td>\n",
" <td>nein</td>\n",
" <td>2016-04-01 00:00:00</td>\n",
" <td>0</td>\n",
" <td>81543</td>\n",
" <td>2016-04-05 19:46:23</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10500</th>\n",
" <td>2016-03-17 12:56:38</td>\n",
" <td>Porsche_991</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>155000.0</td>\n",
" <td>test</td>\n",
" <td>coupe</td>\n",
" <td>2013</td>\n",
" <td>NaN</td>\n",
" <td>476</td>\n",
" <td>911</td>\n",
" <td>20000.0</td>\n",
" <td>11</td>\n",
" <td>NaN</td>\n",
" <td>porsche</td>\n",
" <td>nein</td>\n",
" <td>2016-03-17 00:00:00</td>\n",
" <td>0</td>\n",
" <td>90768</td>\n",
" <td>2016-03-26 23:16:41</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11137</th>\n",
" <td>2016-03-29 23:52:57</td>\n",
" <td>suche_maserati_3200_gt_Zustand_unwichtig_laufe...</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>10000000.0</td>\n",
" <td>control</td>\n",
" <td>coupe</td>\n",
" <td>1960</td>\n",
" <td>manuell</td>\n",
" <td>368</td>\n",
" <td>NaN</td>\n",
" <td>100000.0</td>\n",
" <td>1</td>\n",
" <td>benzin</td>\n",
" <td>sonstige_autos</td>\n",
" <td>nein</td>\n",
" <td>2016-03-29 00:00:00</td>\n",
" <td>0</td>\n",
" <td>73033</td>\n",
" <td>2016-04-06 21:18:11</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" date_crawled name \\\n",
"514 2016-03-17 09:53:08 Ford_Focus_Turnier_1.6_16V_Style \n",
"1878 2016-03-12 16:58:37 Porsche_911_Turbo \n",
"2454 2016-03-21 22:51:29 Porsche_911_GT3 \n",
"2751 2016-03-15 10:52:35 Porsche_911___993_4S \n",
"2897 2016-03-12 21:50:57 Escort_MK_1_Hundeknochen_zum_umbauen_auf_RS_2000 \n",
"7402 2016-03-22 19:48:09 Porsche_911_Carrera_4S_Cabrio_PDK__BOSE__NEU__... \n",
"7814 2016-04-04 11:53:31 Ferrari_F40 \n",
"8232 2016-04-01 21:50:47 Porsche_993_S_Schalter_BRD_neuwertig \n",
"10500 2016-03-17 12:56:38 Porsche_991 \n",
"11137 2016-03-29 23:52:57 suche_maserati_3200_gt_Zustand_unwichtig_laufe... \n",
"\n",
" seller offer_type price ab_test vehicle_type registration_year \\\n",
"514 privat Angebot 999999.0 test kombi 2009 \n",
"1878 privat Angebot 129000.0 control coupe 1995 \n",
"2454 privat Angebot 137999.0 control coupe 2010 \n",
"2751 privat Angebot 120000.0 control coupe 1998 \n",
"2897 privat Angebot 11111111.0 test limousine 1973 \n",
"7402 privat Angebot 115000.0 test cabrio 2016 \n",
"7814 privat Angebot 1300000.0 control coupe 1992 \n",
"8232 privat Angebot 128000.0 test coupe 1997 \n",
"10500 privat Angebot 155000.0 test coupe 2013 \n",
"11137 privat Angebot 10000000.0 control coupe 1960 \n",
"\n",
" gearbox power_ps model odometer_km registration_month fuel_type \\\n",
"514 manuell 101 focus 125000.0 4 benzin \n",
"1878 manuell 408 911 125000.0 9 benzin \n",
"2454 manuell 435 911 20000.0 7 benzin \n",
"2751 manuell 286 911 125000.0 3 benzin \n",
"2897 manuell 48 escort 50000.0 3 benzin \n",
"7402 automatik 400 911 5000.0 3 benzin \n",
"7814 NaN 0 NaN 50000.0 12 NaN \n",
"8232 manuell 286 911 100000.0 4 benzin \n",
"10500 NaN 476 911 20000.0 11 NaN \n",
"11137 manuell 368 NaN 100000.0 1 benzin \n",
"\n",
" brand unrepaired_damage ad_created num_pictures \\\n",
"514 ford nein 2016-03-17 00:00:00 0 \n",
"1878 porsche nein 2016-03-12 00:00:00 0 \n",
"2454 porsche nein 2016-03-21 00:00:00 0 \n",
"2751 porsche nein 2016-03-15 00:00:00 0 \n",
"2897 ford nein 2016-03-12 00:00:00 0 \n",
"7402 porsche nein 2016-03-22 00:00:00 0 \n",
"7814 sonstige_autos nein 2016-04-04 00:00:00 0 \n",
"8232 porsche nein 2016-04-01 00:00:00 0 \n",
"10500 porsche nein 2016-03-17 00:00:00 0 \n",
"11137 sonstige_autos nein 2016-03-29 00:00:00 0 \n",
"\n",
" postal_code last_seen \n",
"514 12205 2016-04-06 07:17:35 \n",
"1878 70180 2016-04-05 04:49:19 \n",
"2454 80636 2016-04-07 05:45:39 \n",
"2751 25488 2016-04-05 19:47:31 \n",
"2897 94469 2016-03-12 22:45:27 \n",
"7402 51379 2016-03-26 21:46:46 \n",
"7814 60598 2016-04-05 11:34:11 \n",
"8232 81543 2016-04-05 19:46:23 \n",
"10500 90768 2016-03-26 23:16:41 \n",
"11137 73033 2016-04-06 21:18:11 "
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"autos[autos['price'] > 100000].head(10)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/plain": [
"array([ 0. , 0. , 0. , 200. , 1100. , 2950. ,\n",
" 7200. , 26390.25, 47000. ])"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"np.percentile(autos['price'], [0.5, 1, 2.5, 5, 25, 50, 75, 97.5, 99.5])"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/plain": [
"count 48379.000000\n",
"mean 5572.591992\n",
"std 6715.983101\n",
"min 1.000000\n",
"25% 1200.000000\n",
"50% 3000.000000\n",
"75% 7350.000000\n",
"max 50000.000000\n",
"Name: price, dtype: float64"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"autos_new = autos[autos['price'].between(1, 50000)]\n",
"autos_new['price'].describe()"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/plain": [
"count 44911.000000\n",
"mean 5981.676137\n",
"std 6800.844761\n",
"min 500.000000\n",
"25% 1500.000000\n",
"50% 3499.000000\n",
"75% 7888.000000\n",
"max 50000.000000\n",
"Name: price, dtype: float64"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"autos_new1 = autos[autos['price'].between(500, 50000)]\n",
"autos_new1['price'].describe()"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"From further exploration, we are able to see that there are some pretty nice cars listed but some of their price tags seem to be a little...off (i.e. 99999999). In this higher price range there are some Porsche's and Ferrari's for example that seem to be in an appropriate price range (i.e. 100,000 < p < 500000) but their portion of the total population is significantly low. \n",
"\n",
"Additionally, there are a significant number with prices that seem to be a little low or that are $0. As we can see from the percentiles, 5 percent of the price observations are 200 or less. Additionally, 99.5 percent of the data is 47,000 or less. These two insights led me to create a minimum cutoff at 500 and a maximum of 50,000. \n",
"\n",
"The primary reason for having the minimum at 500 was due to the fact that eBay also has multiple fees pertaining to listing vehicles on their website. After a little research, I deduced that 500 would be an appropriate minimum amount if a seller wanted to have any margin on their listing. \n",
"\n",
"Overall, we reduced the observations by ~approx. 10 percent. "
]
},
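{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the effect of the price cutoffs concrete, here is a small sketch on a made-up Series. The `prices` values are invented for illustration; only the 500 and 50,000 bounds come from the analysis above.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# toy prices, including a free listing and an implausibly expensive one\n",
"prices = pd.Series([0.0, 150.0, 900.0, 2950.0, 7200.0, 48000.0, 99999999.0])\n",
"\n",
"# percentiles help to judge where sensible cutoffs might lie\n",
"print(np.percentile(prices, [5, 50, 99.5]))\n",
"\n",
"# between() includes both endpoints by default\n",
"low, high = 500, 50000\n",
"kept = prices[prices.between(low, high)]\n",
"\n",
"# fraction of rows dropped by the filter\n",
"removed_share = 1 - len(kept) / len(prices)\n",
"print(len(kept), round(removed_share, 3))\n",
"```\n",
"\n",
"Applied to the real data, the same calculation gives 1 - 44911 / 50000, which is roughly 0.10 and matches the reduction of about 10 percent."
]
},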
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/plain": [
"array([ 5000., 5000., 20000., 30000., 100000., 150000., 150000.,\n",
" 150000., 150000.])"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"np.percentile(autos_new1['odometer_km'], [0.5, 1, 2.5, 5, 25, 50, 75, 97.5, 99.5])"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"In regards to the values for odometer_km, there doesn't appear to be any significant outliers, so we will not make any changes to this column"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>date_crawled</th>\n",
" <th>name</th>\n",
" <th>seller</th>\n",
" <th>offer_type</th>\n",
" <th>price</th>\n",
" <th>ab_test</th>\n",
" <th>vehicle_type</th>\n",
" <th>registration_year</th>\n",
" <th>gearbox</th>\n",
" <th>power_ps</th>\n",
" <th>model</th>\n",
" <th>odometer_km</th>\n",
" <th>registration_month</th>\n",
" <th>fuel_type</th>\n",
" <th>brand</th>\n",
" <th>unrepaired_damage</th>\n",
" <th>ad_created</th>\n",
" <th>num_pictures</th>\n",
" <th>postal_code</th>\n",
" <th>last_seen</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>44911</td>\n",
" <td>44911</td>\n",
" <td>44911</td>\n",
" <td>44911</td>\n",
" <td>44911.000000</td>\n",
" <td>44911</td>\n",
" <td>41239</td>\n",
" <td>44911.000000</td>\n",
" <td>43103</td>\n",
" <td>44911.000000</td>\n",
" <td>42861</td>\n",
" <td>44911.000000</td>\n",
" <td>44911.000000</td>\n",
" <td>41721</td>\n",
" <td>44911</td>\n",
" <td>37401</td>\n",
" <td>44911</td>\n",
" <td>44911.0</td>\n",
" <td>44911.000000</td>\n",
" <td>44911</td>\n",
" </tr>\n",
" <tr>\n",
" <th>unique</th>\n",
" <td>43486</td>\n",
" <td>34415</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>NaN</td>\n",
" <td>2</td>\n",
" <td>8</td>\n",
" <td>NaN</td>\n",
" <td>2</td>\n",
" <td>NaN</td>\n",
" <td>244</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>7</td>\n",
" <td>40</td>\n",
" <td>2</td>\n",
" <td>76</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>35847</td>\n",
" </tr>\n",
" <tr>\n",
" <th>top</th>\n",
" <td>2016-03-29 23:42:13</td>\n",
" <td>Volkswagen_Golf_1.4</td>\n",
" <td>privat</td>\n",
" <td>Angebot</td>\n",
" <td>NaN</td>\n",
" <td>test</td>\n",
" <td>limousine</td>\n",
" <td>NaN</td>\n",
" <td>manuell</td>\n",
" <td>NaN</td>\n",
" <td>golf</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>benzin</td>\n",
" <td>volkswagen</td>\n",
" <td>nein</td>\n",
" <td>2016-04-03 00:00:00</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2016-04-07 06:17:27</td>\n",
" </tr>\n",
" <tr>\n",
" <th>freq</th>\n",
" <td>3</td>\n",
" <td>75</td>\n",
" <td>44911</td>\n",
" <td>44911</td>\n",
" <td>NaN</td>\n",
" <td>23164</td>\n",
" <td>12046</td>\n",
" <td>NaN</td>\n",
" <td>33306</td>\n",
" <td>NaN</td>\n",
" <td>3622</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>26780</td>\n",
" <td>9632</td>\n",
" <td>33727</td>\n",
" <td>1752</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>5981.676137</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2005.062657</td>\n",
" <td>NaN</td>\n",
" <td>120.647280</td>\n",
" <td>NaN</td>\n",
" <td>125591.391864</td>\n",
" <td>5.906014</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>51264.749371</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>6800.844761</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>89.832750</td>\n",
" <td>NaN</td>\n",
" <td>205.454015</td>\n",
" <td>NaN</td>\n",
" <td>39326.024997</td>\n",
" <td>3.634168</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>25698.915517</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>500.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1000.000000</td>\n",
" <td>NaN</td>\n",
" <td>0.000000</td>\n",
" <td>NaN</td>\n",
" <td>5000.000000</td>\n",
" <td>0.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>1067.000000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1500.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2000.000000</td>\n",
" <td>NaN</td>\n",
" <td>75.000000</td>\n",
" <td>NaN</td>\n",
" <td>100000.000000</td>\n",
" <td>3.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>30952.000000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>3499.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2004.000000</td>\n",
" <td>NaN</td>\n",
" <td>110.000000</td>\n",
" <td>NaN</td>\n",
" <td>150000.000000</td>\n",
" <td>6.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>50189.000000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>7888.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2008.000000</td>\n",
" <td>NaN</td>\n",
" <td>150.000000</td>\n",
" <td>NaN</td>\n",
" <td>150000.000000</td>\n",
" <td>9.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>72138.000000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>50000.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>9999.000000</td>\n",
" <td>NaN</td>\n",
" <td>17700.000000</td>\n",
" <td>NaN</td>\n",
" <td>150000.000000</td>\n",
" <td>12.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.0</td>\n",
" <td>99998.000000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" date_crawled name seller offer_type \\\n",
"count 44911 44911 44911 44911 \n",
"unique 43486 34415 1 1 \n",
"top 2016-03-29 23:42:13 Volkswagen_Golf_1.4 privat Angebot \n",
"freq 3 75 44911 44911 \n",
"mean NaN NaN NaN NaN \n",
"std NaN NaN NaN NaN \n",
"min NaN NaN NaN NaN \n",
"25% NaN NaN NaN NaN \n",
"50% NaN NaN NaN NaN \n",
"75% NaN NaN NaN NaN \n",
"max NaN NaN NaN NaN \n",
"\n",
" price ab_test vehicle_type registration_year gearbox \\\n",
"count 44911.000000 44911 41239 44911.000000 43103 \n",
"unique NaN 2 8 NaN 2 \n",
"top NaN test limousine NaN manuell \n",
"freq NaN 23164 12046 NaN 33306 \n",
"mean 5981.676137 NaN NaN 2005.062657 NaN \n",
"std 6800.844761 NaN NaN 89.832750 NaN \n",
"min 500.000000 NaN NaN 1000.000000 NaN \n",
"25% 1500.000000 NaN NaN 2000.000000 NaN \n",
"50% 3499.000000 NaN NaN 2004.000000 NaN \n",
"75% 7888.000000 NaN NaN 2008.000000 NaN \n",
"max 50000.000000 NaN NaN 9999.000000 NaN \n",
"\n",
" power_ps model odometer_km registration_month fuel_type \\\n",
"count 44911.000000 42861 44911.000000 44911.000000 41721 \n",
"unique NaN 244 NaN NaN 7 \n",
"top NaN golf NaN NaN benzin \n",
"freq NaN 3622 NaN NaN 26780 \n",
"mean 120.647280 NaN 125591.391864 5.906014 NaN \n",
"std 205.454015 NaN 39326.024997 3.634168 NaN \n",
"min 0.000000 NaN 5000.000000 0.000000 NaN \n",
"25% 75.000000 NaN 100000.000000 3.000000 NaN \n",
"50% 110.000000 NaN 150000.000000 6.000000 NaN \n",
"75% 150.000000 NaN 150000.000000 9.000000 NaN \n",
"max 17700.000000 NaN 150000.000000 12.000000 NaN \n",
"\n",
" brand unrepaired_damage ad_created num_pictures \\\n",
"count 44911 37401 44911 44911.0 \n",
"unique 40 2 76 NaN \n",
"top volkswagen nein 2016-04-03 00:00:00 NaN \n",
"freq 9632 33727 1752 NaN \n",
"mean NaN NaN NaN 0.0 \n",
"std NaN NaN NaN 0.0 \n",
"min NaN NaN NaN 0.0 \n",
"25% NaN NaN NaN 0.0 \n",
"50% NaN NaN NaN 0.0 \n",
"75% NaN NaN NaN 0.0 \n",
"max NaN NaN NaN 0.0 \n",
"\n",
" postal_code last_seen \n",
"count 44911.000000 44911 \n",
"unique NaN 35847 \n",
"top NaN 2016-04-07 06:17:27 \n",
"freq NaN 8 \n",
"mean 51264.749371 NaN \n",
"std 25698.915517 NaN \n",
"min 1067.000000 NaN \n",
"25% 30952.000000 NaN \n",
"50% 50189.000000 NaN \n",
"75% 72138.000000 NaN \n",
"max 99998.000000 NaN "
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"autos_new1.describe(include='all')"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## Exploring the Date Columns"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"There are 5 columns that should represent date values. Some were created by the crawler, and some came from the website itself.\n",
"\n",
"- 'date_crawled': added by the crawler\n",
"- 'last_seen': added by the crawler\n",
"- 'ad_created': from the website\n",
"- 'registration_month': from the website\n",
"- 'registration_year': from the website\n",
"\n",
"Right now, date_crawled, last_seen and ad_created are all stored as strings, so we need to convert them into a form we can analyze quantitatively. The other two columns are already numeric."
]
},
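{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible way to make the string timestamps quantitative is to parse them with `pd.to_datetime`. This is only a sketch of an alternative; the cells that follow in this notebook slice the strings instead. The frame `sample` is hypothetical and holds a few timestamps copied from the listings, and `daily_share` and `days_online` are illustrative names.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# A few timestamps copied from the listings, kept as strings at first.\n",
"sample = pd.DataFrame({\n",
"    'date_crawled': ['2016-03-26 17:47:46', '2016-04-04 13:38:56', '2016-03-26 18:57:24'],\n",
"    'last_seen': ['2016-04-06 06:45:54', '2016-04-06 14:45:08', '2016-04-06 20:15:37'],\n",
"})\n",
"\n",
"# Parse the strings into datetime64 values using an explicit format.\n",
"for col in ['date_crawled', 'last_seen']:\n",
"    sample[col] = pd.to_datetime(sample[col], format='%Y-%m-%d %H:%M:%S')\n",
"\n",
"# Share of listings crawled per calendar day, in chronological order.\n",
"daily_share = (\n",
"    sample['date_crawled'].dt.date\n",
"    .value_counts(normalize=True)\n",
"    .sort_index()\n",
")\n",
"\n",
"# Whole days between the crawl and the last time the ad was seen.\n",
"days_online = (sample['last_seen'] - sample['date_crawled']).dt.days\n",
"print(daily_share)\n",
"print(days_online.tolist())\n",
"```\n",
"\n",
"Parsed datetimes also allow arithmetic between columns, such as the `days_online` difference, which string slicing cannot provide."
]
},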
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>date_crawled</th>\n",
" <th>ad_created</th>\n",
" <th>last_seen</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2016-03-26 17:47:46</td>\n",
" <td>2016-03-26 00:00:00</td>\n",
" <td>2016-04-06 06:45:54</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2016-04-04 13:38:56</td>\n",
" <td>2016-04-04 00:00:00</td>\n",
" <td>2016-04-06 14:45:08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2016-03-26 18:57:24</td>\n",
" <td>2016-03-26 00:00:00</td>\n",
" <td>2016-04-06 20:15:37</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2016-03-12 16:58:10</td>\n",
" <td>2016-03-12 00:00:00</td>\n",
" <td>2016-03-15 03:16:28</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2016-04-01 14:38:50</td>\n",
" <td>2016-04-01 00:00:00</td>\n",
" <td>2016-04-01 14:38:50</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" date_crawled ad_created last_seen\n",
"0 2016-03-26 17:47:46 2016-03-26 00:00:00 2016-04-06 06:45:54\n",
"1 2016-04-04 13:38:56 2016-04-04 00:00:00 2016-04-06 14:45:08\n",
"2 2016-03-26 18:57:24 2016-03-26 00:00:00 2016-04-06 20:15:37\n",
"3 2016-03-12 16:58:10 2016-03-12 00:00:00 2016-03-15 03:16:28\n",
"4 2016-04-01 14:38:50 2016-04-01 00:00:00 2016-04-01 14:38:50"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"autos_new1[['date_crawled', 'ad_created', 'last_seen']][0:5]"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"The first ten characters of each timestamp represent the date (YYYY-MM-DD). We can extract just the date values, compute their relative frequency distribution, and then sort by the index to put the dates in chronological order."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/plain": [
"0 2016-03-26\n",
"1 2016-04-04\n",
"2 2016-03-26\n",
"3 2016-03-12\n",
"4 2016-04-01\n",
"Name: date_crawled, dtype: object"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"autos_new1['date_crawled'].str[:10].head(5)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/plain": [
"2016-03-05 0.025584\n",
"2016-03-06 0.014161\n",
"2016-03-07 0.036183\n",
"2016-03-08 0.033177\n",
"2016-03-09 0.032932\n",
"2016-03-10 0.032820\n",
"2016-03-11 0.032976\n",
"2016-03-12 0.037385\n",
"2016-03-13 0.015564\n",
"2016-03-14 0.036361\n",
"2016-03-15 0.034023\n",
"2016-03-16 0.029347\n",
"2016-03-17 0.031150\n",
"2016-03-18 0.012870\n",
"2016-03-19 0.034758\n",
"2016-03-20 0.038142\n",
"2016-03-21 0.037697\n",
"2016-03-22 0.032976\n",
"2016-03-23 0.032331\n",
"2016-03-24 0.028991\n",
"2016-03-25 0.031061\n",
"2016-03-26 0.032665\n",
"2016-03-27 0.031106\n",
"2016-03-28 0.034824\n",
"2016-03-29 0.033288\n",
"2016-03-30 0.033377\n",
"2016-03-31 0.031640\n",
"2016-04-01 0.033867\n",
"2016-04-02 0.035715\n",
"2016-04-03 0.038810\n",
"2016-04-04 0.036606\n",
"2016-04-05 0.013070\n",
"2016-04-06 0.003184\n",
"2016-04-07 0.001358\n",
"Name: date_crawled, dtype: float64"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"autos_new1['date_crawled'].str[:10].value_counts(normalize=True, dropna=False).sort_index()"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
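},
"source": [
"#### Observations for date_crawled\n",
"- The listings were crawled daily from 2016-03-05 to 2016-04-07, a period of about one month.\n",
"- Most days account for roughly 3 to 4 percent of the listings, so the crawl was fairly uniform; the share drops off sharply over the last three days (2016-04-05 to 2016-04-07)."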
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/plain": [
"0 2016-03-26\n",
"1 2016-04-04\n",
"2 2016-03-26\n",
"3 2016-03-12\n",
"4 2016-04-01\n",
"Name: ad_created, dtype: object"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"autos_new1['ad_created'].str[:10].head(5)"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/plain": [
"2015-06-11 0.000022\n",
"2015-08-10 0.000022\n",
"2015-09-09 0.000022\n",
"2015-11-10 0.000022\n",
"2015-12-05 0.000022\n",
"2015-12-30 0.000022\n",
"2016-01-03 0.000022\n",
"2016-01-07 0.000022\n",
"2016-01-10 0.000045\n",
"2016-01-13 0.000022\n",
"2016-01-14 0.000022\n",
"2016-01-16 0.000022\n",
"2016-01-22 0.000022\n",
"2016-01-27 0.000067\n",
"2016-01-29 0.000022\n",
"2016-02-01 0.000022\n",
"2016-02-02 0.000045\n",
"2016-02-05 0.000045\n",
"2016-02-07 0.000022\n",
"2016-02-08 0.000022\n",
"2016-02-09 0.000022\n",
"2016-02-11 0.000022\n",
"2016-02-12 0.000045\n",
"2016-02-14 0.000045\n",
"2016-02-16 0.000022\n",
"2016-02-17 0.000022\n",
"2016-02-18 0.000045\n",
"2016-02-19 0.000067\n",
"2016-02-20 0.000045\n",
"2016-02-21 0.000045\n",
" ... \n",
"2016-03-09 0.033043\n",
"2016-03-10 0.032553\n",
"2016-03-11 0.033288\n",
"2016-03-12 0.037162\n",
"2016-03-13 0.017011\n",
"2016-03-14 0.034913\n",
"2016-03-15 0.033822\n",
"2016-03-16 0.029837\n",
"2016-03-17 0.030794\n",
"2016-03-18 0.013516\n",
"2016-03-19 0.033622\n",
"2016-03-20 0.038276\n",
"2016-03-21 0.037919\n",
"2016-03-22 0.032731\n",
"2016-03-23 0.032152\n",
"2016-03-24 0.028968\n",
"2016-03-25 0.031217\n",
"2016-03-26 0.032642\n",
"2016-03-27 0.031061\n",
"2016-03-28 0.034891\n",
"2016-03-29 0.033244\n",
"2016-03-30 0.033266\n",
"2016-03-31 0.031707\n",
"2016-04-01 0.033800\n",
"2016-04-02 0.035403\n",
"2016-04-03 0.039010\n",
"2016-04-04 0.037007\n",
"2016-04-05 0.011801\n",
"2016-04-06 0.003273\n",
"2016-04-07 0.001202\n",
"Name: ad_created, Length: 76, dtype: float64"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"autos_new1['ad_created'].str[:10].value_counts(normalize=True, dropna=False).sort_index()"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
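},
"source": [
"#### Observations for ad_created\n",
"- The ads were created between 2015-06-11 and 2016-04-07, across 76 distinct dates.\n",
"- The earliest dates shown each account for less than 0.01 percent of the listings, while from 2016-03-09 onward the daily shares closely track those of date_crawled, which suggests that most ads were crawled shortly after they were created."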
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/plain": [
"0 2016-04-06\n",
"1 2016-04-06\n",
"2 2016-04-06\n",
"3 2016-03-15\n",
"4 2016-04-01\n",
"Name: last_seen, dtype: object"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"autos_new1['last_seen'].str[:10].head(5)"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/plain": [
"2016-03-05 0.001091\n",
"2016-03-06 0.004186\n",
"2016-03-07 0.005188\n",
"2016-03-08 0.007036\n",
"2016-03-09 0.009463\n",
"2016-03-10 0.010332\n",
"2016-03-11 0.012091\n",
"2016-03-12 0.023936\n",
"2016-03-13 0.008907\n",
"2016-03-14 0.012313\n",
"2016-03-15 0.015698\n",
"2016-03-16 0.016143\n",
"2016-03-17 0.027788\n",
"2016-03-18 0.007392\n",
"2016-03-19 0.015453\n",
"2016-03-20 0.020396\n",
"2016-03-21 0.020730\n",
"2016-03-22 0.021287\n",
"2016-03-23 0.018414\n",
"2016-03-24 0.019528\n",
"2016-03-25 0.018615\n",
"2016-03-26 0.016477\n",
"2016-03-27 0.015453\n",
"2016-03-28 0.020574\n",
"2016-03-29 0.021442\n",
"2016-03-30 0.024181\n",
"2016-03-31 0.023446\n",
"2016-04-01 0.022934\n",
"2016-04-02 0.024938\n",
"2016-04-03 0.024960\n",
"2016-04-04 0.024404\n",
"2016-04-05 0.126183\n",
"2016-04-06 0.225156\n",
"2016-04-07 0.133865\n",
"Name: last_seen, dtype: float64"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"autos_new1['last_seen'].str[:10].value_counts(normalize=True, dropna=False).sort_index()"
]
},
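{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Observations for last_seen\n",
"- The last_seen dates range from 2016-03-05 to 2016-04-07, the same window as date_crawled.\n",
"- The daily share stays below 3 percent through 2016-04-04 and then jumps: the last three days (2016-04-05 to 2016-04-07) account for roughly 49 percent of the listings. This spike most likely reflects the end of the crawling period rather than a sudden surge in sales."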
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/plain": [
"count 44911.000000\n",
"mean 5.906014\n",
"std 3.634168\n",
"min 0.000000\n",
"25% 3.000000\n",
"50% 6.000000\n",
"75% 9.000000\n",
"max 12.000000\n",
"Name: registration_month, dtype: float64"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"# understand distribution of registration_month and registration_year\n",
"autos_new1['registration_month'].describe()"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": false
},
"outputs": [{
"data": {
"text/plain": [
"count 44911.000000\n",
"mean 2005.062657\n",
"std 89.832750\n",
"min 1000.000000\n",
"25% 2000.000000\n",
"50% 2004.000000\n",
"75% 2008.000000\n",
"max 9999.000000\n",
"Name: registration_year, dtype: float64"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}],
"source": [
"autos_new1['registration_year'].describe()"
]
},
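{
"cell_type": "markdown",
"metadata": {},
"source": [
"The summary above reports registration years ranging from 1000 to 9999, which cannot all be real. A possible next step, sketched here on toy values, is to measure how many rows fall outside a plausible window. The lower bound of 1900 is an assumption chosen for illustration; the upper bound of 2016 is the year of the crawl. `years` and `plausible` are hypothetical names.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Toy values; 1000 and 9999 are the min and max reported by describe().\n",
"years = pd.Series([1000, 1985, 2004, 2016, 2019, 9999], name='registration_year')\n",
"\n",
"# Assumed plausible window: 1900 is a hypothetical lower bound,\n",
"# 2016 is the year in which the listings were crawled.\n",
"plausible = years.between(1900, 2016)\n",
"\n",
"# Fraction of rows that fall outside the window.\n",
"share_implausible = 1 - plausible.mean()\n",
"print(years[plausible].tolist())\n",
"print(round(share_implausible, 3))\n",
"```\n",
"\n",
"If the share of implausible rows is small on the real data, dropping them is unlikely to distort the rest of the analysis."
]
},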
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.4.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}