Kanav-Arora/Pandas.ipynb

## Pandas.ipynb
{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "collapsed_sections": [
        "HjAB6Ln9awDh",
        "wfalGQN1cEN_",
        "W9cgJVDRpWUF"
      ]
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "FE2X9KayZtEK"
      },
      "source": [
        "# Pandas\n",
        "The name pandas is derived from \"panel data\". It is a core Python library for data manipulation and data analysis.\n",
        "It provides one-dimensional (Series) and two-dimensional (DataFrame) data structures for data manipulation."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "QFZJ9hE1Zrpm"
      },
      "source": [
        "import pandas as pd\n",
        "import numpy as np"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "CmaL263Iab-c"
      },
      "source": [
        "There are two core objects in pandas: the DataFrame and the Series"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "HjAB6Ln9awDh"
      },
      "source": [
        "## DataFrame\n",
        "A DataFrame is a table. It contains an array of individual entries, each of which has a certain value. Each entry corresponds to a row (or record) and a column."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "_llE0vv0DDQE"
      },
      "source": [
        "pd.DataFrame({'Supervised':['Classification','Regression'],'Unsupervised':['Clustering','Dimensionality Reduction']})"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "jJGGSiZ1bsrj"
      },
      "source": [
        "pd.DataFrame({'Supervised':['Classification','Regression'],'Unsupervised':['Clustering','Dimensionality Reduction']}, index = ['Discrete','Continuous'])"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "wfalGQN1cEN_"
      },
      "source": [
        "## Series\n",
        "A Series, by contrast, is a sequence of data values. If a DataFrame is a table, a Series is a list. In fact, you can create one with nothing more than a list."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "cG-TSSe8b-S6"
      },
      "source": [
        "pd.Series([1,2,3,4,5])"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xWQGKWpUcp3P"
      },
      "source": [
        "A Series is, in essence, a single column of a DataFrame. So you can assign row labels to the Series the same way as before, using an index parameter. However, a Series does not have a column name; it only has one overall name."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "humHISylcqDz"
      },
      "source": [
        "pd.Series([30, 35, 40], index=['2015 Sales', '2016 Sales', '2017 Sales'], name='Product A')"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Eg9S3RTwdlvi"
      },
      "source": [
        "## Reading from a file"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "PYsz7pUKdrw7",
        "outputId": "fd42833c-91c8-4371-f6a0-50eb411db2d5",
        "colab": {
          "base_uri": "https://localhost:8080/"
        }
      },
      "source": [
        "from google.colab import drive\n",
        "drive.mount('/content/drive')"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount(\"/content/drive\", force_remount=True).\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "jTt4iZrclOsw"
      },
      "source": [
        "data = pd.read_csv('/content/drive/MyDrive/AI-ML/Assignments/Assignment-1/iris.csv')\n",
        "\n",
        "# If the file already contains a serial-number (S.NO) column, pass index_col to use it as the index:\n",
        "# data = pd.read_csv('/content/drive/MyDrive/AI-ML/Assignments/Assignment-1/iris.csv', index_col = <column index>)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "9VIZ3MH3lePu"
      },
      "source": [
        "data.head()                       # displays first 5 rows of the table"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "DrVxOPCWlqOL"
      },
      "source": [
        "data"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "-YpPcvaVmft4"
      },
      "source": [
        "data.shape                    # (rows, columns) tuple; here rows = 150, columns = 5"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "9FkTgIOzmqHE"
      },
      "source": [
        "data[\"sepal.length\"]"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "YR0BBPMZnRPc"
      },
      "source": [
        "data.variety"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "viMip5QInTRm"
      },
      "source": [
        "data[\"sepal.length\"][0]"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "gXJSLf0FpAKG"
      },
      "source": [
        "## Indexing in Pandas\n",
        "The indexing operator and attribute selection are nice because they work just like they do in the rest of the Python ecosystem, which makes them easy for a novice to pick up and use. However, pandas has its own accessor operators, loc and iloc. For more advanced operations, these are the ones you're supposed to be using."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "W9cgJVDRpWUF"
      },
      "source": [
        "### Index-based selection\n",
        "Index-based selection picks data by its numerical position in the data. iloc follows this paradigm."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "UK8Zjv25pF-J"
      },
      "source": [
        "data.iloc[0]"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "CYMLTAodqOGD"
      },
      "source": [
        "data.iloc[:,0]              # all rows of the column at position 0"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "wNcox6ZLqUdh"
      },
      "source": [
        "data.iloc[::2,0]            # every second row of the column at position 0"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "gj8JPCL0qqLG"
      },
      "source": [
        "data.iloc[:3,0]             # first 3 rows of the column at position 0"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "gB6NPqVYq60c"
      },
      "source": [
        "data.iloc[[0, 1, 2], 0]     # we can also pass a list of positions"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "n9dMWWrU5gpq"
      },
      "source": [
        "data.iloc[-5:]"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "b1aFNDLX5myR"
      },
      "source": [
        "### Label-based selection\n",
        "With loc, it is the index label of the data, not its position, that matters."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "4dSnW-vP5rTC"
      },
      "source": [
        "data.loc[0,'sepal.length']"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "MfcilVTM9vyA"
      },
      "source": [
        "data.loc[:,['sepal.length','sepal.width']]"
      ],
      "execution_count": null,
      "outputs": []
    },
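    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "One difference between the two accessors is easy to miss: iloc slices like ordinary Python and excludes the end position, while loc slices by label and includes the end label. The cell below is a minimal sketch of that difference. It uses a small made-up frame (the name toy and its values are assumptions for illustration, not part of the iris data), so it does not depend on the CSV file."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import pandas as pd\n",
        "\n",
        "# Hypothetical toy frame, used only for illustration\n",
        "toy = pd.DataFrame({'value': [10, 20, 30, 40]}, index=['a', 'b', 'c', 'd'])\n",
        "\n",
        "by_position = toy.iloc[0:2]    # positions 0 and 1; the end position is excluded\n",
        "by_label = toy.loc['a':'c']    # labels 'a', 'b' and 'c'; the end label is included\n",
        "\n",
        "print(len(by_position), len(by_label))    # 2 3"
      ],
      "execution_count": null,
      "outputs": []
    },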
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RIrZ90Vg-THg"
      },
      "source": [
        "### Manipulating the Index"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "DkKDgVNP-YMC"
      },
      "source": [
        "# data.set_index(<fieldname>)"
      ],
      "execution_count": null,
      "outputs": []
    },
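    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "set_index() returns a new DataFrame that uses the given column as its index; the original is left unchanged unless the result is assigned back. The cell below is a minimal sketch on a made-up frame (toy_sales and its column names are assumptions for illustration only)."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import pandas as pd\n",
        "\n",
        "# Hypothetical toy frame, used only for illustration\n",
        "toy_sales = pd.DataFrame({'year': [2015, 2016, 2017], 'sales': [30, 35, 40]})\n",
        "\n",
        "indexed = toy_sales.set_index('year')    # 'year' becomes the row labels\n",
        "print(indexed.loc[2016, 'sales'])        # 35\n",
        "print(toy_sales.columns.tolist())        # ['year', 'sales'], the original is unchanged"
      ],
      "execution_count": null,
      "outputs": []
    },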
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cKvBWj2b-wx1"
      },
      "source": [
        "### Conditional Selection"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "qNfOT5sD-sdC"
      },
      "source": [
        "data.variety==\"Setosa\""
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "RzU0WCWm_CHZ"
      },
      "source": [
        "data.loc[data.variety==\"Setosa\"]"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Y1cuPrL0_anW"
      },
      "source": [
        "data.loc[(data.variety==\"Setosa\") & (data['petal.width']==0.2)]"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3TwHyMGyOtai"
      },
      "source": [
        "Pandas comes with a few built-in conditional selectors, two of which we will highlight here.\n",
        "\n",
        "The first is isin. isin lets you select data whose value \"is in\" a list of values."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "mkFgl9_XOyof"
      },
      "source": [
        "data.loc[data.variety.isin(['Setosa','Virginica'])]"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The second is isnull (and its companion notnull). These methods let you select values that are (or are not) missing (NaN)."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "oXGSmfH6OvGk"
      },
      "source": [
        "data.loc[data['sepal.width'].notnull()]"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "49ezvcwOPwGL"
      },
      "source": [
        "data.loc[data['sepal.width'].isnull()]"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "2HqtC65lQFUD"
      },
      "source": [
        "### Assigning Values"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "-ZmPU1QQQIqS"
      },
      "source": [
        "data['critic'] = \"everyone\"                         # there's no column critic so it will create one\n",
        "data"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "9wXVvt_dQbuA"
      },
      "source": [
        "data['index_backward'] = range(len(data)-1,-1,-1)\n",
        "data"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dszIluMqSP3g"
      },
      "source": [
        "## Summary Functions and Mapping"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "b7WtbK6NTqll"
      },
      "source": [
        "### Summary Functions"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QgBpOQJlSkMT"
      },
      "source": [
        "describe() generates a high-level summary of the attributes of the given column. It is type-aware, meaning that its output changes based on the data type of the input."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "P-Wn0_7ASYQK"
      },
      "source": [
        "data['sepal.width'].describe()                    # datatype: float"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "tXFBRN_sSsYK"
      },
      "source": [
        "data.variety.describe()                           # datatype: String"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "3PFBgqQFS6Qv"
      },
      "source": [
        "data['sepal.width'].mean()"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "zMXHVQVmTWyL"
      },
      "source": [
        "data.variety.unique()"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "opnmQlZLTex_"
      },
      "source": [
        "data.variety.value_counts()"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-zk59J3vT0MH"
      },
      "source": [
        "### Maps"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "bxfMmzGUT3Ry"
      },
      "source": [
        "sepal_width_mean = data['sepal.width'].mean()\n",
        "data['sepal.width'].map(lambda p : p - sepal_width_mean)              # this will return a new series"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "LRYj8fX_0wZ_"
      },
      "source": [
        "apply() is the equivalent method if we want to transform a whole DataFrame by calling a custom method on each row."
      ]
    },
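    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a minimal sketch of apply(), the cell below centers one column row by row. It works on a made-up frame (toy_flowers and the function name center_width are assumptions for illustration), and axis='columns' makes apply() pass each row to the function as a Series."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import pandas as pd\n",
        "\n",
        "# Hypothetical toy frame, used only for illustration\n",
        "toy_flowers = pd.DataFrame({'width': [3.0, 3.5, 4.0], 'variety': ['A', 'B', 'A']})\n",
        "width_mean = toy_flowers['width'].mean()    # 3.5\n",
        "\n",
        "def center_width(row):\n",
        "    row = row.copy()                        # work on a copy of the row\n",
        "    row['width'] = row['width'] - width_mean\n",
        "    return row\n",
        "\n",
        "centered = toy_flowers.apply(center_width, axis='columns')\n",
        "print(centered['width'].tolist())           # [-0.5, 0.0, 0.5]"
      ],
      "execution_count": null,
      "outputs": []
    },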
    {
      "cell_type": "code",
      "metadata": {
        "id": "lqdA8wnK0o1M"
      },
      "source": [
        "data.variety + \" - \"  + data.critic        # built-in operators work element-wise on whole Series"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "q703SrTJOK6S"
      },
      "source": [
        "## Grouping and Sorting"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "S2DVA3k-OQ2w"
      },
      "source": [
        "### Grouping"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "PnGFjKxBOQW7"
      },
      "source": [
        "data"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fPmvyLRHPxFI"
      },
      "source": [
        "groupby() groups together the rows that share the same sepal.length value. Then, for each of these groups, we grab the sepal.length column and count how many rows it contains. value_counts() is just a shortcut for this groupby() operation."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "7rw-GZz7Onec"
      },
      "source": [
        "data.groupby('sepal.length')['sepal.length'].count()"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "655zhpYWRNS3"
      },
      "source": [
        "Another groupby() method worth mentioning is agg(), which lets you run a bunch of different functions on your DataFrame simultaneously. For example, we can generate a simple statistical summary of the dataset as follows:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "WQhPU9GIRN62"
      },
      "source": [
        "data.groupby('variety')['sepal.length'].agg([len, 'max', 'min'])    # 'max' and 'min' name the pandas group-wise implementations"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "y_O4sLMEELmE"
      },
      "source": [
        "### Sorting"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Q3OK2D-ZEQqz"
      },
      "source": [
        "data"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "2UWkFkayFEvm"
      },
      "source": [
        "data = data.sort_values(by = 'sepal.length', ascending=False)          # ascending=True is the default"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "DlZMn7o2FtMU"
      },
      "source": [
        "data = data.sort_index()"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "28ucNZ_oGJVp"
      },
      "source": [
        "## Data Types and Missing Values"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "x4jZ4GW8GNxi"
      },
      "source": [
        "### DType"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "2WCOCoZYGQ1b"
      },
      "source": [
        "data.dtypes"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "-Cox6-vPGZxe"
      },
      "source": [
        "data['sepal.length'].dtype"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "yhqxa2iVGqSx"
      },
      "source": [
        "data.index_backward.astype('float64')"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "btNZygOgLsmb"
      },
      "source": [
        "### Missing Data\n",
        "Entries missing values are given the value NaN, short for \"Not a Number\". For technical reasons these NaN values are always of the float64 dtype.\n",
        "\n",
        "Pandas provides some methods specific to missing data. To select NaN entries you can use pd.isnull() (or its companion pd.notnull())."
      ]
    },
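    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "A small sketch of the float64 point: with the default NumPy-backed dtypes, an integer column that gains a missing value is stored as float64. The Series names below are made up for illustration."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import numpy as np\n",
        "import pandas as pd\n",
        "\n",
        "complete = pd.Series([1, 2, 3])          # no missing values\n",
        "with_gap = pd.Series([1, 2, np.nan])     # same kind of data with one missing value\n",
        "\n",
        "print(complete.dtype)                    # int64\n",
        "print(with_gap.dtype)                    # float64\n",
        "print(with_gap.isnull().tolist())        # [False, False, True]"
      ],
      "execution_count": null,
      "outputs": []
    },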
    {
      "cell_type": "code",
      "metadata": {
        "id": "wzKt5wvgLzOG"
      },
      "source": [
        "data[pd.isnull(data.variety)]"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "NtWC3Bm_MFOA"
      },
      "source": [
        "data[pd.notnull(data.variety)]"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xgYWaRGOMSWS"
      },
      "source": [
        "Replacing missing values is a common operation. Pandas provides a really handy method for this problem: fillna(). fillna() provides a few different strategies for mitigating such data. For example, we can simply replace each NaN with the string \"Unknown\". Note that filling a numeric column with a string turns it into an object column."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "TOXXww-xMUAj"
      },
      "source": [
        "data.at[0,'sepal.length'] = np.nan"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "AD4CFs7OMmrY"
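      },
      "source": [
        "data.drop((0, 'sepal.length'), axis=1, errors='ignore')    # returns a copy without a column labelled (0, 'sepal.length') if one exists; data itself is unchanged"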
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "-eK_Ns2KNVEX",
        "outputId": "4808c50f-7c78-44d5-88f9-6060df0cfe28"
      },
      "source": [
        "data.iloc[0,:]"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "sepal.length              NaN\n",
              "sepal.width               3.5\n",
              "petal.length              1.4\n",
              "petal.width               0.2\n",
              "variety                Setosa\n",
              "critic               everyone\n",
              "index_backward            149\n",
              "(0, sepal.length)         NaN\n",
              "Name: 0, dtype: object"
            ]
          },
          "metadata": {},
          "execution_count": 82
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "vbf0sNG0PQ8y"
      },
      "source": [
        "data['sepal.length'] = data['sepal.length'].fillna(\"Unknown\")"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 419
        },
        "id": "MLF_Jfj_PZIR",
        "outputId": "ff2c5ab8-74b4-4bcc-ff33-32fd809c98ab"
      },
      "source": [
        "data"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>sepal.length</th>\n",
              "      <th>sepal.width</th>\n",
              "      <th>petal.length</th>\n",
              "      <th>petal.width</th>\n",
              "      <th>variety</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>Unknown</td>\n",
              "      <td>3.5</td>\n",
              "      <td>1.4</td>\n",
              "      <td>0.2</td>\n",
              "      <td>Setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>4.9</td>\n",
              "      <td>3.0</td>\n",
              "      <td>1.4</td>\n",
              "      <td>0.2</td>\n",
              "      <td>Setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>4.7</td>\n",
              "      <td>3.2</td>\n",
              "      <td>1.3</td>\n",
              "      <td>0.2</td>\n",
              "      <td>Setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>4.6</td>\n",
              "      <td>3.1</td>\n",
              "      <td>1.5</td>\n",
              "      <td>0.2</td>\n",
              "      <td>Setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>5</td>\n",
              "      <td>3.6</td>\n",
              "      <td>1.4</td>\n",
              "      <td>0.2</td>\n",
              "      <td>Setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>...</th>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>145</th>\n",
              "      <td>6.7</td>\n",
              "      <td>3.0</td>\n",
              "      <td>5.2</td>\n",
              "      <td>2.3</td>\n",
              "      <td>Virginica</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>146</th>\n",
              "      <td>6.3</td>\n",
              "      <td>2.5</td>\n",
              "      <td>5.0</td>\n",
              "      <td>1.9</td>\n",
              "      <td>Virginica</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>147</th>\n",
              "      <td>6.5</td>\n",
              "      <td>3.0</td>\n",
              "      <td>5.2</td>\n",
              "      <td>2.0</td>\n",
              "      <td>Virginica</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>148</th>\n",
              "      <td>6.2</td>\n",
              "      <td>3.4</td>\n",
              "      <td>5.4</td>\n",
              "      <td>2.3</td>\n",
              "      <td>Virginica</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>149</th>\n",
              "      <td>5.9</td>\n",
              "      <td>3.0</td>\n",
              "      <td>5.1</td>\n",
              "      <td>1.8</td>\n",
              "      <td>Virginica</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "<p>150 rows × 5 columns</p>\n",
              "</div>"
            ],
            "text/plain": [
              "    sepal.length  sepal.width  petal.length  petal.width    variety\n",
              "0        Unknown          3.5           1.4          0.2     Setosa\n",
              "1            4.9          3.0           1.4          0.2     Setosa\n",
              "2            4.7          3.2           1.3          0.2     Setosa\n",
              "3            4.6          3.1           1.5          0.2     Setosa\n",
              "4              5          3.6           1.4          0.2     Setosa\n",
              "..           ...          ...           ...          ...        ...\n",
              "145          6.7          3.0           5.2          2.3  Virginica\n",
              "146          6.3          2.5           5.0          1.9  Virginica\n",
              "147          6.5          3.0           5.2          2.0  Virginica\n",
              "148          6.2          3.4           5.4          2.3  Virginica\n",
              "149          5.9          3.0           5.1          1.8  Virginica\n",
              "\n",
              "[150 rows x 5 columns]"
            ]
          },
          "metadata": {},
          "execution_count": 93
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "cPX1sRuhP9pU"
      },
      "source": [
        "data['sepal.length'] = data['sepal.length'].replace('Unknown', np.nan).astype('float64')    # back to a numeric column"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 419
        },
        "id": "fxa8oRw7QEos",
        "outputId": "9a7f81c1-0561-444e-c847-751555b60aa1"
      },
      "source": [
        "data"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>sepal.length</th>\n",
              "      <th>sepal.width</th>\n",
              "      <th>petal.length</th>\n",
              "      <th>petal.width</th>\n",
              "      <th>variety</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>NaN</td>\n",
              "      <td>3.5</td>\n",
              "      <td>1.4</td>\n",
              "      <td>0.2</td>\n",
              "      <td>Setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>4.9</td>\n",
              "      <td>3.0</td>\n",
              "      <td>1.4</td>\n",
              "      <td>0.2</td>\n",
              "      <td>Setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>4.7</td>\n",
              "      <td>3.2</td>\n",
              "      <td>1.3</td>\n",
              "      <td>0.2</td>\n",
              "      <td>Setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>4.6</td>\n",
              "      <td>3.1</td>\n",
              "      <td>1.5</td>\n",
              "      <td>0.2</td>\n",
              "      <td>Setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>5.0</td>\n",
              "      <td>3.6</td>\n",
              "      <td>1.4</td>\n",
              "      <td>0.2</td>\n",
              "      <td>Setosa</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>...</th>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>145</th>\n",
              "      <td>6.7</td>\n",
              "      <td>3.0</td>\n",
              "      <td>5.2</td>\n",
              "      <td>2.3</td>\n",
              "      <td>Virginica</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>146</th>\n",
              "      <td>6.3</td>\n",
              "      <td>2.5</td>\n",
              "      <td>5.0</td>\n",
              "      <td>1.9</td>\n",
              "      <td>Virginica</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>147</th>\n",
              "      <td>6.5</td>\n",
              "      <td>3.0</td>\n",
              "      <td>5.2</td>\n",
              "      <td>2.0</td>\n",
              "      <td>Virginica</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>148</th>\n",
              "      <td>6.2</td>\n",
              "      <td>3.4</td>\n",
              "      <td>5.4</td>\n",
              "      <td>2.3</td>\n",
              "      <td>Virginica</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>149</th>\n",
              "      <td>5.9</td>\n",
              "      <td>3.0</td>\n",
              "      <td>5.1</td>\n",
              "      <td>1.8</td>\n",
              "      <td>Virginica</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "<p>150 rows × 5 columns</p>\n",
              "</div>"
            ],
            "text/plain": [
              "     sepal.length  sepal.width  petal.length  petal.width    variety\n",
              "0             NaN          3.5           1.4          0.2     Setosa\n",
              "1             4.9          3.0           1.4          0.2     Setosa\n",
              "2             4.7          3.2           1.3          0.2     Setosa\n",
              "3             4.6          3.1           1.5          0.2     Setosa\n",
              "4             5.0          3.6           1.4          0.2     Setosa\n",
              "..            ...          ...           ...          ...        ...\n",
              "145           6.7          3.0           5.2          2.3  Virginica\n",
              "146           6.3          2.5           5.0          1.9  Virginica\n",
              "147           6.5          3.0           5.2          2.0  Virginica\n",
              "148           6.2          3.4           5.4          2.3  Virginica\n",
              "149           5.9          3.0           5.1          1.8  Virginica\n",
              "\n",
              "[150 rows x 5 columns]"
            ]
          },
          "metadata": {},
          "execution_count": 97
        }
      ]
    }
  ]
}