@taruma
Last active April 5, 2022 22:32
taruma_hk151_uji_outlier.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "taruma_hk151_uji_outlier.ipynb",
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyOr2nRSQ1U/5EbR9wks+V6H",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/taruma/7bf2e4e1601ab8390d9919043eb87682/taruma_hk151_uji_outlier.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"Based on issue [#151](https://github.com/hidrokit/hidrokit/issues/151): **Outlier Test**\n",
"\n",
"Issue references:\n",
"- Te Chow, V. (2010). Applied Hydrology. Tata McGraw-Hill Education. https://books.google.co.id/books?id=RRwidSsBJrEC pp. 403-405.\n",
"- Limantara, Lily M. (2018): Rekayasa Hidrologi, Edisi Revisi. Penerbit Andi Offset, Yogyakarta. (pp. 89-93)\n",
"\n",
"Issue description:\n",
"- Find outlier values in the data.\n",
"\n",
"Strategy:\n",
"- Build the appendix table used to look up the value of $K_n$.\n",
"  - Table `t_rel_Kn_n`: the relationship between $K_n$ and the sample size $N$.\n",
"- Write a function that reads the data and returns the lower and upper bounds.\n",
"- Also check whether the data contains outliers. If it does, possibly report which entries are outliers.\n"
],
"metadata": {
"id": "rHVqziJtCv5C"
}
},
{
"cell_type": "markdown",
"source": [
"# SETUP AND DATASET"
],
"metadata": {
"id": "U5KoRm3pEMbj"
}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import pandas as pd"
],
"metadata": {
"id": "3UcFCSzuESf9"
},
"execution_count": 1,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# example taken from the book\n",
"# Limantara, Lily M. (2018): Rekayasa Hidrologi, Edisi Revisi. \n",
"# Penerbit Andi Offset, Yogyakarta. (pp. 90-91)\n",
"\n",
"_hujan = np.array([2818, 2542, 1949, 1842, 1748, 1737, 1605, 1558, 1433, 1264])\n",
"_index = np.array([2010, 2013, 2008, 2012, 2011, 2014, 2009, 2007, 2015, 2006])\n",
"data = pd.DataFrame(\n",
"    data=np.stack([_index, _hujan], axis=1), \n",
"    columns=['Tahun','Hujan']\n",
")\n",
"\n",
"# convert the Tahun (year) column to datetime and use it as the index\n",
"data.Tahun = pd.to_datetime(data.Tahun, format=\"%Y\")\n",
"data.set_index('Tahun', inplace=True)\n",
"data"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 394
},
"id": "jvXjaqznFSRp",
"outputId": "f1cdd35b-9ca6-4aa9-ba02-8b44d4e388ef"
},
"execution_count": 2,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" Hujan\n",
"Tahun \n",
"2010-01-01 2818\n",
"2013-01-01 2542\n",
"2008-01-01 1949\n",
"2012-01-01 1842\n",
"2011-01-01 1748\n",
"2014-01-01 1737\n",
"2009-01-01 1605\n",
"2007-01-01 1558\n",
"2015-01-01 1433\n",
"2006-01-01 1264"
]
},
"metadata": {},
"execution_count": 2
}
]
},
{
"cell_type": "markdown",
"source": [
"# TABLE\n",
"\n",
"Table values follow the reference book. \n",
"- Table `t_rel_Kn_n` is taken from Te Chow, V. (2010). Applied Hydrology. p. 404. \n",
"\n",
"The table used in the calculation is generated with the code below:"
],
"metadata": {
"id": "6q8OjClsESz9"
}
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "93ULKgYCCeHP",
"outputId": "d99090a3-0b50-405c-cede-2d7f507c923a"
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" N Kn\n",
"0 10.0 2.036\n",
"1 11.0 2.088\n",
"2 12.0 2.134\n",
"3 13.0 2.175\n",
"4 14.0 2.213"
]
},
"metadata": {},
"execution_count": 3
}
],
"source": [
"_N = np.array(\n",
" [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, \n",
" 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, \n",
" 46, 47, 48, 49, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 110, 120, \n",
" 130, 140]\n",
")\n",
"\n",
"_Kn = np.array(\n",
" [2.036, 2.088, 2.134, 2.175, 2.213, 2.247, 2.279, 2.309, 2.335, 2.361, \n",
" 2.385, 2.408, 2.429, 2.448, 2.467, 2.486, 2.502, 2.519, 2.534, 2.549, \n",
" 2.563, 2.577, 2.591, 2.604, 2.616, 2.628, 2.639, 2.65, 2.661, 2.671, \n",
" 2.682, 2.692, 2.7, 2.71, 2.719, 2.727, 2.736, 2.744, 2.753, 2.76, \n",
" 2.768, 2.804, 2.837, 2.866, 2.893, 2.917, 2.94, 2.961, 2.981, 3, \n",
" 3.017, 3.049, 3.078, 3.104, 3.129]\n",
")\n",
"\n",
"t_rel_Kn_n = pd.DataFrame(np.stack((_N, _Kn), axis=1), columns=['N', 'Kn'])\n",
"t_rel_Kn_n.head()"
]
},
{
"cell_type": "markdown",
"source": [
"# CODE"
],
"metadata": {
"id": "7dpZ9qtWEO8d"
}
},
{
"cell_type": "code",
"source": [
"def find_Kn(n, table=t_rel_Kn_n):\n",
"    # K_n for sample size n, linearly interpolated between table rows\n",
"    if n < 10 or n > 140:\n",
"        raise ValueError('Sample size is outside the lower (10) / upper (140) limit')\n",
"    else:\n",
"        N = table['N'].to_numpy()\n",
"        Kn = table['Kn'].to_numpy()\n",
"        return np.interp(n, N, Kn)\n",
"\n",
"def calc_boundary(df, col=None, result='value', show_stat=False):\n",
"    col = df.columns[0] if col is None else col\n",
"\n",
"    x = df[col].to_numpy()\n",
"    n = x.size\n",
"    # statistics are computed on base-10 logarithms of the data\n",
"    xlog = np.log10(x)\n",
"    xlogmean = xlog.mean()\n",
"    xlogstd = xlog.std(ddof=1)  # sample standard deviation\n",
"\n",
"    Kn = find_Kn(n)\n",
"\n",
"    # higher\n",
"    y_h = xlogmean + Kn*xlogstd\n",
"    val_h = 10**y_h\n",
"\n",
"    # lower\n",
"    y_l = xlogmean - Kn*xlogstd\n",
"    val_l = 10**y_l\n",
"\n",
"    if show_stat:\n",
"        print(\n",
"            f'Statistik:',\n",
"            f'N = {n}',\n",
"            f'Mean (log) = {xlogmean:.5f}',\n",
"            f'Std (log) = {xlogstd:.5f}',\n",
"            f'Lower (val) = {val_l:.5f}',\n",
"            f'Higher (val) = {val_h:.5f}',\n",
"            sep='\\n', end='\\n\\n'\n",
"        )\n",
"\n",
"    if result.lower() == 'value':\n",
"        return (val_l, val_h)\n",
"    elif result.lower() == 'log':\n",
"        return (y_l, y_h)\n",
"    else:\n",
"        raise ValueError(\"result must be 'value' or 'log'\")\n",
"\n",
"def find_outlier(df, col=None, verbose=False, **kwargs):\n",
"\n",
"    # always compare against bounds in the original scale,\n",
"    # even if result='log' is passed through kwargs\n",
"    low, high = calc_boundary(df, col, **{**kwargs, 'result': 'value'})\n",
"\n",
"    col = df.columns[0] if col is None else col\n",
"\n",
"    masklow = df[col] < low\n",
"    maskhigh = df[col] > high\n",
"    mask = masklow | maskhigh\n",
"\n",
"    # messages report the number of outliers below / above the bounds\n",
"    if verbose and masklow.sum():\n",
"        print(f'Ada outlier dibawah batas bawah sebanyak {masklow.sum()}.')\n",
"    if verbose and maskhigh.sum():\n",
"        print(f'Ada outlier diatas batas atas sebanyak {maskhigh.sum()}.')\n",
"\n",
"    def check_outlier(x):\n",
"        if x < low:\n",
"            return \"lower\"\n",
"        elif x > high:\n",
"            return \"higher\"\n",
"        else:\n",
"            return pd.NA\n",
"\n",
"    if mask.sum() != 0:\n",
"        new_df = df.copy()\n",
"        new_df['outlier'] = df[col].apply(check_outlier)\n",
"        return new_df[[col, 'outlier']]\n",
"    else:\n",
"        # message means \"no outlier\"\n",
"        print(\"Tidak ada Outlier\")\n",
"        return None\n"
],
"metadata": {
"id": "qrEraw9aJQY8"
},
"execution_count": 4,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# FUNCTIONS"
],
"metadata": {
"id": "bQOEn6MvX_SH"
}
},
{
"cell_type": "markdown",
"source": [
"## Function `find_Kn(n)`\n",
"\n",
"This function looks up the value of $K_n$ used in the outlier test. Sample sizes that fall between tabulated rows are linearly interpolated.\n",
"- Function arguments:\n",
"  - `n`: sample size $\\left(10 \\le n \\le 140\\right)$. Values outside this range raise a `ValueError`. "
],
"metadata": {
"id": "J3Qx2I7wYKC9"
}
},
{
"cell_type": "code",
"source": [
"find_Kn(n=10)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Ckk-C2erIM4N",
"outputId": "910b45f5-92bf-49f0-8ff8-04f3461a28aa"
},
"execution_count": 5,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"2.036"
]
},
"metadata": {},
"execution_count": 5
}
]
},
{
"cell_type": "code",
"source": [
"# n = 141 is outside the table range, so a ValueError is expected\n",
"try:\n",
"    find_Kn(141)\n",
"except ValueError:\n",
"    print(\"Hasil akan error\")"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ZoeBNqzKIkUT",
"outputId": "b1878603-6e8e-44c0-e847-a26d6ae98cfe"
},
"execution_count": 6,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Hasil akan error\n"
]
}
]
},
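{
"cell_type": "markdown",
"metadata": {},
"source": [
"For sample sizes between tabulated rows, `find_Kn` relies on `np.interp`, that is, linear interpolation. The cell below is a minimal, self-contained sketch of that step using only the two table rows for $N = 50$ and $N = 55$. The names `N_rows`, `Kn_rows`, `n_query`, `Kn_query`, and `Kn_manual` are illustrative and not part of the notebook's functions. Both printed values should be about 2.7824."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# two adjacent rows of the K_n table (values copied from t_rel_Kn_n)\n",
"N_rows = np.array([50, 55])\n",
"Kn_rows = np.array([2.768, 2.804])\n",
"\n",
"# n = 52 is not tabulated, so K_n is interpolated linearly between the rows\n",
"n_query = 52\n",
"Kn_query = np.interp(n_query, N_rows, Kn_rows)\n",
"\n",
"# the same interpolation written out by hand\n",
"Kn_manual = 2.768 + (52 - 50) / (55 - 50) * (2.804 - 2.768)\n",
"\n",
"print(round(Kn_query, 4), round(Kn_manual, 4))"
]
},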
{
"cell_type": "markdown",
"source": [
"## Function `calc_boundary(df, col=None, result='value', show_stat=False)`\n",
"\n",
"The `calc_boundary(...)` function computes the lower and upper outlier bounds of the data. It returns a _tuple_ of the form `(low_boundary, high_boundary)`.\n",
"\n",
"- Positional arguments:\n",
"  - `df`: dataset as a `pandas.DataFrame` object.\n",
"- Optional arguments:\n",
"  - `col=None`: name of the data column to check for outliers. If `None`, the first column of the dataframe is used.\n",
"  - `result=\"value\"`: return the bounds in the original scale. With `\"log\"`, return the bounds in logarithmic (base-10) scale.\n",
"  - `show_stat=False`: if `True`, print the statistics: sample size, mean and standard deviation of the log-transformed data, and the lower and upper bounds in the original scale. "
],
"metadata": {
"id": "Dqrzk04-IyWG"
}
},
{
"cell_type": "code",
"source": [
"calc_boundary(data)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "YsvTaGM1KMKk",
"outputId": "22337f63-2769-475d-f60b-5012c025b900"
},
"execution_count": 7,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(1092.1254115165966, 2961.0550928195808)"
]
},
"metadata": {},
"execution_count": 7
}
]
},
{
"cell_type": "code",
"source": [
"calc_boundary(data, col='Hujan', result='log', show_stat=True)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "SOfoDoqOKPJA",
"outputId": "2177e778-356a-40c1-87ba-b45a1925971c"
},
"execution_count": 8,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Statistik:\n",
"N = 10\n",
"Mean (log) = 3.25486\n",
"Std (log) = 0.10638\n",
"Lower (val) = 1092.12541\n",
"Higher (val) = 2961.05509\n",
"\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(3.038272512363497, 3.471446487863784)"
]
},
"metadata": {},
"execution_count": 8
}
]
},
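{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a worked check of the numbers printed above, the bounds can be reproduced by hand from the log-space formulas that `calc_boundary` implements. This is a sketch using the rounded statistics, with $K_n = 2.036$ for $N = 10$. The symbols $\\bar{y}$, $s_y$, $y_H$, $y_L$, $x_H$, and $x_L$ are notation introduced here for the mean and sample standard deviation of $\\log_{10} x$ and for the upper and lower bounds.\n",
"\n",
"$$y_H = \\bar{y} + K_n s_y = 3.25486 + 2.036 \\times 0.10638 \\approx 3.47145$$\n",
"\n",
"$$y_L = \\bar{y} - K_n s_y = 3.25486 - 2.036 \\times 0.10638 \\approx 3.03827$$\n",
"\n",
"$$x_H = 10^{y_H} \\approx 2961, \\qquad x_L = 10^{y_L} \\approx 1092$$\n",
"\n",
"Every value in the dataset lies between roughly 1092 and 2961, which is consistent with the test reporting no outlier for the original data."
]
},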
{
"cell_type": "markdown",
"source": [
"## Function `find_outlier(df, col=None, verbose=False, **kwargs)`\n",
"\n",
"The `find_outlier(...)` function checks whether the data contains outliers and returns a dataframe in which the outlying values are flagged. If no outlier is found, it prints a message and returns `None`.\n",
"\n",
"- Positional arguments:\n",
"  - `df`: dataset as a `pandas.DataFrame` object.\n",
"- Optional arguments:\n",
"  - `col=None`: name of the data column to check for outliers. If `None`, the first column of the dataframe is used.\n",
"  - `verbose=False`: if `True`, also report whether outliers exist below or above the bounds and how many. \n",
"  - `**kwargs`: _keyword arguments_ passed on to `calc_boundary()`. "
],
"metadata": {
"id": "wo96vsXZKVZz"
}
},
{
"cell_type": "code",
"source": [
"find_outlier(data)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "5B9VuHVNLi07",
"outputId": "840e2964-a1de-47f9-c54b-4e55e7fac2d6"
},
"execution_count": 9,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Tidak ada Outlier\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"find_outlier(data, show_stat=True)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "cqb_BDamLl69",
"outputId": "c95699a6-dccc-4733-d516-848b04864569"
},
"execution_count": 10,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Statistik:\n",
"N = 10\n",
"Mean (log) = 3.25486\n",
"Std (log) = 0.10638\n",
"Lower (val) = 1092.12541\n",
"Higher (val) = 2961.05509\n",
"\n",
"Tidak ada Outlier\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"# example data with an outlier\n",
"\n",
"data2 = data.copy()\n",
"data2.loc['2012'] = 4000"
],
"metadata": {
"id": "mddyShZNTaN6"
},
"execution_count": 11,
"outputs": []
},
{
"cell_type": "code",
"source": [
"find_outlier(data2, 'Hujan', verbose=True)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 412
},
"id": "sj0gMIhlNiX-",
"outputId": "bb4a97c7-70e4-4274-f387-e386ce5834a0"
},
"execution_count": 12,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Ada outlier diatas batas atas sebanyak 1.\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
" Hujan outlier\n",
"Tahun \n",
"2010-01-01 2818 <NA>\n",
"2013-01-01 2542 <NA>\n",
"2008-01-01 1949 <NA>\n",
"2012-01-01 4000 higher\n",
"2011-01-01 1748 <NA>\n",
"2014-01-01 1737 <NA>\n",
"2009-01-01 1605 <NA>\n",
"2007-01-01 1558 <NA>\n",
"2015-01-01 1433 <NA>\n",
"2006-01-01 1264 <NA>"
]
},
"metadata": {},
"execution_count": 12
}
]
},
{
"cell_type": "code",
"source": [
"data2.loc['2012'] = 40\n",
"\n",
"find_outlier(data2, show_stat=True)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 516
},
"id": "TiWLlhW3Wm9I",
"outputId": "bd5369c2-8102-4cef-ed95-df4ef7d34cca"
},
"execution_count": 13,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Statistik:\n",
"N = 10\n",
"Mean (log) = 3.08854\n",
"Std (log) = 0.53301\n",
"Lower (val) = 100.77150\n",
"Higher (val) = 14918.85009\n",
"\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
" Hujan outlier\n",
"Tahun \n",
"2010-01-01 2818 <NA>\n",
"2013-01-01 2542 <NA>\n",
"2008-01-01 1949 <NA>\n",
"2012-01-01 40 lower\n",
"2011-01-01 1748 <NA>\n",
"2014-01-01 1737 <NA>\n",
"2009-01-01 1605 <NA>\n",
"2007-01-01 1558 <NA>\n",
"2015-01-01 1433 <NA>\n",
"2006-01-01 1264 <NA>"
]
},
"metadata": {},
"execution_count": 13
}
]
},
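{
"cell_type": "markdown",
"metadata": {},
"source": [
"The frame returned by `find_outlier` marks outliers in the `outlier` column and leaves the other rows as `<NA>`. One possible follow-up is to separate the flagged rows from the rest. The cell below sketches this on a small hypothetical stand-in frame with the same layout, with three rows copied from the output above. The names `result`, `flagged`, and `cleaned` are illustrative and not part of the notebook's functions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# hypothetical stand-in with the same layout as the find_outlier(...) output\n",
"result = pd.DataFrame(\n",
"    {'Hujan': [2818, 40, 1748], 'outlier': [pd.NA, 'lower', pd.NA]},\n",
"    index=pd.to_datetime(['2010', '2012', '2011'], format='%Y'),\n",
")\n",
"result.index.name = 'Tahun'\n",
"\n",
"# rows flagged as 'lower' or 'higher'\n",
"flagged = result[result['outlier'].notna()]\n",
"\n",
"# remaining rows, keeping only the data column\n",
"cleaned = result.loc[result['outlier'].isna(), ['Hujan']]\n",
"\n",
"print(flagged)\n",
"print(cleaned)"
]
},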
{
"cell_type": "markdown",
"source": [
"# Changelog\n",
"\n",
"```\n",
"- 20220303 - 1.0.0 - Initial\n",
"```\n",
"\n",
"#### Copyright &copy; 2022 [Taruma Sakti Megariansyah](https://taruma.github.io)\n",
"\n",
"Source code in this notebook is licensed under the [MIT License](https://choosealicense.com/licenses/mit/). Data in this notebook is licensed under a [Creative Commons Attribution 4.0 International](https://creativecommons.org/licenses/by/4.0/) license. \n"
],
"metadata": {
"id": "h2ycp8K8MWBn"
}
}
]
}