@Nithanaroy
Last active December 27, 2019 12:36
blog_percentile-buckets_long-tail-distribution.ipynb
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "## Code for Percentile Buckets"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import seaborn as sns\nimport pandas as pd\nimport matplotlib.pyplot as plt\nimport numpy as np",
"execution_count": 1,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Load the prices"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df = pd.read_csv(\"item_prices.csv\")\ndf.head()",
"execution_count": 2,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 2,
"data": {
"text/plain": " price\n0 1833\n1 296\n2 199\n3 4936\n4 1595",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>price</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1833</td>\n </tr>\n <tr>\n <th>1</th>\n <td>296</td>\n </tr>\n <tr>\n <th>2</th>\n <td>199</td>\n </tr>\n <tr>\n <th>3</th>\n <td>4936</td>\n </tr>\n <tr>\n <th>4</th>\n <td>1595</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.describe()",
"execution_count": 3,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 3,
"data": {
"text/plain": " price\ncount 989.000000\nmean 961.792720\nstd 1338.413805\nmin 5.000000\n25% 188.000000\n50% 483.000000\n75% 1140.000000\nmax 8708.000000",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>price</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>count</th>\n <td>989.000000</td>\n </tr>\n <tr>\n <th>mean</th>\n <td>961.792720</td>\n </tr>\n <tr>\n <th>std</th>\n <td>1338.413805</td>\n </tr>\n <tr>\n <th>min</th>\n <td>5.000000</td>\n </tr>\n <tr>\n <th>25%</th>\n <td>188.000000</td>\n </tr>\n <tr>\n <th>50%</th>\n <td>483.000000</td>\n </tr>\n <tr>\n <th>75%</th>\n <td>1140.000000</td>\n </tr>\n <tr>\n <th>max</th>\n <td>8708.000000</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Distribution of prices"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "plt.figure(figsize=(15, 4))\nax = sns.distplot( df.price, kde=True )\n# plt.ylabel(\"fraction of samples\")\nax.grid()",
"execution_count": 4,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "<Figure size 1080x288 with 1 Axes>",
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA30AAAEICAYAAADr+p3iAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nO3deZxddX34/9d71mSy74QsZIUYRAVCgrhFEQnKz6iFGrCKLRarWG1tq9B+v7SlpZWvbekiYKmiqGigKDVtEURxQFnCjpKQQBbIRiAbgZlk9s/vj3tmcjPMZO5MkrmZm9fz8Qj3nM/5nM95nzufe7jve875nEgpIUmSJEkqTWXFDkCSJEmSdPiY9EmSJElSCTPpkyRJkqQSZtInSZIkSSXMpE+SJEmSSphJnyRJkiSVsIKSvohYFBGrI2JNRFzWxfLqiLglW748IqblLbs8K18dEWfnld8YES9HxNPdbPNPIiJFxNje75YkSZIkCaCipwoRUQ5cC5wFbAIeiYhlKaWVedUuBnallGZFxBLgauCjETEXWAKcCBwL/Cwijk8ptQLfBr4GfKeLbU4B3gdsKGQnxo4dm6ZNm1ZIVR0h6uvrGTJkSLHD0ABjv1Ff2G/UF/Yb9YX9Rn1xqPrNY489tj2lNK6rZT0mfcB8YE1KaR1ARCwFFgP5Sd9i4K+y6duAr0VEZOVLU0qNwPqIWJO192BK6b78M4KdXAN8CfhxAfExbdo0Hn300UKq6ghRW1vLwoULix2GBhj7jfrCfqO+sN+oL+w36otD1W8i4oXulhVyeeckYGPe/KasrMs6KaUWYDcwpsB1Owe7GNicUnqqgNgkSZIkSQdQyJm+fhMRNcCfk7u0s6e6lwCXAEyYMIHa2trDG5wOqbq6Ov9m6jX7jfrCfqO+sN+oL+w36ov+6DeFJH2bgSl585Ozsq7qbIqICmAEsKPAdfPNBKYDT+WuDmUy8HhEzE8pbc2vmFK6AbgBYN68eclT6QOLlz+oL+w36gv7jfrCfqO+sN+oL/qj3xRyeecjwOyImB4RVeQGZlnWqc4y4KJs+jzgnpRSysqXZKN7TgdmAw93t6GU0m9SSuNTStNSStPIXQ56SueET5IkSZJUmB6Tvuwevc8BdwHPALemlFZExJUR8cGs2jeBMdlALV8ELsvWXQHcSm7QlzuBS7ORO4mIHwAPAidExKaIuPjQ7pokSZIkqaB7+lJKdwB3dCq7Im+6ATi/m3WvAq7qovyCArY7rZD4JEmSJEldK+jh7JIkSZKkgcmkT5IkSZJKmEmfJEmSJJWwI+o5fTq6fH/5hl6vc+GCqYchEkmSJKl0eaZPkiRJkkqYSZ8kSZIklTCTPkmSJEkqYSZ9kiRJklTCTPokSZIkqYSZ9EmSJElSCTPpkyRJkqQSZtInSZIkSSXMpE+SJEmSSphJnyRJkiSVMJM+SZIkSSphJn2SJEmSVMJM+iRJkiSphJn0SZIkSVIJM+mTJEmSpBJm0idJkiRJJcykT5IkSZJKmEmfJEmSJJWwgpK+iFgUEasjYk1EXNbF8uqIuCVbvjwipuUtuzwrXx0RZ+eV3xgRL0fE053a+mpErIqIX0fE7RExsu+7J0mSJElHtx6TvogoB64FzgHmAhdExNxO1S4GdqWUZgHXAFdn684FlgAnAouA67L2AL6dlXV2N/DGlNKbgGeBy3u5T5IkSZKkTCFn+uYDa1JK61JKTcBSYHGnOouBm7Lp24AzIyKy8qUppcaU0npgTdYeKaX7gJ2dN5ZS+mlKqSWbfQiY3Mt9kiRJkiRlKgqoMwnYmDe/CVjQXZ2UUktE7AbGZOUPdVp3Ui/i+z3glq4WRMQlwCUAEyZMoLa2thfNqtjq6uoYVL+q1+vV1q47DNFooKirq/Ozrl6z36gv7DfqC/uN+qI/+k0hSV9RRMRfAC3AzV0tTyndANwAMG/evLRw4cL+C04Hrba2li
2DZ/R6vYULph6GaDRQ1NbW4mddvWW/UV/Yb9QX9hv1RX/0m0Iu79wMTMmbn5yVdVknIiqAEcCOAtd9nYj4JHAu8LGUUiogRkmSJElSFwpJ+h4BZkfE9IioIjcwy7JOdZYBF2XT5wH3ZMnaMmBJNrrndGA28PCBNhYRi4AvAR9MKe0pfFckSZIkSZ31mPRlg6p8DrgLeAa4NaW0IiKujIgPZtW+CYyJiDXAF4HLsnVXALcCK4E7gUtTSq0AEfED4EHghIjYFBEXZ219DRgG3B0RT0bE1w/RvkqSJEnSUaege/pSSncAd3QquyJvugE4v5t1rwKu6qL8gm7qzyokJkmSJElSzwp6OLskSZIkaWAy6ZMkSZKkEmbSJ0mSJEklzKRPkiRJkkqYSZ8kSZIklTCTPkmSJEkqYSZ9kiRJklTCTPokSZIkqYSZ9EmSJElSCTPpkyRJkqQSZtInSZIkSSXMpE+SJEmSSphJnyRJkiSVMJM+SZIkSSphJn2SJEmSVMJM+iRJkiSphJn0SZIkSVIJM+mTJEmSpBJm0idJkiRJJcykT5IkSZJKmEmfJEmSJJUwkz5JkiRJKmEFJX0RsSgiVkfEmoi4rIvl1RFxS7Z8eURMy1t2eVa+OiLOziu/MSJejoinO7U1OiLujojnstdRfd89SZIkSTq69Zj0RUQ5cC1wDjAXuCAi5naqdjGwK6U0C7gGuDpbdy6wBDgRWARcl7UH8O2srLPLgJ+nlGYDP8/mJUmSJEl9UMiZvvnAmpTSupRSE7AUWNypzmLgpmz6NuDMiIisfGlKqTGltB5Yk7VHSuk+YGcX28tv6ybgQ73YH0mSJElSnooC6kwCNubNbwIWdFcnpdQSEbuBMVn5Q53WndTD9iaklF7MprcCE7qqFBGXAJcATJgwgdra2h53REeOuro6BtWv6vV6tbXrDkM0Gijq6ur8rKvX7DfqC/uN+sJ+o77oj35TSNJXNCmlFBGpm2U3ADcAzJs3Ly1cuLA/Q9NBqq2tZcvgGb1eb+GCqYchGg0UtbW1+FlXb9lv1Bf2G/WF/UZ90R/9ppDLOzcDU/LmJ2dlXdaJiApgBLCjwHU7eykiJmZtTQReLiBGSZIkSVIXCkn6HgFmR8T0iKgiNzDLsk51lgEXZdPnAfeklFJWviQb3XM6MBt4uIft5bd1EfDjAmKUJEmSJHWhx6QvpdQCfA64C3gGuDWltCIiroyID2bVvgmMiYg1wBfJRtxMKa0AbgVWAncCl6aUWgEi4gfAg8AJEbEpIi7O2voKcFZEPAe8N5uXJEmSJPVBQff0pZTuAO7oVHZF3nQDcH43614FXNVF+QXd1N8BnFlIXJIkSZKkAyvo4eySJEmSpIHJpE+SJEmSSphJnyRJkiSVsCP6OX1SZ99fvqFP613o8/0kSZJ0lPJMnyRJkiSVMJM+SZIkSSphJn2SJEmSVMJM+iRJkiSphJn0SZIkSVIJM+mTJEmSpBJm0idJkiRJJcykT5IkSZJKmEmfJEmSJJUwkz5JkiRJKmEmfZIkSZJUwkz6JEmSJKmEmfRJkiRJUgkz6ZMkSZKkEmbSJ0mSJEklzKRPkiRJkkqYSZ8kSZIklbCCkr6IWBQRqyNiTURc1sXy6oi4JVu+PCKm5S27PCtfHRFn99RmRJwZEY9HxJMR8auImHVwuyhJkiRJR68ek76IKAeuBc4B5gIXRMTcTtUuBnallGYB1wBXZ+vOBZYAJwKLgOsioryHNq8HPpZSegvwfeD/HNwuSpIkSdLRq5AzffOBNSmldSmlJmApsLhTncXATdn0bcCZERFZ+dKUUmNKaT2wJmvvQG0mYHg2PQLY0rddkyRJkiRVFFBnErAxb34TsKC7OimllojYDYzJyh/qtO6kbLq7Nj8F3BERe4FXgdMLiFGSJEmS1IVCkr7+9sfA+1NKyyPiz4B/IpcI7iciLgEuAZgwYQK1tbX9GqQOTl1dHYPqV/Xb9mpr1/XbtnT41NXV+VlXr9
lv1Bf2G/WF/UZ90R/9ppCkbzMwJW9+clbWVZ1NEVFB7rLMHT2s+7ryiBgHvDmltDwrvwW4s6ugUko3ADcAzJs3Ly1cuLCAXdGRora2li2DZ/Tb9hYumNpv29LhU1tbi5919Zb9Rn1hv1Ff2G/UF/3Rbwq5p+8RYHZETI+IKnIDsyzrVGcZcFE2fR5wT0opZeVLstE9pwOzgYcP0OYuYEREHJ+1dRbwTN93T5IkSZKObj2e6cvu0fsccBdQDtyYUloREVcCj6aUlgHfBL4bEWuAneSSOLJ6twIrgRbg0pRSK0BXbWblvw/8MCLayCWBv3dI91iSJEmSjiIF3dOXUroDuKNT2RV50w3A+d2sexVwVSFtZuW3A7cXEpckSZIk6cAKeji7JEmSJGlgMumTJEmSpBJm0idJkiRJJcykT5IkSZJKmEmfJEmSJJUwkz5JkiRJKmEmfZIkSZJUwkz6JEmSJKmEmfRJkiRJUgkz6ZMkSZKkEmbSJ0mSJEklzKRPkiRJkkqYSZ8kSZIklTCTPkmSJEkqYSZ9kiRJklTCTPokSZIkqYRVFDsA6WDtbWpl+fodbNndwLQxNcwcN5Txw6qJiGKHJkmSJBWdSZ8GrFf3NnP/mu0sf34nTS1tDB9UwdObdwMwfFAFM8cNZdb4ocwcP7TIkUqSJEnFY9KnAWfba4388rltPLHxFdraEm+aPIJ3Hj+OiSMGs7O+ibXb6ljzch2rX3qNJza+AsCPHt/ERWdM48L5Uz0DKEmSpKOKSZ8GjE279nDvs9tYueVVysuC06aN4u2zxjF6SFVHndFDqhg9ZDSnTRtNW0ps3d3AmpfreOm1Bv7i9qdZ9eJr/OX/N5eKcm9nlSRJ0tHBpE8DwlObXuGWRzYyqLKMd50wjjNmjmVo9YG7b1kEx44czLEjB7PktClcfdcq/v3edTy/o56vXXgKIwZX9lP0kiRJUvF4ukNHvM279vLDxzYxbUwNXzp7Du+be0yPCV9nZWXB5ee8gat/6yQeXLuD37r+ATbs2HOYIpYkSZKOHAUlfRGxKCJWR8SaiLisi+XVEXFLtnx5REzLW3Z5Vr46Is7uqc3IuSoino2IZyLi8we3ixrIXmto5rsPPc/Q6gouXHAcgyrLD6q9j542le9evIDtdY0svvZXPPL8zkMUqSRJknRk6jHpi4hy4FrgHGAucEFEzO1U7WJgV0ppFnANcHW27lxgCXAisAi4LiLKe2jzk8AUYE5K6Q3A0oPaQw1YLa1t3Lx8A3ubW/md04/r9dm97rx15hhu/+zbGFVTxcf+Yzk/fGzTIWlXkiRJOhIVcqZvPrAmpbQupdRELglb3KnOYuCmbPo24MzIDZG4GFiaUmpMKa0H1mTtHajNzwBXppTaAFJKL/d99zRQpZT4rye3sGHnHs47dQrHjhx8SNufPnYIP/rsGZx63Cj+5D+f4qt3raKtLR3SbUiSJElHgkKSvknAxrz5TVlZl3VSSi3AbmDMAdY9UJszgY9GxKMR8ZOImF3YrqiUPLB2B49v2MV75oznpEkjDss2RtZU8Z2L53PB/Clc+4u1XH3nqsOyHUmSJKmYjsTRO6uBhpTSvIj4CHAj8I7OlSLiEuASgAkTJlBbW9uvQerg1NXVMai+6yRr9a427ni6lZPGBO8fu5Oy7bsOenu1teu6Xfa+UYmXplTw7/eto2XnJt4x2VE9j1R1dXV+1tVr9hv1hf1GfWG/UV/0R78pJOnbTO4eu3aTs7Ku6myKiApgBLCjh3W7K98E/Cibvh34VldBpZRuAG4AmDdvXlq4cGEBu6IjRW1tLVsGz3hd+fa6Rm5avYbxw6v5yBkzaao4uIFb2i1cMPWAy9/+zjY++a2H+c4zO1n09lM5bdroQ7JdHVq1tbX4WVdv2W/UF/Yb9YX9Rn3RH/2mkMs7HwFmR8T0iKgiNzDLsk51lgEXZdPnAfeklFJWviQb3XM6MBt4uIc2/wt4dzb9LuDZvu2aBpqG5la++9
ALlEXw8dOnUX2IEr5CVJaXcd2FpzJlVA2f/u5jbNzp4xwkSZJUGnpM+rJ79D4H3AU8A9yaUloREVdGxAezat8ExkTEGuCLwGXZuiuAW4GVwJ3ApSml1u7azNr6CvBbEfEb4O+BTx2aXdWR7rbHNrGjrpEL5k9l9JCqft/+iJpKvnHRPFpa2/jUTY/yWkNzv8cgSZIkHWoF3dOXUroDuKNT2RV50w3A+d2sexVwVSFtZuWvAB8oJC6Vjudeeo2VL77KohOPYea4oUWLY8a4oVz/O6fyiRsf5gtLn+Q/PjGP8rIoWjySJEnSwToSB3LRUaYtJX7y9FZGD6nijFljDss2vr98Q6/qn/umifz4yS185SfP8Bcf6PxYSkmSJGngMOlT0T254RW2vtrAktOmUFFWyG2mh9+C6WN4+dVG/uOX69lR18S8Agd2ubCHAWMkSZKk/nZkfMPWUau5tY27n3mJyaMGH7bn8fXV+0+ayOzxQ/nxk1tYv72+2OFIkiRJfWLSp6J6YM12du9tZtEbjyHiyLp3rrwsWHLaVEYNqeLm5S/wyp6mYockSZIk9ZpJn4qmrrGF2me3MeeYYcwYW7zBWw5kcFU5nzj9OFraEksf2UhrWyp2SJIkSVKvmPSpaH6x+mWaWtpYdOIxxQ7lgMYOq+bDJ09iw8493L1ya7HDkSRJknrFpE9F8VJ9G8vX7WDetNGMHz6o2OH06M2TRzJ/+mjue247q7a+WuxwJEmSpIKZ9KkobnuuiYqyMs58w/hih1KwD5w0kYkjBvGfj27y/j5JkiQNGCZ96ndPbNjFI1tbefvssQwfVFnscApWWV7GBadNpTV5f58kSZIGDpM+9auUEn93xzMMrwreMWtsscPptbHDqvnwW7y/T5IkSQOHSZ/61d0rX+KR53fxoVmVVFeWFzucPnnzlJHMn+b9fZIkSRoYTPrUb5pb2/jKnauYMW4I75pcUexwDsoH3uT9fZIkSRoYTPrUb259dCPrttVz2aI5lJcdWQ9i7y3v75MkSdJAYdKnftHc2sZ1v1jLyVNHctbcCcUO55DIv7/vp97fJ0mSpCPUwL7GTgPG//76RTa/spe//uCJRAzss3z53jxlJOu31/PL57YzbcyQYocjSZIkvY5n+nTYpZS4vnYts8cP5T1zBs5z+Qr1gTdN5NgRg/jPxzayceeeYocjSZIk7cekT4fdL1a/zOqXXuMP3jWTsgF+L19XKsvLuHDBcQB85ubHaGhuLXJEkiRJ0j4mfTrsvl67jmNHDOKDbzm22KEcNqOHVHHeKVN4evOr/M3/rCx2OJIkSVIHkz4dVo8+v5OHn9/J779zBpXlpd3d5h47nE+/cwY3L9/Afz2xudjhSJIkSYBJnw6zr9+7llE1lXz0tCnFDqVf/OnZJzB/2mgu/9FveO6l14odjiRJkmTSp8Pn2Zde42fPvMxFZ0yjpuroGCi2sryMf7vwZIZUl/OZmx+nvrGl2CFJkiTpKGfSp8Pm6/euZXBlORe9dVqxQ+lXE4YP4l+WnMy6bXVc/qPfkJIPbpckSVLxFJT0RcSiiFgdEWsi4rIulldHxC3Z8uURMS1v2eVZ+eqIOLsXbf5rRNT1bbdUbJt27WHZk1u4YP5URg2pKnY4/e5ts8byxbOOZ9lTW/je8g3FDkeSJElHsR6TvogoB64FzgHmAhdExNxO1S4GdqWUZgHXAFdn684FlgAnAouA6yKivKc2I2IeMOog901F9I1frgfgU++YXuRIiuezC2ex8IRx/M1/r+TJja8UOxxJkiQdpQo50zcfWJNSWpdSagKWAos71VkM3JRN3wacGRGRlS9NKTWmlNYDa7L2um0zSwi/Cnzp4HZNxbKzvolbHtnIh06exLEjBxc7nKIpKwuu+e23MH54NZd851Fe3L232CFJkiTpKFTI6BqTgI1585uABd3VSSm1RMRuYExW/lCndSdl0921+TlgWUrpxVze2LWIuAS4BGDChAnU1tYWsC
vqD7c/18Te5lZOGbyj279LXV0dg+pX9W9g/aC2dt3ryv5gLvztQ4189Gu1/PmCQQyqKL0H1PeXuro6P+vqNfuN+sJ+o76w36gv+qPfHFFDKkbEscD5wMKe6qaUbgBuAJg3b15auLDHVdQP6htb+KP77uGsuRO48Nx53darra1ly+AZ/RhZ/1i4YGqX5ZOOf5mLv/0IP9wyjH//nVMpKzPx64va2lr8rKu37DfqC/uN+sJ+o77oj35TyOWdm4H8h6xNzsq6rBMRFcAIYMcB1u2u/GRgFrAmIp4HaiJiTYH7oiPA0kc28sqeZj6zcGaxQzmivPuE8fzfc+dy98qXuPqu0jvDKUmSpCNXIUnfI8DsiJgeEVXkBmZZ1qnOMuCibPo84J6UG6d+GbAkG91zOjAbeLi7NlNK/5tSOialNC2lNA3Ykw0OowGgqaWNb/5yHQumj+aUqY7D09knz5jGxxZM5d/vXcetj27seQVJkiTpEOjx8s7sHr3PAXcB5cCNKaUVEXEl8GhKaRnwTeC72Vm5neSSOLJ6twIrgRbg0pRSK0BXbR763VN/+vGTm9myu4GrPnJSsUM5IkUEf/XBE3lhxx7+4vbfMHV0DafPGFPssCRJklTiCrqnL6V0B3BHp7Ir8qYbyN2L19W6VwFXFdJmF3WGFhKfiq+tLfH1e9fyhonDWXj8uGKHc8SqLC/j2o+dwoevu58/+N5j/Ndn38a0sUOKHZYkSZJKWEEPZ5d68tOVL7F2Wz2fWTiTA426KhgxuJIbLzoNgItveoTde5uLHJEkSZJKmUmfDlpKievvXctxY2p4/xuPKXY4A8K0sUP4+u+cyoade7j05sdpamkrdkiSJEkqUSZ9OmgPrt3BUxtf4dPvnElFuV2qUKfPGMNVHz6JX63Zzud/8ATNrSZ+kiRJOvT8hq6Ddl3tWsYNq+Yjp0wqdigDzm/Pm8IV587lzhVb+eKtT9Fi4idJkqRD7Ih6OLsGnl9veoVfrdnO5efMYVBlebHDGZB+7+3TaW5t4+9/sorKsuCr57+Zch/eLkmSpEPEpE8H5fratQwfVMGFC6YWO5QB7dPvmklTSxv/ePezVJaX8fcfOYkyEz9JkiQdAiZ96rM1L9dx54qtXLpwFsMGVRY7nAHvD8+cTVNrG/92zxoqK4K/WfxGR0KVJEnSQTPpU5/dcN9aqivK+N23TSt2KCXji2cdT1NLG/9+3zqqysv5v+e+wcRPkiRJB8WkT33y4u693P7EZi6cP5UxQ6uLHU7JiAguO2cOTa1t3Hj/eqoqyvjyohNM/CRJktRnJn3qk/+4bz0pwe+/c0axQyk5EcEV586lqaWNr9+7lrKAPzvbxE+SJEl9Y9KnXttV38QPHt7AB99yLJNH1RQ7nJIUkbunry0lrqtdy9bdDXzlt95EVYVPWZEkSVLvmPSp1779wPPsbW7lD941s9ihlLSysuDvPnwSE0cM5p/ufpYXdzfw9d85lRE1DpojSZKkwpn0qVfqG1u46cHnOWvuBI6fMKzY4Rxxvr98Q5/W6+6RFxHB58+czeRRg/nyD3/Nb339Ab71ydOYMtozrJIkSSqM14qpV37w8AZe2dPMZxZ6lq8/feSUyXzn9xbw8qsNfPi6B3hq4yvFDkmSJEkDhEmfCtbQ3Mo3frme02eM5pSpo4odzlHnrTPH8KPPnsGgyjI+esOD/HTF1mKHJEmSpAHAyztVsG/d/zxbX23gn5e8pdihlJzeXBb68dOP47sPvcCnv/cYV5w7l9992/TDGJkkSZIGOs/0qSC76pu4rnYNZ84Zz+kzxhQ7nKPasEGVfOrtMzjrDRP46/9eyZdue4o9TS3FDkuSJElHKJM+FeRrv1hDfWMLXz5nTrFDEVBVUcb1v3Mql757Jv/52CbO/bdfsWLL7mKHJUmSpCOQSZ96tHHnHr7z4POcf+oUR+w8gpSXBX929hxuvngBdQ0tfPjaB/jW/etJKRU7NEmSJB1BTPrUo3/46W
rKy4I/Puv4YoeiLpwxayx3/tE7ecfssfz1f6/k4pseZUddY7HDkiRJ0hHCgVx0QL/ZtJsfP7mFS989k2NGDCp2OMrTefCX98wZT01VOT95eisL/6GW80+dwqzxQ1+3XnfPBJQkSVJp8kyfupVS4u9/8gyjh1Tx6Xf5XL4jXUTw1plj+czCmQyqKOdb96/nzqe30tzaVuzQJEmSVEQFJX0RsSgiVkfEmoi4rIvl1RFxS7Z8eURMy1t2eVa+OiLO7qnNiLg5K386Im6MiMqD20X11b3PbuOBtTv4w/fMYvgg/wwDxcQRg7n03bOYN20U9z23jX/9+XM89/JrxQ5LkiRJRdJj0hcR5cC1wDnAXOCCiJjbqdrFwK6U0izgGuDqbN25wBLgRGARcF1ElPfQ5s3AHOAkYDDwqYPaQ/VJa1viKz9ZxdTRNXxswXHFDke9VFVRxodPnszvnjENyD1j8QcPb+DVvc3FDUySJEn9rpB7+uYDa1JK6wAiYimwGFiZV2cx8FfZ9G3A1yIisvKlKaVGYH1ErMnao7s2U0p3tDcaEQ8Dk/u4bzoItz+xmVVbX+PfLjiZqgqvAh6oZk8YxufPnM19z27j3me38exLr1FVUcYn3nocFeX+XSVJko4GhSR9k4CNefObgAXd1UkptUTEbmBMVv5Qp3UnZdMHbDO7rPPjwBe6CioiLgEuAZgwYQK1tbUF7IoK0dSa+Ltf7mX6iDKG7FxNbe2zh3wbdXV1DKpfdcjb1esNAj4wDuYPLedHa1u58n9W8u17V/GJE6uYNbK82OH1Sl1dnZ919Zr9Rn1hv1Ff2G/UF/3Rb47k0TuvA+5LKf2yq4UppRuAGwDmzZuXFi5c2I+hlbav37uWnQ2ruPbjC3jrzDGHZRu1tbVsGTzjsLStrg0DPjE5MbKmiiv/ZwVXLW/gt0+dwhfeO5tjRw4udngFqa2txc+6est+o76w36gv7Dfqi/7oN4UkfZuBKXnzk7OyrupsiogKYASwo4d1u20zIv4SGAd8uoD4dAjtqm/i2l+s4T1zxh+2hE/FExF84E0TeefxY/nnnz3Hdx58ntuf3MzHTz+Ozy6cyZih1cUOUZIkSYdYITf1PALMjojpEVFFbmCWZZ3qLD4mUkwAABdjSURBVAMuyqbPA+5JKaWsfEk2uud0YDbw8IHajIhPAWcDF6SUHGu+n13zs2epb2zhy4vmFDsUHUbDBlXyf8+dyz1/spDFbz6Wb92/nnf+v1/wTz9dzasNDvYiSZJUSnpM+lJKLcDngLuAZ4BbU0orIuLKiPhgVu2bwJhsoJYvApdl664AbiU36MudwKUppdbu2sza+jowAXgwIp6MiCsO0b6qB7WrX+Y7D77AJ946jROOGVbscNQPpoyu4avnv5mf/vG7WHjCeP71njW84+pfcH3tWvY2tRY7PEmSJB0CBd3Tl42oeUensivyphuA87tZ9yrgqkLazMqP5PsMS9b2ukb+9D9/zQkThnHZOZ7lO9rMGj+Uaz92Cp/ZvJt//Olqrr5zFTfev55L3jGDj86f4nMaJUmSBjATLJFS4su3/ZpXG5r53qfmM6hyYI3oqEPnjZNG8K3fnc+jz+/kH3/6LFfd8Qz/8vPneMuUkZwxcwwja6p61d6FC6YepkglSZJUKJM+8d2HXuDnq17minPnMueY4cUOR0eAedNG84NLTuc3m3bzjV+t47+f2sIDa7dz4rEjeMfssUweVVPsECVJklQgk76j3LMvvcZV//sM7zp+HL/7tmnFDkdHmJMmj+BflpzMCROG8eDaHTz8/E5+s3k308bU8PZZ4zjhmGGUl0Wxw5QkSdIBmPQdxRqaW/n8D55gaHUF/3D+m4nwy7u6NrKminNOmsi754zn0Rd28cCa7Xxv+QsMH1TByVNHcepxoxjr4x4kSZKOSCZ9R7H/d+dqVm19jRs/OY9xw/zCrp4Nqizn7bPG8tYZY1i19VUee2EX9z27jXuf3cZxY2qYd9wo3j
hpBNUV3hcqSZJ0pDDpO0rd++w2brx/PRe99TjeM2dCscPRAFNeFpx47AhOPHYEr+5t5omNr/DYCzv54eOb+e+nXuSkySM4ZeooWtuSl39KkiQVmUnfUWhHXSN/+p9PcfyEoVz+/jcUOxwNcMMHV/Ku48fxztlj2bBzD4+9sItfb97NYy/s4vYnNnHW3Am878RjOGPmGM8ASpIkFYFJ31EmpcSXbvs1u/c2853f8/EMOnQiguPGDOG4MUP4wJsmsnrra9Q3tfLfT73IDx7eyLDqCt49Zzxnn3gMC08Yx5BqDz+SJEn9wW9dR5GUEl/5yaqOxzO8YaKPZzgafX/5hsO+jeqKct40eSQXLphKY0srD6zZwZ1Pb+XuZ15i2VNbqKooY8H00Zw+YwynzxjDmyaPoLK87LDHJUmSdDQy6TtKtLUlrlj2NN97aAMfP/04H8+gflNdUc6754zn3XPG83dtiUef38ldK17igbXb+epdqwGoqSpn3rTRvHXGGE6fMZqTJo2gwiRQkiTpkDDpOwq0tLbxpR/+mh89vplPv2sGly2a4+MZVBTlZcGCGWNYMGMMkLu/dPn6nTy0bgcPrt3B1XeuAmBIVTlzjx3O3InDs9cRzJ4wtJihS5IkDVgmfSWuqaWNLyx9gp88vZU/Oet4PveeWSZ8OmKMGVrN+0+ayPtPmgjAttcaWb5+B4+s38mKLa9y22ObqH+wFcgljBNr4LSXnuT4CcM4bkwNU0bVMGX0YEYMrrRfS5IkdcOkr4Q1NLfyB997jNrV2/g/H3gDn3rHjGKHpKNMX+4fvHDBVM5907FA7rLkDTv3sPLFV1m55VV++fR6Hly7g9uf2LzfOsMGVXQkgFNH1zBh+CDGDq1mzNAqxgypZuzQKkYNqfK+QUmSdFQy6StRdY0tfOqmR1i+fid//5GTuGD+1GKHJPVaWVkwbewQpo0dwvtPmsi86hdZuHAhrzY0s3HnHjbu3Jt73bWHjTv3sHZbPbWrt9HY0tZleyNrKhk9pIrhgyoZNqgi96+6fbqSoVnZkKoKaqrKqakqZ0h1xX6vNVUVPntQkiQNKCZ9JWj3nmYu+tbD/Gbzbq757bfwoZMnFTsk6ZAaPqiy4+HwkDujOOeY3Gi0KSUamtuoa2yhrrGF+sYW6pv2Tdc1tlLf2MKOukYamttoaGmlobmV5tZU8PYry4OqinKqK8qoriijqryM6src66DK8uxfNl2Rm//AmyYyfHAFI2uqGFVTyeDK8h4vSe3rmVJJkqR8Jn0l5rEXdvHnP/oN67fXc+2Fp7DojccUOySpX0UEg6vKGVxVzrhh1QWv19qWaGxppaG5jabWNppa8v61ttKYN9/Y8draMb+nqZVdzc3dJpHfW/7CfvMVZbk4288etr8Orc6dVRxSXcHQ6txZxyHVnmGUJEl9Z9JXIrbubuDqO1dx+xObmTC8mm9+ch7vmD2u2GFJA0Z5WWTJ16Fpr7Ut0djcyt7mVhpa2mhobmVvU25+T1Mre5ta2NPUmv1rYdtrjdQ37WFPYwtdnXMMco+2GDoolwwOG1TJ0Cwx3FdWwcuvNTC6pspHXkiSpA4mfQNcQ3Mr/3HfOq6rXUtrSnzu3bP4zMKZDKn2T6uB6UCXNA6qb+qXh8sfCuVlQU11BTW9/Cy2pcTepta8S1Nz03UNufnXGluoa2jmhR311DW2vO6M4r/ds4YIGF1Txdih1YwdVsW4odWM6RjYporRQ/KnqxhaXeHop5IklTAzgwEqpcRPnt7KVf/7DJtf2cs5bzyGP3//G5gyuqbYoUk6CGURHZd39iSlRFNL7v7F1xpy9y2+YeIwttU1sb2uke2vNbK9rpHHNuxiR10Te5pau2ynqqKMUTWVjKqpYmTHa9V+ZSMG5/4Nz3sdUtXzfYmSJKn4TPoGmNa2xCPP7+Sau59l+fqdzDlmGN///QWcMXNssUOT1M8igurKcqoryxkzNHf/4oEGctnb1MqO+kZ21jexo76JHX
VN7KxvZEddE7v2NLFrTzOv7GniuZfreCWbb23rfoCb8rJg+KAKhg+u7BgJtX0E1OHZ5afDBu1/+enQ6tdfllpV4aWokiQdTiZ9A0BDcyv3r9nOT1e8xM+eeYkd9U2Mqqnkbz/0RpacNsV7dyQVZHBVOZOrapg86vVXBHR12WxKqWOQmr3ZPYkN7fcp5t2j2NCcGwBnS8NeGrfn7l9saGmlsbmty/sTO6sqL2NINoDN0OwsZ266PBvIZt9jM4ZUlVPTaYCbmmzgnvxBcXwmoyRJ+xSU9EXEIuBfgHLgGymlr3RaXg18BzgV2AF8NKX0fLbscuBioBX4fErprgO1GRHTgaXAGOAx4OMppaaD282BZ/feZn6x6mV+unIrtau3saeplaHVFbx7znjOPnECC08Yz1Dv25N0GEVExyMo+qL98tOGbKTTxubcSKfzp4/O7lNszh6tkXuMRn37YzaaWti9t5ktr+ylvjE34E19YwstBzjr2FlleTA4i7uKFioe+DmV5UFFee7RGhXlQWV5GRVlQXlZbrq8LKgoC+ZNG01VRRlVWf3c8qCirKzjtaI8KIvcumURlEXuzGdZ3nwQRJD7lzddFkH7RbG5q2OzZdl7HmR1gqy9vPkIbn98c24bZWTbyrbXw6W2Ps5Dko5ePWYNEVEOXAucBWwCHomIZSmllXnVLgZ2pZRmRcQS4GrgoxExF1gCnAgcC/wsIo7P1umuzauBa1JKSyPi61nb1x+KnT0SvbKnibXb6lm3ra7jdd32etZvr6e1LTFuWDUfOnkSZ594DKfPGE11Rd++fEk6OhxJA93kX34KlR3lO+tzv+NVVZQzuqKc0UN6biullD1WI3tcRmsbTc2tNLUmmlpas8dspP0ft9HaRktrG617XqGxoorm1jaaW9uyAXBy061tiebWXNstbW20Jfj5qpcP0ztyeAW5JLE8SwjLy8ooz5LR8rLgn3/2bEeSW15WljcdHQlv/nRZWfDmySOpLC+jsiJyyXJZUJ6XLO97LetYNxfH/glse9KaL3XK4X++6iVIdJwdTinlpvPK9l8vve5McmT/zd9WLpneF09E8N654/dLmNvft/z3Y0tdG89vr+9I5ivK9iX6+71PHcl/z4m3VAw+81VQ2Jm++cCalNI6gIhYCiwG8pO+xcBfZdO3AV+L3JFvMbA0pdQIrI+INVl7dNVmRDwDvAe4MKtzU9bugEv6VmzZzdbdDbyyp5nde1//b9eeJjbs2MOO+n0nMSvLg6mja5g5biiLTjyGd88Zz8lTRlLms7kkHeUigorszNuQwh+/CMCg7XU0jD2uoLptKfFbp0zelzS2tdHSmmhpS7S0ttHcmksOm1tTRyLaluDulS91JCltKXUkJu1lHfNZWbszZo3tmE8pl8aklJtub6ct5baRm08sX79zX3lborWjTqKtLTfd2pZobY+vLRd/R3k239TaRmvzvoS3vbw179/9a7bTixOsA8aN968vrOKvanvVbgRZ0r0viWxpa9t3ppcs+cybBjrO/JKVDx1Ukc0e+P//+f2loyyvz3Us71iWrZX274v79dHO2XjHvu2LpamlrWN/2+NvT6qJ/efzE+4Rgys7zliXlwWRd5a8YzrvR4J99diXoMe+uu1nvCM/gd8vjvZkPJsu2/dut/8N9t+P1/84sd/7nfee7v++7fs7bHmxkf/Z9lS3n9/2Y0Zb9hntmG6f7/Ta2pZb3vEZT/s+4ymRtZeyz+m+v2379lPKfdbb92/f33Pf3454/RUF//HLdZQFHT/oVJTnfuSoyH70qSgvo7Js33G5/Uehyor9pyuzKykqK3JlVRVlHVdNVGZXXbTPt191kb+dzj9Etf/QUlGe+4xFpx9d2qfb92VfH9i/Px6NCkn6JgEb8+Y3AQu6q5NSaomI3eQuz5wEPNRp3UnZdFdtjgFeSSm1dFF/QPnrZSt5+Pmd+5UNG1TRMQLeyJpKzpo7gRnjhjBz3FBmjBvKlFGDvT9Pko
qoLILbn9jc6/VmjR/ap+3VNbT0XClTll0f2p8Dd124YGp2NrQt+5efOO6fKDa3tnWZrMK+L7adv2zlz9694qV9X0Rfdwls3hfWvC/r+cvzv4jTMZ2fUOe+ALd1k1ynvC/UbSlR8eoWGodOzNVPiXnTRtOa7XNrIpvOvbblffne78t7Gzyz9dXsbOW+be4/3W5fMpG3Cz3qfKlwx39fl9QcaNl+7/D+b3AXwbSnip3f8/2SzLz3fP/XfdPt71tLayKx7/1vX7ctJcYOre7oS+1J0+v72b7p1my6rrFlvxig048vHTuzL/EtVHui1D49uKq8I6FobmplXf2O/RLVjrPeHWfig917mjp+ACjr4rX9EvH2pLgyS3DyLxHv/APC/kl33t809v092t+B/X8o2P/9TMCU0TXZD0b7PuctrbnPeVNLG/VNrbS05n4Ua85+HMs/TjRny9oTziNNVz+8tCeJeYea163T7v0nTeSffvst/RHqITNgbwqLiEuAS7LZuohYXcx41Gtjge3FDkIDjv1GfWG/6aOPFTuA4rLfqC/sN0eB1cA1Hz2kTR6qftPtZS2FJH2bgSl585Ozsq7qbIqICmAEuQFdDrRuV+U7gJERUZGd7etqWwCklG4Abiggfh2BIuLRlNK8YsehgcV+o76w36gv7DfqC/uN+qI/+k0h1xI+AsyOiOkRUUVuYJZlneosAy7Kps8D7km5c+nLgCURUZ2NyjkbeLi7NrN1fpG1Qdbmj/u+e5IkSZJ0dOvxTF92j97ngLvIPV7hxpTSioi4Eng0pbQM+Cbw3Wyglp3kkjiyereSG/SlBbg0pdQK0FWb2Sa/DCyNiL8FnsjaliRJkiT1QXQ3SpN0OEXEJdklulLB7DfqC/uN+sJ+o76w36gv+qPfmPRJkiRJUgnz+QCSJEmSVMJM+tTvImJRRKyOiDURcVmx41HxRMSUiPhFRKyMiBUR8YWsfHRE3B0Rz2Wvo7LyiIh/zfrOryPilLy2LsrqPxcRF3W3TZWOiCiPiCci4n+y+ekRsTzrH7dkA4WRDSZ2S1a+PCKm5bVxeVa+OiLOLs6eqL9ExMiIuC0iVkXEMxHxVo836klE/HH2/6inI+IHETHI4406i4gbI+LliHg6r+yQHV8i4tSI+E22zr9G9O4p8yZ96lcRUQ5cC5wDzAUuiIi5xY1KRdQC/ElKaS5wOnBp1h8uA36eUpoN/Dybh1y/mZ39uwS4HnIHVeAvgQXAfOAv2w+sKmlfAJ7Jm78auCalNAvYBVyclV8M7MrKr8nqkfW1JcCJwCLguuwYpdL1L8CdKaU5wJvJ9R+PN+pWREwCPg/MSym9kdwAhEvweKPX+za5v22+Q3l8uR74/bz1Om/rgEz61N/mA2tSSutSSk3AUmBxkWNSkaSUXkwpPZ5Nv0buC9gkcn3ipqzaTcCHsunFwHdSzkPknus5ETgbuDultDOltAu4m14eDDWwRMRk4APAN7L5AN4D3JZV6dxv2vvTbcCZWf3FwNKUUmNKaT2whtwxSiUoIkYA7yQbFTyl1JRSegWPN+pZBTA4cs+irgFexOONOkkp3UfuKQb5DsnxJVs2PKX0UPaIu+/ktVUQkz71t0nAxrz5TVmZjnLZJTAnA8uBCSmlF7NFW4EJ2XR3/cd+dfT5Z+BLQFs2PwZ4JaXUks3n94GO/pEt353Vt98cXaYD24BvZZcFfyMihuDxRgeQUtoM/AOwgVyytxt4DI83KsyhOr5MyqY7lxfMpE9S0UXEUOCHwB+llF7NX5b9ouUww+oQEecCL6eUHit2LBpQKoBTgOtTSicD9ey71ArweKPXyy6tW0zuR4NjgSF4Zld9UOzji0mf+ttmYEre/OSsTEepiKgkl/DdnFL6UVb8UnYpA9nry1l5d/3HfnV0eRvwwYh4ntwl4u8hd6/WyOzyK9i/D3T0j2z5CGAH9pujzSZgU0ppeTZ/G7kk0OONDuS9wPqU0raUUjPwI3LHII83KsShOr
5szqY7lxfMpE/97RFgdjbqVRW5m5qXFTkmFUl2n8M3gWdSSv+Ut2gZ0D5i1UXAj/PKP5GNenU6sDu7bOIu4H0RMSr7VfZ9WZlKUErp8pTS5JTSNHLHkHtSSh8DfgGcl1Xr3G/a+9N5Wf2UlS/JRtubTu7G+If7aTfUz1JKW4GNEXFCVnQmsBKPNzqwDcDpEVGT/T+rvd94vFEhDsnxJVv2akScnvXDT+S1VZCKnqtIh05KqSUiPkeuU5cDN6aUVhQ5LBXP24CPA7+JiCezsj8HvgLcGhEXAy8Av50tuwN4P7kb4PcAvwuQUtoZEX9D7kcFgCtTSp1vplbp+zKwNCL+FniCbMCO7PW7EbGG3E32SwBSSisi4lZyX+BagEtTSq39H7b60R8CN2c/Oq4jdwwpw+ONupFSWh4RtwGPkztOPAHcAPwvHm+UJyJ+ACwExkbEJnKjcB7K7zOfJTdC6GDgJ9m/wuPL/fggSZIkSSpFXt4pSZIkSSXMpE+SJEmSSphJnyRJkiSVMJM+SZIkSSphJn2SJEmSVMJM+iRJOggRcWVEvLfYcUiS1B0f2SBJUh9FRLnP2pIkHek80ydJUhciYlpErIqImyPimYi4LSJqIuL5iLg6Ih4Hzo+Ib0fEedk6p0XEAxHxVEQ8HBHDIqI8Ir4aEY9ExK8j4tNF3jVJ0lHGpE+SpO6dAFyXUnoD8Crw2ax8R0rplJTS0vaKEVEF3AJ8IaX0ZuC9wF7gYmB3Suk04DTg9yNien/uhCTp6GbSJ0lS9zamlO7Ppr8HvD2bvqWLuicAL6aUHgFIKb2aUmoB3gd8IiKeBJYDY4DZhzdsSZL2qSh2AJIkHcE63/jePl/fizYC+MOU0l2HJiRJknrHM32SJHVvakS8NZu+EPjVAequBiZGxGkA2f18FcBdwGciojIrPz4ihhzOoCVJymfSJ0lS91YDl0bEM8Ao4PruKqaUmoCPAv8WEU8BdwODgG8AK4HHI+Jp4N/xShtJUj/ykQ2SJHUhIqYB/5NSemORQ5Ek6aB4pk+SJEmSSphn+iRJkiSphHmmT5IkSZJKmEmfJEmSJJUwkz5JkiRJKmEmfZIkSZJUwkz6JEmSJKmEmfRJkiRJUgn7/wHNctr9aAF8MwAAAABJRU5ErkJggg==\n"
},
"metadata": {
"needs_background": "light"
}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "That's a pretty long tail!\n\nBucket the prices into **10** buckets that each hold roughly the same number of items"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "num_bins = 10",
"execution_count": 5,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "bucketed_price, bins = pd.qcut(df.price, q=num_bins, labels=False, retbins=True)",
"execution_count": 6,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
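"source": "Boundaries of each bin identified by `qcut`. There are 11 edges for 10 buckets: row 0 is the minimum price, and bucket `k` covers the prices between edge `k` (exclusive, except for the minimum) and edge `k + 1` (inclusive)."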
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "pd.DataFrame(np.round(bins), columns=[\"Upper Bound\"]).rename_axis(index='Bin')",
"execution_count": 7,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 7,
"data": {
"text/plain": " Upper Bound\nBin \n0 5.0\n1 50.0\n2 144.0\n3 233.0\n4 340.0\n5 483.0\n6 685.0\n7 952.0\n8 1373.0\n9 2490.0\n10 8708.0",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>Upper Bound</th>\n </tr>\n <tr>\n <th>Bin</th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>5.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>50.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>144.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>233.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>340.0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>483.0</td>\n </tr>\n <tr>\n <th>6</th>\n <td>685.0</td>\n </tr>\n <tr>\n <th>7</th>\n <td>952.0</td>\n </tr>\n <tr>\n <th>8</th>\n <td>1373.0</td>\n </tr>\n <tr>\n <th>9</th>\n <td>2490.0</td>\n </tr>\n <tr>\n <th>10</th>\n <td>8708.0</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df[\"price_bucketed\"] = bucketed_price\ndf.head()",
"execution_count": 8,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 8,
"data": {
"text/plain": " price price_bucketed\n0 1833 8\n1 296 3\n2 199 2\n3 4936 9\n4 1595 8",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>price</th>\n <th>price_bucketed</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1833</td>\n <td>8</td>\n </tr>\n <tr>\n <th>1</th>\n <td>296</td>\n <td>3</td>\n </tr>\n <tr>\n <th>2</th>\n <td>199</td>\n <td>2</td>\n </tr>\n <tr>\n <th>3</th>\n <td>4936</td>\n <td>9</td>\n </tr>\n <tr>\n <th>4</th>\n <td>1595</td>\n <td>8</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "The number of examples in each bucket is now $\\approx$ equal :)"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.price_bucketed.value_counts()",
"execution_count": 9,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 9,
"data": {
"text/plain": "0 100\n9 99\n8 99\n7 99\n6 99\n4 99\n3 99\n2 99\n5 98\n1 98\nName: price_bucketed, dtype: int64"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Plot the distribution of `price_bucketed`"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "plt.figure(figsize=(15, 4))\nax = sns.distplot( df.price_bucketed, bins=num_bins, kde=False )\nplt.ylabel(\"count\")\nax.grid()",
"execution_count": 10,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "<Figure size 1080x288 with 1 Axes>",
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA3sAAAEHCAYAAAAXsl9wAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAVmElEQVR4nO3de4xmB3ke8Oe1F3exuZksnRovzrrFxbVoItBiG2jRFKMWCMUOItxCcKiljVJiLkmVOPmjVJEqgZomcSCiWnEzCeUSY2oXR2BimEShqfGF+3otrAV8yRpjgg24cZylb/+YY3dYdpaZ3Zk535z9/aTRfud85/vO4/G7l2fO5avuDgAAANNy3NgBAAAAWHvKHgAAwAQpewAAABOk7AEAAEyQsgcAADBBW8YOcDS2bdvWO3bsGDvGj7j//vtz0kknjR0DlmVGmXVmlFlnRpl1ZvTYceONN97T3U841HObuuzt2LEjN9xww9gxfsTCwkLm5+fHjgHLMqPMOjPKrDOjzDozeuyoqm8s95zTOAEAACZI2QMAAJggZQ8AAGCClD0AAIAJUvYAAAAmSNkDAACYoHUre1X17qq6u6q+vGTd46vqk1X11eHXk4f1VVV/UFW3VtUXq+rp65ULAADgWLCeR/bem+T5B627JMm13X1GkmuH5SR5QZIzhq9dSd6xjrkAAAAmb93KXnf/RZK/OWj1+UkuGx5fluSCJevf14v+d5LHVdUp65UNAABg6rZs8P7munv/8PiuJHPD41OT3L5kuzuGdftzkKralcWjf5mbm8vCwsK6hT1S99733Vxx9TVjx9g0Hn/SCWNHOOZ8//vfn8nfO/AQM8qsM6PMOjNKsvFl72Hd3VXVR/C63Ul2J8nOnTt7fn5+raMdtSuuviYPbDtz7Bibxvw5p40d4ZizsLCQWfy9Aw8xo8w6M8qsM6MkG1/2vllVp3T3/uE0zbuH9XcmedKS7bYP64CD/Pfrbjvq99h6/4Nr8j6wXjbTjL7KD6xWZbP8f/1xNtOMcmwyo2tvM/55v9EfvXBVkguHxxcmuXLJ+tcMd+U8N8l9S073BAAAYJXW7cheVX0gyXySbVV1R5I3J3lLkg9X1UVJvpHkZcPmf5rkhUluTfJ/krx2vXIxe/zUCdjM/BkGwKxat7LX3a9c5qnzDrFtJ3ndemUBAAA41mz0aZwAAABsAGUPAABggpQ9AACACVL2AAAAJkjZAwAAmCBlDwAAYIKUPQAAgAlS9gAAACZI2QMAAJggZQ8AAGCClD0AAIAJUvYAAAAmSNkDAACYIGUPAABggpQ9AACACVL2AAAAJkjZAwAAmCBlDwAAYIKUPQAAgAlS9gAAACZI2QMAAJggZQ8AAGCClD0AAIAJUvYAAAAmSNkDAACYIGUPAABggpQ9AACACVL2AAAAJkjZAwAAmCBlDwAAYIKUPQAAgAlS9gAAACZolLJXVW+qqq9U1Zer6gNVtbWqTq+q66rq1qr6UFWdMEY2AACAKdjwsldVpyZ5fZKd3f3UJMcneUWStyb5ve5+cpLvJLloo7MBAABMxVincW5J8siq2pLkxCT7kzw3yeXD85cluWCkbAAAAJvelo3eYXffWVW/k+S2JH+b5JokNya5t7sPDJvdkeTUQ72+qnYl2ZUkc3NzWVhYWPfMq3XcgQey9Z69Y8eAZZlRZp0ZZdaZUWadGV17Cwv7xo6wahte9qrq5CTnJzk9yb1J/iTJ81f6+u7enWR3kuzcubPn5+fXIeXRueLqa/LAtjPHjgHL2nrPXjPKTDOjzDozyqwzo2tv/pzTxo6wamOcxvm8JF/r7m91998nuSLJs5M8bjitM0m2J7lzhGwAAACTMEbZuy3JuVV1YlVVkvOS7Eny6SQvHba5MMmVI2QDAACYhA0ve919XRZvxHJTki8NGXYn+Y0kv1pVtyb5iSTv2uhsAAAAU7Hh1+wlSXe/OcmbD1
q9L8nZI8QBAACYnLE+egEAAIB1pOwBAABMkLIHAAAwQcoeAADABCl7AAAAE6TsAQAATJCyBwAAMEHKHgAAwAQpewAAABOk7AEAAEyQsgcAADBByh4AAMAEKXsAAAATpOwBAABMkLIHAAAwQcoeAADABCl7AAAAE6TsAQAATJCyBwAAMEHKHgAAwAQpewAAABOk7AEAAEyQsgcAADBByh4AAMAEKXsAAAATpOwBAABMkLIHAAAwQcoeAADABCl7AAAAE6TsAQAATJCyBwAAMEHKHgAAwASNUvaq6nFVdXlV7a2qm6vqmVX1+Kr6ZFV9dfj15DGyAQAATMFYR/YuTfLx7j4zyU8nuTnJJUmu7e4zklw7LAMAAHAENrzsVdVjkzwnybuSpLsf7O57k5yf5LJhs8uSXLDR2QAAAKZijCN7pyf5VpL3VNXnquqdVXVSkrnu3j9sc1eSuRGyAQAATMKWkfb59CQXd/d1VXVpDjpls7u7qvpQL66qXUl2Jcnc3FwWFhbWOe7qHXfggWy9Z+/YMWBZZpRZZ0aZdWaUWWdG197Cwr6xI6zaGGXvjiR3dPd1w/LlWSx736yqU7p7f1WdkuTuQ724u3cn2Z0kO3fu7Pn5+Q2IvDpXXH1NHth25tgxYFlb79lrRplpZpRZZ0aZdWZ07c2fc9rYEVZtw0/j7O67ktxeVU8ZVp2XZE+Sq5JcOKy7MMmVG50NAABgKsY4spckFyd5f1WdkGRfktdmsXh+uKouSvKNJC8bKRsAAMCmt6KyV1XXdvd5P27dSnX355PsPMRTR/R+AAAA/LDDlr2q2prkxCTbhg85r+GpxyQ5dZ2zAQAAcIR+3JG9X0ryxiRPTHJj/n/Z+26St69jLgAAAI7CYcted1+a5NKquri737ZBmQAAADhKK7pmr7vfVlXPSrJj6Wu6+33rlAsAAICjsNIbtPxRkn+S5PNJfjCs7iTKHgAAwAxa6Ucv7ExyVnf3eoYBAABgbaz0Q9W/nOQfrWcQAAAA1s5Kj+xtS7Knqj6b5O8eWtndL16XVAAAAByVlZa9/7SeIQAAAFhbK70b55+vdxAAAADWzkrvxvm9LN59M0lOSPKIJPd392PWKxgAAABHbqVH9h790OOqqiTnJzl3vUIBAABwdFZ6N86H9aL/keTfrEMeAAAA1sBKT+N8yZLF47L4uXsPrEsiAAAAjtpK78b5b5c8PpDk61k8lRMAAIAZtNJr9l673kEAAABYOyu6Zq+qtlfVR6vq7uHrI1W1fb3DAQAAcGRWeoOW9yS5KskTh6//OawDAABgBq207D2hu9/T3QeGr/cmecI65gIAAOAorLTsfbuqXl1Vxw9fr07y7fUMBgAAwJFbadn7d0leluSuJPuTvDTJL65TJgAAAI7SSj964beTXNjd30mSqnp8kt/JYgkEAABgxqz0yN5PPVT0kqS7/ybJ09YnEgAAAEdrpWXvuKo6+aGF4cjeSo8KAgAAsMFWWtj+a5K/qqo/GZZ/Lsl/Xp9IAAAAHK0Vlb3ufl9V3ZDkucOql3T3nvWLBQAAwNFY8amYQ7lT8AAAADaBlV6zBwAAwCai7AEAAEyQsgcAADBByh4AAMAEKXsAAAATpOwBAABM0Ghlr6qOr6rPVdXHhuXTq+q6qrq1qj5UVSeMlQ0AAGCzG/PI3huS3Lxk+a1Jfq+7n5zkO0kuGiUVAADABIxS9qpqe5KfSfLOYbmSPDfJ5cMmlyW5YIxsAAAAU7BlpP3+fpJfT/LoYfknktzb3QeG5TuSnHqoF1bVriS7kmRubi4LCwvrm/QIHHfggWy9Z+/YMWBZZpRZZ0aZdWaUWWdG197Cwr6xI6zahpe9qnpRkru7+8aqml/t67t7d5LdSbJz586en1/1W6y7K66+Jg9sO3PsGLCsrffsNaPMNDPKrDOjzDozuvbmzzlt7AirNsaRvWcneXFVvTDJ1iSPSXJpksdV1Zbh6N72JHeOkA0AAGASNvyave7+ze7e3t
07krwiyae6++eTfDrJS4fNLkxy5UZnAwAAmIpZ+py930jyq1V1axav4XvXyHkAAAA2rbFu0JIk6e6FJAvD431Jzh4zDwAAwFTM0pE9AAAA1oiyBwAAMEHKHgAAwAQpewAAABOk7AEAAEyQsgcAADBByh4AAMAEKXsAAAATpOwBAABMkLIHAAAwQcoeAADABCl7AAAAE6TsAQAATJCyBwAAMEHKHgAAwAQpewAAABOk7AEAAEyQsgcAADBByh4AAMAEKXsAAAATpOwBAABMkLIHAAAwQcoeAADABCl7AAAAE6TsAQAATJCyBwAAMEHKHgAAwAQpewAAABOk7AEAAEyQsgcAADBByh4AAMAEKXsAAAATtOFlr6qeVFWfrqo9VfWVqnrDsP7xVfXJqvrq8OvJG50NAABgKsY4sncgya9191lJzk3yuqo6K8klSa7t7jOSXDssAwAAcAQ2vOx19/7uvml4/L0kNyc5Ncn5SS4bNrssyQUbnQ0AAGAqtoy586rakeRpSa5LMtfd+4en7koyt8xrdiXZlSRzc3NZWFhY95yrddyBB7L1nr1jx4BlmVFmnRll1plRZp0ZXXsLC/vGjrBqo5W9qnpUko8keWN3f7eqHn6uu7uq+lCv6+7dSXYnyc6dO3t+fn4D0q7OFVdfkwe2nTl2DFjW1nv2mlFmmhll1plRZp0ZXXvz55w2doRVG+VunFX1iCwWvfd39xXD6m9W1SnD86ckuXuMbAAAAFMwxt04K8m7ktzc3b+75Kmrklw4PL4wyZUbnQ0AAGAqxjiN89lJfiHJl6rq88O630ryliQfrqqLknwjyctGyAYAADAJG172uvsvk9QyT5+3kVkAAACmapRr9gAAAFhfyh4AAMAEKXsAAAATpOwBAABMkLIHAAAwQcoeAADABCl7AAAAE6TsAQAATJCyBwAAMEHKHgAAwAQpewAAABOk7AEAAEyQsgcAADBByh4AAMAEKXsAAAATpOwBAABMkLIHAAAwQcoeAADABCl7AAAAE6TsAQAATJCyBwAAMEHKHgAAwAQpewAAABOk7AEAAEyQsgcAADBByh4AAMAEKXsAAAATpOwBAABMkLIHAAAwQcoeAADABCl7AAAAE6TsAQAATJCyBwAAMEEzVfaq6vlVdUtV3VpVl4ydBwAAYLOambJXVccn+cMkL0hyVpJXVtVZ46YCAADYnGam7CU5O8mt3b2vux9M8sEk54+cCQAAYFPaMnaAJU5NcvuS5TuSnHPwRlW1K8muYfH7VXXLBmRbrW1J7hk7BByGGWXWmVFmnRll1pnRNfbzYwdY3k8u98Qslb0V6e7dSXaPneNwquqG7t45dg5Yjhll1plRZp0ZZdaZUZLZOo3zziRPWrK8fVgHAADAKs1S2bs+yRlVdXpVnZDkFUmuGjkTAADApjQzp3F294Gq+pUkn0hyfJJ3d/dXRo51pGb6NFOIGWX2mVFmnRll1plRUt09dgYAAADW2CydxgkAAMAaUfYAAAAmSNlbY1X1/Kq6papurapLxs4DS1XVk6rq01W1p6q+UlVvGDsTHEpVHV9Vn6uqj42dBQ5WVY+rqsuram9V3VxVzxw7EyxVVW8a/p7/clV9oKq2jp2JcSh7a6iqjk/yh0lekOSsJK+sqrPGTQU/5ECSX+vus5Kcm+R1ZpQZ9YYkN48dApZxaZKPd/eZSX46ZpUZUlWnJnl9kp3d/dQs3vjwFeOmYizK3to6O8mt3b2vux9M8sEk54+cCR7W3fu7+6bh8fey+A+UU8dNBT+sqrYn+Zkk7xw7Cxysqh6b5DlJ3pUk3f1gd987bir4EVuSPLKqtiQ5Mclfj5yHkSh7a+vUJLcvWb4j/iHNjKqqHUmeluS6cZPAj/j9JL+e5P+OHQQO4fQk30rynuFU43dW1Uljh4KHdPedSX4nyW1J9ie5r7uvGTcVY1H24BhUVY9K8pEkb+zu746dBx5SVS9Kcnd33zh2FljGliRPT/KO7n5akv
uTuEafmVFVJ2fxzLLTkzwxyUlV9epxUzEWZW9t3ZnkSUuWtw/rYGZU1SOyWPTe391XjJ0HDvLsJC+uqq9n8VT451bVH48bCX7IHUnu6O6Hzoq4PIvlD2bF85J8rbu/1d1/n+SKJM8aORMjUfbW1vVJzqiq06vqhCxeDHvVyJngYVVVWbzO5Obu/t2x88DBuvs3u3t7d+/I4p+hn+puP5FmZnT3XUlur6qnDKvOS7JnxEhwsNuSnFtVJw5/758XNxE6Zm0ZO8CUdPeBqvqVJJ/I4p2P3t3dXxk5Fiz17CS/kORLVfX5Yd1vdfefjpgJYLO5OMn7hx/s7kvy2pHzwMO6+7qqujzJTVm8C/fnkuweNxVjqe4eOwMAAABrzGmcAAAAE6TsAQAATJCyBwAAMEHKHgAAwAQpewAAABOk7AEAAEyQsgfApFTVb1fV89bovear6mNr8D47qurLq9z+VUewn/dW1UtX+zoApknZA2Ayqur47v6P3f1nY2c5SjuSrLrsAcBSyh4Am8JwtGtvVb2/qm6uqsur6sSq+npVvbWqbkryc0uPblXVM6rqf1XVF6rqs1X16Ko6vqr+S1VdX1VfrKpf+jG7fkxVXV1Vt1TVf6uq44b3/v6SbC+tqvcOj+eq6qPDPr9QVc866L/jH1fV54Zsy2V5S5J/WVWfr6o3LbddLXr7kO3PkvzDtfheAzANW8YOAACr8JQkF3X3Z6rq3Un+/bD+29399CSpqucPv56Q5ENJXt7d11fVY5L8bZKLktzX3c+oqn+Q5DNVdU13f22ZfZ6d5Kwk30jy8SQvSXL5YTL+QZI/7+6frarjkzwqyclDpqck+WCSX+zuL1TVrkNlSXJJkv/Q3S8aXrfcdk8bvidnJZlLsifJu1f83QRg0pQ9ADaT27v7M8PjP07y+uHxhw6x7VOS7O/u65Oku7+bJFX1r5P81JJr2x6b5Iwky5W9z3b3vuG1H0jyL3L4svfcJK8Z9vmDJPdV1clJnpDkyiQv6e49w7bLZXnwoPdcbrvnJPnAsJ+/rqpPHSYXAMcYZQ+AzaSXWb5/Fe9RSS7u7k8c5T6Xrt+6gve5L8ltWSyLD5W9Q2apqvmDXrvcdi9cwX4BOEa5Zg+AzeS0qnrm8PhVSf7yMNvekuSUqnpGkgzX621J8okkv1xVjxjW/9OqOukw73N2VZ0+XKv38iX7/GZV/bNh/c8u2f7aJL88vPfxVfXYYf2Dw3avWXKnzeWyfC/Jo5e853Lb/UWSlw/7OSXJvzrMfwcAxxhH9gDYTG5J8rrher09Sd6R5OJDbdjdD1bVy5O8raoemcXr9Z6X5J1ZvNvlTVVVSb6V5ILD7PP6JG9P8uQkn07y0WH9JUk+Nrz+hixem5ckb0iyu6ouSvKDLBa//UOm+6vqRUk+OdzgZbksX0zyg6r6QpL3Jrl0me0+msXTRvdk8ajhXx3umwfAsaW6Dz47BQBmT1XtSPKx7n7qyFEAYFNwGicAAMAEObIHwDGvqv55kj86aPXfdfc5Y+QBgLWg7AEAAEyQ0zgBAAAmSNkDAACYIGUPAABggpQ9AACACfp/bYoAVlbXqoYAAAAASUVORK5CYII=\n"
},
"metadata": {
"needs_background": "light"
}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Similarly, [Quantile Discretizer](https://spark.apache.org/docs/2.2.0/api/scala/index.html#org.apache.spark.ml.feature.QuantileDiscretizer) is available in Apache Spark if the data doesn't to big and / or is distributed in a cluster of machines."
}
],
"metadata": {
"gist": {
"id": "fc0c34dacae7bf3fb46b2e6b6595681b",
"data": {
"description": "blog_percentile-buckets_long-tail-distribution.ipynb",
"public": true
}
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3",
"language": "python"
},
"language_info": {
"name": "python",
"version": "3.6.8",
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"file_extension": ".py"
},
"toc": {
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": true,
"base_numbering": 1,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
},
"_draft": {
"nbviewer_url": "https://gist.github.com/fc0c34dacae7bf3fb46b2e6b6595681b"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
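The notebook depends on `item_prices.csv`, which is not included here. As a rough, self-contained sketch of the same idea, the block below generates a synthetic long-tailed price series (the lognormal parameters and the seed are assumptions, not the original data), buckets it with `pd.qcut` as above, and compares the result with equal-width buckets from `pd.cut`. It also shows one possible way to reuse the learned edges on unseen prices; the clipping step is an illustrative choice, not something the notebook does.

```python
import numpy as np
import pandas as pd

# Synthetic long-tailed "prices": an assumption standing in for item_prices.csv
rng = np.random.default_rng(0)
prices = pd.Series(rng.lognormal(mean=6.0, sigma=1.2, size=1000), name="price")

num_bins = 10

# Percentile buckets: edges are quantiles, so each bucket gets ~the same row count
bucketed, edges = pd.qcut(prices, q=num_bins, labels=False, retbins=True)

# Equal-width buckets for comparison: the long tail leaves most rows in bucket 0
equal_width = pd.cut(prices, bins=num_bins, labels=False)

quantile_counts = bucketed.value_counts().sort_index()
width_counts = equal_width.value_counts().sort_index()
print(quantile_counts)
print(width_counts)

# Reusing the learned edges on new prices (hypothetical values).
# Values outside the observed range are clipped to the outermost edges first,
# otherwise pd.cut would return NaN for them.
new_prices = pd.Series([10.0, 500.0, 1e6], name="price")
clipped = new_prices.clip(lower=edges[0], upper=edges[-1])
new_buckets = pd.cut(clipped, bins=edges, labels=False, include_lowest=True)
print(new_buckets)
```

Keeping the `edges` array around is what makes the bucketing reproducible: the same boundaries can be applied at inference time instead of recomputing quantiles on each new batch.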