{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "## Josué Alexander Ibarra - 2018-05-21\n\nI created this Jupyter notebook to show some of the specific ideas I mentioned that we didn't have time to explore during our interview last Monday. I'm sharing it with you under the MIT license, so feel free to use it as boilerplate for any future interviews.\n\n\n## Challenge\n\n At Cedar, we aim to find the best messaging that will motivate the patient to engage with us.\n You have two lists of data:\n\n 1) A list of bill notification messages sent to patients.\n Each message has the following properties: patient_id, channel_type, timestamp\n channel_type can be \"text\", \"email\", \"paper mail\", or another channel for contacting patients.\n\n 2) A list of payments that Cedar received from patients.\n Each payment has the following properties: patient_id, payment_amount, timestamp\n\n Each list is stored in memory in a list or array data structure.\n\n Your goal is to produce output that allows us to see which channel type\n is most effective at driving payments."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import time\nimport seaborn\nimport pandas as pd\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom numpy.random import choice\nfrom pylab import *",
"execution_count": 1,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Since we don't have real data to use, I wrote the following generators to produce mock data with a few useful properties:\n\n1. Channel types are extensible.\n2. Channel types are weighted, as they naturally would be in real data.\n3. The gap between a bill notification and its payment is random, but a payment always comes after its notification.\n4. Going a step beyond the challenge, the lists are converted into data frames."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# channel types with their sampling weights (the weights must sum to 1)\nchannelTypes = {\n    'text': 0.2,\n    'email': 0.5,\n    'paper_mail': 0.3\n}\n\ndef generateTimestamp(start=None, end=None):\n    # default range: 2000-01-01 00:00:00 UTC (946684800) up to now\n    start = 946684800 if start is None else int(start)\n    end = int(time.time()) if end is None else int(end)\n    # the upper bound is exclusive\n    return np.random.randint(start, end)\n\n# randomly generate bill notifications, one per patient\ndef generateBillNotifications(n):\n    notifications = []\n    for i in range(n):\n        notifications.append({\n            'patient_id': i + 1000,\n            'channel_type': choice(list(channelTypes.keys()), p=list(channelTypes.values())),\n            'timestamp': generateTimestamp(),\n        })\n    return pd.DataFrame(notifications)\n\n# generate payments for randomly selected invoices\ndef generatePayments(invoices):\n    # select anywhere from 0 to all of the invoices (+1 because the upper bound is exclusive)\n    paidInvoices = invoices.sample(np.random.randint(0, len(invoices) + 1))\n    payments = []\n    for _, invoice in paidInvoices.iterrows():\n        payments.append({\n            'patient_id': invoice['patient_id'],\n            'payment_amount': np.random.randint(1, 10000),\n            # a payment always comes after its bill notification\n            'timestamp': generateTimestamp(start=invoice['timestamp']),\n        })\n    # explicit columns keep the later merge working even if no invoice was selected\n    return pd.DataFrame(payments, columns=['patient_id', 'payment_amount', 'timestamp'])",
"execution_count": 2,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "invoices = generateBillNotifications(10000)\ninvoices[:10]",
"execution_count": 3,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 3,
"data": {
"text/plain": " channel_type patient_id timestamp\n0 paper_mail 1000 1020827124\n1 email 1001 1299812025\n2 text 1002 964712095\n3 email 1003 1323460015\n4 email 1004 1159954890\n5 email 1005 1445728857\n6 email 1006 950393850\n7 email 1007 1051824766\n8 text 1008 1376192176\n9 paper_mail 1009 1150195632",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>channel_type</th>\n <th>patient_id</th>\n <th>timestamp</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>paper_mail</td>\n <td>1000</td>\n <td>1020827124</td>\n </tr>\n <tr>\n <th>1</th>\n <td>email</td>\n <td>1001</td>\n <td>1299812025</td>\n </tr>\n <tr>\n <th>2</th>\n <td>text</td>\n <td>1002</td>\n <td>964712095</td>\n </tr>\n <tr>\n <th>3</th>\n <td>email</td>\n <td>1003</td>\n <td>1323460015</td>\n </tr>\n <tr>\n <th>4</th>\n <td>email</td>\n <td>1004</td>\n <td>1159954890</td>\n </tr>\n <tr>\n <th>5</th>\n <td>email</td>\n <td>1005</td>\n <td>1445728857</td>\n </tr>\n <tr>\n <th>6</th>\n <td>email</td>\n <td>1006</td>\n <td>950393850</td>\n </tr>\n <tr>\n <th>7</th>\n <td>email</td>\n <td>1007</td>\n <td>1051824766</td>\n </tr>\n <tr>\n <th>8</th>\n <td>text</td>\n <td>1008</td>\n <td>1376192176</td>\n </tr>\n <tr>\n <th>9</th>\n <td>paper_mail</td>\n <td>1009</td>\n <td>1150195632</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "payments = generatePayments(invoices)\npayments[:10]",
"execution_count": 4,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 4,
"data": {
"text/plain": " patient_id payment_amount timestamp\n0 1538 7598 1317314002\n1 10074 3846 1411314496\n2 5023 2277 1525702420\n3 5587 6012 1278641856\n4 9117 6918 1066439321\n5 4138 5690 1412615857\n6 2120 6658 1502362324\n7 8040 689 1442316720\n8 10268 6986 1405003522\n9 4498 1303 1483601952",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>patient_id</th>\n <th>payment_amount</th>\n <th>timestamp</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1538</td>\n <td>7598</td>\n <td>1317314002</td>\n </tr>\n <tr>\n <th>1</th>\n <td>10074</td>\n <td>3846</td>\n <td>1411314496</td>\n </tr>\n <tr>\n <th>2</th>\n <td>5023</td>\n <td>2277</td>\n <td>1525702420</td>\n </tr>\n <tr>\n <th>3</th>\n <td>5587</td>\n <td>6012</td>\n <td>1278641856</td>\n </tr>\n <tr>\n <th>4</th>\n <td>9117</td>\n <td>6918</td>\n <td>1066439321</td>\n </tr>\n <tr>\n <th>5</th>\n <td>4138</td>\n <td>5690</td>\n <td>1412615857</td>\n </tr>\n <tr>\n <th>6</th>\n <td>2120</td>\n <td>6658</td>\n <td>1502362324</td>\n </tr>\n <tr>\n <th>7</th>\n <td>8040</td>\n <td>689</td>\n <td>1442316720</td>\n </tr>\n <tr>\n <th>8</th>\n <td>10268</td>\n <td>6986</td>\n <td>1405003522</td>\n </tr>\n <tr>\n <th>9</th>\n <td>4498</td>\n <td>1303</td>\n <td>1483601952</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "After generating the two data frames, `invoices` (all of the notifications sent) and `payments` (a random number of payments, each corresponding to one of the invoices), we can start our data analysis.\n\nThe first step is an outer join of the two data frames on the unique `patient_id`.\n\nWe also take the opportunity to create two new columns:\n\n1. `is_paid`: a boolean indicating whether the patient paid, based on whether `timestamp_payment` is present.\n2. `payment_wait`: the number of seconds between the bill notification and the payment (`NaN` for unpaid invoices)."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "invoices = pd.merge(invoices, payments, on='patient_id', how='outer', suffixes=('_invoice','_payment'))\ninvoices['is_paid'] = invoices['timestamp_payment'].notnull()\ninvoices['payment_wait'] = invoices['timestamp_payment'] - invoices['timestamp_invoice']\ninvoices[:10]",
"execution_count": 5,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 5,
"data": {
"text/plain": " channel_type patient_id timestamp_invoice payment_amount \\\n0 paper_mail 1000 1020827124 5166.0 \n1 email 1001 1299812025 8878.0 \n2 text 1002 964712095 3740.0 \n3 email 1003 1323460015 4221.0 \n4 email 1004 1159954890 9751.0 \n5 email 1005 1445728857 7059.0 \n6 email 1006 950393850 53.0 \n7 email 1007 1051824766 6464.0 \n8 text 1008 1376192176 6398.0 \n9 paper_mail 1009 1150195632 2837.0 \n\n timestamp_payment is_paid payment_wait \n0 1.126015e+09 True 105187406.0 \n1 1.315689e+09 True 15876740.0 \n2 9.717578e+08 True 7045666.0 \n3 1.413995e+09 True 90535039.0 \n4 1.375020e+09 True 215064798.0 \n5 1.478400e+09 True 32671618.0 \n6 9.767055e+08 True 26311615.0 \n7 1.065784e+09 True 13959210.0 \n8 1.380074e+09 True 3881391.0 \n9 1.166223e+09 True 16027352.0 ",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>channel_type</th>\n <th>patient_id</th>\n <th>timestamp_invoice</th>\n <th>payment_amount</th>\n <th>timestamp_payment</th>\n <th>is_paid</th>\n <th>payment_wait</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>paper_mail</td>\n <td>1000</td>\n <td>1020827124</td>\n <td>5166.0</td>\n <td>1.126015e+09</td>\n <td>True</td>\n <td>105187406.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>email</td>\n <td>1001</td>\n <td>1299812025</td>\n <td>8878.0</td>\n <td>1.315689e+09</td>\n <td>True</td>\n <td>15876740.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>text</td>\n <td>1002</td>\n <td>964712095</td>\n <td>3740.0</td>\n <td>9.717578e+08</td>\n <td>True</td>\n <td>7045666.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>email</td>\n <td>1003</td>\n <td>1323460015</td>\n <td>4221.0</td>\n <td>1.413995e+09</td>\n <td>True</td>\n <td>90535039.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>email</td>\n <td>1004</td>\n <td>1159954890</td>\n <td>9751.0</td>\n <td>1.375020e+09</td>\n <td>True</td>\n <td>215064798.0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>email</td>\n <td>1005</td>\n <td>1445728857</td>\n <td>7059.0</td>\n <td>1.478400e+09</td>\n <td>True</td>\n <td>32671618.0</td>\n </tr>\n <tr>\n <th>6</th>\n <td>email</td>\n <td>1006</td>\n <td>950393850</td>\n <td>53.0</td>\n <td>9.767055e+08</td>\n <td>True</td>\n <td>26311615.0</td>\n </tr>\n <tr>\n <th>7</th>\n <td>email</td>\n <td>1007</td>\n <td>1051824766</td>\n <td>6464.0</td>\n <td>1.065784e+09</td>\n <td>True</td>\n <td>13959210.0</td>\n </tr>\n <tr>\n <th>8</th>\n <td>text</td>\n <td>1008</td>\n <td>1376192176</td>\n <td>6398.0</td>\n <td>1.380074e+09</td>\n <td>True</td>\n <td>3881391.0</td>\n </tr>\n <tr>\n <th>9</th>\n <td>paper_mail</td>\n <td>1009</td>\n <td>1150195632</td>\n <td>2837.0</td>\n <td>1.166223e+09</td>\n <td>True</td>\n <td>16027352.0</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {}
}
]
},
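{
"metadata": {},
"cell_type": "markdown",
"source": "Raw seconds are hard to read at this scale. A minimal sketch, assuming the cells above have been run: it expresses the wait in days in a new `payment_wait_days` column (a name chosen for this illustration, not part of the challenge data) without modifying `invoices`."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "SECONDS_PER_DAY = 24 * 60 * 60\n\n# assign() returns a new frame, so the joined `invoices` frame is left untouched\n# payment_wait_days is an illustrative column name, not part of the challenge data\nwaitInDays = invoices.assign(payment_wait_days=invoices['payment_wait'] / SECONDS_PER_DAY)\nwaitInDays[['patient_id', 'channel_type', 'payment_wait_days']].head(10)",
"execution_count": null,
"outputs": []
},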
{
"metadata": {},
"cell_type": "markdown",
"source": "Now that we have the joined data frame `invoices`, we can start by plotting histograms of `channel_type`, split by `is_paid`, to compare how the notifications are distributed across channels for paid and unpaid invoices.\n\nIf one channel performed better, its share would be noticeably larger among the paid invoices. Most of the time the generated data will not favor any channel at this stage, because the paid records are selected at random."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "invoices.hist(column='channel_type', by='is_paid');",
"execution_count": 6,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "<matplotlib.figure.Figure at 0x1a08f39dd8>",
"image/png": "\n"
},
"metadata": {}
}
]
},
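{
"metadata": {},
"cell_type": "markdown",
"source": "The same comparison can also be drawn as a grouped bar chart, which does not rely on how `hist` handles a text column. This is a sketch under the assumption that the cells above have been run; it uses the pandas `crosstab` function, which the next section covers, and `counts` is an illustrative name."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import pandas as pd\nimport matplotlib.pyplot as plt\n\n# number of invoices per channel, split into unpaid (False) and paid (True)\ncounts = pd.crosstab(invoices['channel_type'], invoices['is_paid'])\n\n# one group of bars per channel type, one bar per payment status\ncounts.plot(kind='bar')\nplt.ylabel('number of invoices')\nplt.title('Invoices per channel type, by payment status')\nplt.show()",
"execution_count": null,
"outputs": []
},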
{
"metadata": {},
"cell_type": "markdown",
"source": "For a clearer picture, we can count the paid and unpaid invoices per `channel_type` with the pandas `crosstab` function."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "pd.crosstab(invoices['channel_type'], invoices['is_paid'], margins=True)",
"execution_count": 7,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 7,
"data": {
"text/plain": "is_paid False True All\nchannel_type \nemail 603 4483 5086\npaper_mail 329 2615 2944\ntext 200 1770 1970\nAll 1132 8868 10000",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>is_paid</th>\n <th>False</th>\n <th>True</th>\n <th>All</th>\n </tr>\n <tr>\n <th>channel_type</th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>email</th>\n <td>603</td>\n <td>4483</td>\n <td>5086</td>\n </tr>\n <tr>\n <th>paper_mail</th>\n <td>329</td>\n <td>2615</td>\n <td>2944</td>\n </tr>\n <tr>\n <th>text</th>\n <td>200</td>\n <td>1770</td>\n <td>1970</td>\n </tr>\n <tr>\n <th>All</th>\n <td>1132</td>\n <td>8868</td>\n <td>10000</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Raw counts are hard to compare because the channels differ in volume, so let's express each cell as a fraction of all invoices instead."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "pd.crosstab(invoices['channel_type'], invoices['is_paid'], margins=True, normalize ='all')",
"execution_count": 8,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 8,
"data": {
"text/plain": "is_paid False True All\nchannel_type \nemail 0.0603 0.4483 0.5086\npaper_mail 0.0329 0.2615 0.2944\ntext 0.0200 0.1770 0.1970\nAll 0.1132 0.8868 1.0000",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>is_paid</th>\n <th>False</th>\n <th>True</th>\n <th>All</th>\n </tr>\n <tr>\n <th>channel_type</th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>email</th>\n <td>0.0603</td>\n <td>0.4483</td>\n <td>0.5086</td>\n </tr>\n <tr>\n <th>paper_mail</th>\n <td>0.0329</td>\n <td>0.2615</td>\n <td>0.2944</td>\n </tr>\n <tr>\n <th>text</th>\n <td>0.0200</td>\n <td>0.1770</td>\n <td>0.1970</td>\n </tr>\n <tr>\n <th>All</th>\n <td>0.1132</td>\n <td>0.8868</td>\n <td>1.0000</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {}
}
]
},
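{
"metadata": {},
"cell_type": "markdown",
"source": "The table above gives each cell as a share of all invoices, so high-volume channels dominate it. To compare channels directly, each row can be normalized on its own, which gives the fraction of each channel's invoices that were paid. The following is a minimal sketch, assuming the cells above have been run (`paidRate` is an illustrative name). For the counts shown earlier this works out to roughly 0.88 for email, 0.89 for paper mail, and 0.90 for text, a spread that is expected to be noise because the paid invoices are sampled at random."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import pandas as pd\n\n# normalize='index' makes every row sum to 1, so the True column is the\n# share of each channel's invoices that were paid\npaidRate = pd.crosstab(invoices['channel_type'], invoices['is_paid'], normalize='index')\npaidRate",
"execution_count": null,
"outputs": []
},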
{
"metadata": {},
"cell_type": "markdown",
"source": "Because of the way the data was generated, we should not expect a strong correlation between `channel_type` and payment.\n\nA neat way to visualize correlations, and in my opinion the best way to find hidden ones, is a heatmap.\n\nThe first step is to one-hot encode our categorical data (the channel types). This is done with the pandas `get_dummies` function, and the result is joined to the `is_paid` column of our `invoices` table.\n\nWe then compute the correlation matrix and plot it with the data visualization library `seaborn`.\n\nResults will vary on each full run of this notebook, but any interesting correlations will be clearly visible here."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# one-hot encoding for channel_type\ninvoicesCorr = invoices[['is_paid']].join(pd.get_dummies(invoices['channel_type']))\n\ninvoicesCorr = invoicesCorr.corr()\nseaborn.heatmap(invoicesCorr)\n\nplt.title('Channel type correlation heatmap')\nplt.show()",
"execution_count": 9,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "<matplotlib.figure.Figure at 0x1a090aeb38>"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Now, to the question of which channel type performs better.\n\nSince we are using mock data here, the answer is hard to determine, but the methods we used to explore the data still stand.\n\nReal data with the same constraints as this challenge could also give us insight into the following questions:\n\n1. Does the `payment_amount` affect the time it takes to get paid?\n2. Do some notification channels get us paid more quickly than others?\n3. How much are we getting paid through each notification channel type?\n\nThe analysis can also go beyond the holistic insight of \"this channel is better than all of the others\". It would not be surprising to find that certain notification types, such as text, work better for recently invoiced patients, whereas more formal methods, such as certified mail, are more successful at bringing in payments for invoices that are long overdue.\n\n---\n\n Personally, I had fun working a bit more on this.\n It's something that could not be done in a standard technical interview, and I took\n the liberty of making this an exercise of my data science skills.\n I really hope this is useful or that you get the chance to run these computations on real data.\n\n I look forward to hearing back from you!\n\n - Josué Alexander Ibarra\n\n josue@elninja.com\n \n---"
},
{
"metadata": {},
"cell_type": "markdown",
"source": " MIT License\n\n Copyright (c) 2018 Josué Alexander Ibarra\n\n Permission is hereby granted, free of charge, to any person obtaining a copy\n of this software and associated documentation files (the \"Software\"), to deal\n in the Software without restriction, including without limitation the rights\n to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n copies of the Software, and to permit persons to whom the Software is\n furnished to do so, subject to the following conditions:\n\n The above copyright notice and this permission notice shall be included in all\n copies or substantial portions of the Software.\n\n THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n SOFTWARE."
}
],
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python 3",
"language": "python"
},
"language_info": {
"name": "python",
"version": "3.6.4",
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"file_extension": ".py"
}
},
"nbformat": 4,
"nbformat_minor": 2
}