@daxiongshu
Created February 22, 2019 00:53
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"cudf version 0.5.0+18.gcdebbed\n"
]
}
],
"source": [
"import cudf as gd\n",
"import pandas as pd\n",
"import numpy as np\n",
"import math\n",
"import xgboost as xgb\n",
"import seaborn as sns\n",
"from functools import partial\n",
"from sklearn.preprocessing import LabelEncoder\n",
"from sklearn.model_selection import train_test_split\n",
"from termcolor import colored\n",
"from cudf_workaround import cudf_groupby_aggs\n",
"import matplotlib.pyplot as plt\n",
"import os\n",
"import time\n",
"import warnings\n",
"warnings.filterwarnings(\"ignore\")\n",
"sns.set()\n",
"print('cudf version',gd.__version__)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**This notebook demonstrates the 8th-place solution (8/1094) of Rapids.ai for the __[PLAsTiCC Astronomical Classification](https://www.kaggle.com/c/PLAsTiCC-2018/leaderboard)__ challenge. The demo shows up to a 140x speedup for ETL and a 25x end-to-end speedup over the CPU solution. More details can be found on our __[blog](https://medium.com/rapids-ai/make-sense-of-the-universe-with-rapids-ai-d105b0e5ec95)__.**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Table of contents\n",
"[1. Global variables](#global)<br>\n",
"[2. Functions](#func)<br>\n",
"[3. ETL & Visualizations](#etl)<br>\n",
"[4. Model training](#train)<br>\n",
"[5. Conclusions](#conclusions)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"global\"></a>\n",
"## 1. Global variables "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Original data download and description __[link](https://www.kaggle.com/c/PLAsTiCC-2018/data)__**."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"PATH = '../data'\n",
"#PATH = '../lsst/input'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Tested on V100 with 32 GB GPU memory. If memory capacity is smaller, the input data will be sampled accordingly.**"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"GPU_MEMORY = 32 # GB. \n",
"#GPU_MEMORY = 16 # GB. Both 32 and 16 GB have been tested"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"TEST_ROWS = 453653104 # number of rows in test data\n",
"# no skip if your gpu has 32 GB memory\n",
"# otherwise, skip rows porportionally\n",
"OVERHEAD = 1.17 # cudf 0.5 introduces 17% memory overhead\n",
"SKIP_ROWS = int((1 - GPU_MEMORY/(32.0*OVERHEAD))*TEST_ROWS) \n",
"GPU_RUN_TIME = {}\n",
"CPU_RUN_TIME = {}"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"GPU_id = 0\n",
"os.environ['CUDA_VISIBLE_DEVICES'] = str(GPU_id)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"func\"></a>\n",
"## 2. Functions"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"def scatter(x,y,values,xlabel='x',ylabel='y',title=None):\n",
" colors = ['b', 'g', 'r', 'c', 'm', 'y', 'k']\n",
" colors = np.array([colors[i] for i in values])\n",
" ps = []\n",
" bs = []\n",
" bands = ['passband_%s'%i for i in ['u', 'g', 'r', 'i', 'z','y']]\n",
" for i in sorted(np.unique(values)):\n",
" mask = values==i\n",
" if len(x[mask]):\n",
" p = plt.scatter(x[mask],y[mask],c=colors[mask])\n",
" ps.append(p)\n",
" bs.append(bands[i])\n",
" plt.legend(ps,bs,scatterpoints=1)\n",
" if title is not None:\n",
" plt.title(title)\n",
" \n",
" plt.xlim([np.min(x)-10,np.min(x)+1500])\n",
" plt.ylabel('y: %s'%ylabel)\n",
" plt.xlabel('x: %s'%xlabel)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"def multi_weighted_logloss(y_true, y_preds, classes, class_weights):\n",
" \"\"\"\n",
" refactor from\n",
" @author olivier https://www.kaggle.com/ogrellier\n",
" multi logloss for PLAsTiCC challenge\n",
" \"\"\"\n",
" y_p = y_preds.reshape(y_true.shape[0], len(classes), order='F')\n",
" y_ohe = pd.get_dummies(y_true)\n",
" y_p = np.clip(a=y_p, a_min=1e-15, a_max=1 - 1e-15)\n",
" y_p_log = np.log(y_p)\n",
" y_log_ones = np.sum(y_ohe.values * y_p_log, axis=0)\n",
" nb_pos = y_ohe.sum(axis=0).values.astype(float)\n",
" class_arr = np.array([class_weights[k] for k in sorted(class_weights.keys())])\n",
" y_w = y_log_ones * class_arr / nb_pos\n",
"\n",
" loss = - np.sum(y_w) / np.sum(class_arr)\n",
" return loss\n",
"\n",
"def xgb_multi_weighted_logloss(y_predicted, y_true, classes, class_weights):\n",
" loss = multi_weighted_logloss(y_true.get_label(), y_predicted, \n",
" classes, class_weights)\n",
" return 'wloss', loss"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### CPU ETL functions "
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"def ravel_column_names(cols):\n",
" d0 = cols.get_level_values(0)\n",
" d1 = cols.get_level_values(1)\n",
" return [\"%s_%s\"%(i,j) for i,j in zip(d0,d1)]\n",
" \n",
"def etl_cpu(df,df_meta):\n",
" df['flux_ratio_sq'] = np.power(df['flux'] / df['flux_err'], 2.0)\n",
" df['flux_by_flux_ratio_sq'] = df['flux'] * df['flux_ratio_sq']\n",
" aggs = {\n",
" 'passband': ['mean'], \n",
" 'flux': ['min', 'max', 'mean'],\n",
" 'flux_err': ['min', 'max', 'mean'],\n",
" 'detected': ['mean'],\n",
" 'mjd':['max','min'],\n",
" 'flux_ratio_sq':['sum'],\n",
" 'flux_by_flux_ratio_sq':['sum'],\n",
" }\n",
" agg_df = df.groupby('object_id').agg(aggs)\n",
" agg_df.columns = ravel_column_names(agg_df.columns)\n",
" \n",
" agg_df['flux_diff'] = agg_df['flux_max'] - agg_df['flux_min']\n",
" agg_df['flux_dif2'] = (agg_df['flux_max'] - agg_df['flux_min']) / agg_df['flux_mean']\n",
" agg_df['flux_w_mean'] = agg_df['flux_by_flux_ratio_sq_sum'] / agg_df['flux_ratio_sq_sum']\n",
" agg_df['flux_dif3'] = (agg_df['flux_max'] - agg_df['flux_min']) / agg_df['flux_w_mean']\n",
" \n",
" agg_df['mjd_diff'] = agg_df['mjd_max'] - agg_df['mjd_min']\n",
" agg_df = agg_df.drop(['mjd_max','mjd_min'],axis=1)\n",
" \n",
" agg_df = agg_df.reset_index()\n",
" df_meta = df_meta.drop(['ra','decl','gal_l','gal_b'],axis=1)\n",
" df_meta = df_meta.merge(agg_df,on='object_id',how='left')\n",
" return df_meta"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### GPU ETL functions "
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# To save GPU memory, each column is dropped as soon as its groupby aggregations are done.\n",
"# This costs some performance but avoids running out of GPU memory (OOM).\n",
"def groupby_aggs(df,aggs,col):\n",
" res = None\n",
" for i,j in aggs.items():\n",
" for k in j:\n",
" #print(i,k)\n",
" tmp = df.groupby(col).agg({i:[k]})\n",
" if res is None:\n",
" res = tmp\n",
" else:\n",
" res = res.merge(tmp,on=[col],how='left')\n",
" df.drop_column(i)\n",
" return res\n",
"\n",
"def etl_gpu(df,df_meta):\n",
" aggs = {\n",
" 'passband': ['mean'], \n",
" 'detected': ['mean'],\n",
" 'mjd':['max','min'],\n",
" }\n",
" agg_df = groupby_aggs(df,aggs,'object_id')\n",
" # at this step, columns ['passband','detected','mjd'] are deleted \n",
" \n",
" df['flux_ratio_sq'] = df['flux'] / df['flux_err']\n",
" df['flux_ratio_sq'] = df['flux_ratio_sq'].applymap(lambda x: math.pow(x,2))\n",
" df['flux_by_flux_ratio_sq'] = df['flux'] * df['flux_ratio_sq']\n",
" \n",
" aggs2 = {\n",
" 'flux_ratio_sq':['sum'],\n",
" 'flux_by_flux_ratio_sq':['sum'],\n",
" 'flux': ['min', 'max', 'mean'],\n",
" 'flux_err': ['min', 'max', 'mean'],\n",
" }\n",
" agg_df2 = groupby_aggs(df,aggs2,'object_id')\n",
" agg_df = agg_df.merge(agg_df2,on=['object_id'],how='left')\n",
" del agg_df2\n",
"\n",
" agg_df['flux_diff'] = agg_df['max_flux'] - agg_df['min_flux']\n",
" agg_df['flux_dif2'] = (agg_df['max_flux'] - agg_df['min_flux']) / agg_df['mean_flux']\n",
" agg_df['flux_w_mean'] = agg_df['sum_flux_by_flux_ratio_sq'] / agg_df['sum_flux_ratio_sq']\n",
" agg_df['flux_dif3'] = (agg_df['max_flux'] - agg_df['min_flux']) / agg_df['flux_w_mean']\n",
" \n",
" agg_df['mjd_diff'] = agg_df['max_mjd'] - agg_df['min_mjd']\n",
" agg_df.drop_column('max_mjd')\n",
" agg_df.drop_column('min_mjd')\n",
" \n",
" for col in ['ra','decl','gal_l','gal_b']:\n",
" df_meta.drop_column(col)\n",
" \n",
" df_meta = df_meta.merge(agg_df,on=['object_id'],how='left')\n",
" return df_meta"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"etl\"></a>\n",
"## 3. ETL & Visualizations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load data for ETL part 1\n",
"**GPU load data**"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 14.7 s, sys: 5.96 s, total: 20.6 s\n",
"Wall time: 21.2 s\n"
]
}
],
"source": [
"%%time\n",
"start = time.time()\n",
"step = 'load data part1'\n",
"ts_cols = ['object_id', 'mjd', 'passband', 'flux', 'flux_err', 'detected']\n",
"ts_dtypes = ['int32', 'float32', 'int32', 'float32','float32','int32']\n",
"\n",
"train_gd = gd.read_csv('%s/training_set.csv'%PATH,\n",
" names=ts_cols,dtype=ts_dtypes,skiprows=1)\n",
"test_gd = gd.read_csv('%s/test_set.csv'%PATH,\n",
" names=ts_cols,dtype=ts_dtypes,skiprows=1+SKIP_ROWS) # skip the header and the first SKIP_ROWS rows\n",
"GPU_RUN_TIME[step] = time.time() - start"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**CPU load data**"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 3min 59s, sys: 54.8 s, total: 4min 54s\n",
"Wall time: 3min 49s\n"
]
}
],
"source": [
"%%time\n",
"start = time.time()\n",
"train = pd.read_csv('%s/training_set.csv'%PATH)\n",
"test = pd.read_csv('%s/test_set.csv'%PATH,skiprows=range(1,1+SKIP_ROWS))\n",
"CPU_RUN_TIME[step] = time.time() - start"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[32mwe achieve 10.829 speedup for load data part1.\u001b[0m\n"
]
}
],
"source": [
"speedup = CPU_RUN_TIME[step]/GPU_RUN_TIME[step]\n",
"line = \"we achieve %.3f speedup for %s.\"%(speedup,step)\n",
"print(colored(line,'green'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Visualizations"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"oid = 615\n",
"mask = train.object_id== oid\n",
"scatter(train.loc[mask,'mjd'].values,\n",
" train.loc[mask,'flux'].values,\n",
" values=train.loc[mask,'passband'].values,\n",
" xlabel='time',ylabel='flux',title='object %d class 42'%oid)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ETL part 1 with over 100x speedup"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 8 ms, sys: 12 ms, total: 20 ms\n",
"Wall time: 34.9 ms\n"
]
}
],
"source": [
"%%time\n",
"# to save GPU memory, keep only the columns needed for the skew aggregation\n",
"test_gd = test_gd[['object_id','flux']]\n",
"train_gd = train_gd[['object_id','flux']]"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 3.8 s, sys: 936 ms, total: 4.73 s\n",
"Wall time: 5.41 s\n"
]
}
],
"source": [
"%%time\n",
"# GPU\n",
"step = 'ETL part1'\n",
"start = time.time()\n",
"aggs = {'flux':['skew']}\n",
"test_gd = cudf_groupby_aggs(test_gd,group_id_col='object_id',aggs=aggs)\n",
"train_gd = cudf_groupby_aggs(train_gd,group_id_col='object_id',aggs=aggs)\n",
"GPU_RUN_TIME[step] = time.time() - start"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 12min 50s, sys: 35.8 s, total: 13min 26s\n",
"Wall time: 13min 10s\n"
]
}
],
"source": [
"%%time\n",
"# CPU\n",
"start = time.time()\n",
"test = test.groupby('object_id').agg(aggs)\n",
"train = train.groupby('object_id').agg(aggs)\n",
"CPU_RUN_TIME[step] = time.time() - start"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[32mwe achieve 146.110 speedup for ETL part1.\u001b[0m\n"
]
}
],
"source": [
"speedup = CPU_RUN_TIME[step]/GPU_RUN_TIME[step]\n",
"line = \"we achieve %.3f speedup for %s.\"%(speedup,step)\n",
"print(colored(line,'green'))"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 96 ms, sys: 16 ms, total: 112 ms\n",
"Wall time: 114 ms\n"
]
}
],
"source": [
"%%time\n",
"test_gd = test_gd.sort_values(by='object_id')\n",
"train_gd = train_gd.sort_values(by='object_id')"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 6.05 s, sys: 200 ms, total: 6.25 s\n",
"Wall time: 157 ms\n"
]
}
],
"source": [
"%%time\n",
"test.columns = ['skew_flux']\n",
"test = test.reset_index()\n",
"test = test.sort_values(by='object_id')\n",
"train.columns = ['skew_flux']\n",
"train = train.reset_index()\n",
"train = train.sort_values(by='object_id')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Evaluation of correctness of ETL**"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"3029740 3029740\n"
]
}
],
"source": [
"print(len(test),len(test_gd))"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"test\n",
"object_id, rmse 0.000000\n",
"skew_flux, rmse 0.000002\n",
"train\n",
"object_id, rmse 0.000000\n",
"skew_flux, rmse 0.000006\n"
]
}
],
"source": [
"# RMSE: Root mean square error\n",
"def rmse(a,b):\n",
" return np.mean((a-b)**2)**0.5\n",
"print('test')\n",
"for col in test.columns:\n",
" if col in test_gd.columns:\n",
" print(\"%s, rmse %.6f\"%(col,rmse(test[col].values,test_gd[col].to_pandas().values)))\n",
"print('train')\n",
"for col in train.columns:\n",
" if col in train_gd.columns:\n",
" print(\"%s, rmse %.6f\"%(col,rmse(train[col].values,train_gd[col].to_pandas().values)))"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"3029740 3029740\n"
]
}
],
"source": [
"# Rename the variables\n",
"test_flux_skew_gd = test_gd\n",
"test_flux_skew = test\n",
"train_flux_skew_gd = train_gd\n",
"train_flux_skew = train\n",
"print(len(test_gd),len(test))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load data for ETL part 2 with about 12x speedup"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 20 s, sys: 5.26 s, total: 25.3 s\n",
"Wall time: 19.6 s\n"
]
}
],
"source": [
"%%time\n",
"# read data on gpu\n",
"step = 'load data part2'\n",
"start = time.time()\n",
"ts_cols = ['object_id', 'mjd', 'passband', 'flux', 'flux_err', 'detected']\n",
"ts_dtypes = ['int32', 'float32', 'int32', 'float32','float32','int32']\n",
"\n",
"test_gd = gd.read_csv('%s/test_set.csv'%PATH,\n",
" names=ts_cols,dtype=ts_dtypes,skiprows=1+SKIP_ROWS) # skip the header and the first SKIP_ROWS rows\n",
"train_gd = gd.read_csv('%s/training_set.csv'%PATH,\n",
" names=ts_cols,dtype=ts_dtypes,skiprows=1)\n",
"\n",
"cols = ['object_id', 'ra', 'decl', 'gal_l', 'gal_b', 'ddf',\n",
" 'hostgal_specz', 'hostgal_photoz', 'hostgal_photoz_err', \n",
" 'distmod','mwebv', 'target']\n",
"dtypes = ['int32']+['float32']*4+['int32']+['float32']*5+['int32']\n",
"\n",
"train_meta_gd = gd.read_csv('%s/training_set_metadata.csv'%PATH,\n",
" names=cols,dtype=dtypes,skiprows=1)\n",
"del cols[-1],dtypes[-1]\n",
"test_meta_gd = gd.read_csv('%s/test_set_metadata.csv'%PATH,\n",
" names=cols,dtype=dtypes,skiprows=1)\n",
"GPU_RUN_TIME[step] = time.time() - start"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 4min, sys: 1min 3s, total: 5min 4s\n",
"Wall time: 3min 58s\n"
]
}
],
"source": [
"%%time\n",
"# read data on cpu\n",
"start = time.time()\n",
"test = pd.read_csv('%s/test_set.csv'%PATH,skiprows=range(1,1+SKIP_ROWS))\n",
"test_meta = pd.read_csv('%s/test_set_metadata.csv'%PATH)\n",
"\n",
"train = pd.read_csv('%s/training_set.csv'%PATH)\n",
"train_meta = pd.read_csv('%s/training_set_metadata.csv'%PATH)\n",
"CPU_RUN_TIME[step] = time.time() - start"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[32mwe achieve 12.143 speedup for load data part2.\u001b[0m\n"
]
}
],
"source": [
"speedup = CPU_RUN_TIME[step]/GPU_RUN_TIME[step]\n",
"line = \"we achieve %.3f speedup for %s.\"%(speedup,step)\n",
"print(colored(line,'green'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ETL part 2 with about 15x speedup"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 9.66 s, sys: 2.2 s, total: 11.9 s\n",
"Wall time: 9.56 s\n"
]
}
],
"source": [
"%%time\n",
"# GPU\n",
"start = time.time()\n",
"step = 'ETL part2'\n",
"train_final_gd = etl_gpu(train_gd,train_meta_gd)\n",
"train_final_gd = train_final_gd.merge(train_flux_skew_gd,on=['object_id'],how='left')\n",
"test_final_gd = etl_gpu(test_gd,test_meta_gd)\n",
"del test_gd,test_meta_gd\n",
"test_final_gd = test_final_gd.merge(test_flux_skew_gd,on=['object_id'],how='left')\n",
"GPU_RUN_TIME[step] = time.time() - start"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 4min 42s, sys: 2min 21s, total: 7min 4s\n",
"Wall time: 2min 20s\n"
]
}
],
"source": [
"%%time\n",
"#CPU\n",
"start = time.time()\n",
"train_final = etl_cpu(train,train_meta)\n",
"train_final = train_final.merge(train_flux_skew,on=['object_id'],how='left')\n",
"test_final = etl_cpu(test,test_meta)\n",
"test_final = test_final.merge(test_flux_skew,on=['object_id'],how='left')\n",
"CPU_RUN_TIME[step] = time.time() - start"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[32mwe achieve 14.712 speedup for ETL part2.\u001b[0m\n"
]
}
],
"source": [
"speedup = CPU_RUN_TIME[step]/GPU_RUN_TIME[step]\n",
"line = \"we achieve %.3f speedup for %s.\"%(speedup,step)\n",
"print(colored(line,'green'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"train\"></a>\n",
"## 4. Model training"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Training and validation with about 4x speedup"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 6 15 16 42 52 53 62 64 65 67 88 90 92 95]\n"
]
}
],
"source": [
"# CPU\n",
"X = train_final.drop(['object_id','target'],axis=1).values\n",
"y = train_final['target']\n",
"Xt = test_final.drop(['object_id'],axis=1).values\n",
"assert X.shape[1] == Xt.shape[1]\n",
"classes = sorted(y.unique()) \n",
"# Taken from Giba's topic : https://www.kaggle.com/titericz\n",
"# https://www.kaggle.com/c/PLAsTiCC-2018/discussion/67194\n",
"# with Kyle Boone's post https://www.kaggle.com/kyleboone\n",
"class_weights = {c: 1 for c in classes}\n",
"class_weights.update({c:2 for c in [64, 15]})\n",
"\n",
"lbl = LabelEncoder()\n",
"y = lbl.fit_transform(y)\n",
"print(lbl.classes_)\n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.1,stratify=y, random_state=126)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"cpu_params = {\n",
" 'objective': 'multi:softprob', \n",
" 'tree_method': 'hist', \n",
" 'nthread': 16, \n",
" 'num_class':14,\n",
" 'max_depth': 7, \n",
" 'silent':1,\n",
" 'subsample':0.7,\n",
" 'colsample_bytree': 0.7,}"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"func_loss = partial(xgb_multi_weighted_logloss, \n",
" classes=classes, \n",
" class_weights=class_weights)"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[06:09:45] Tree method is selected to be 'hist', which uses a single updater grow_fast_histmaker.\n",
"[0]\teval-merror:0.356688\ttrain-merror:0.276936\teval-wloss:2.04549\ttrain-wloss:1.8956\n",
"Multiple eval metrics have been passed: 'train-wloss' will be used for early stopping.\n",
"\n",
"Will train until train-wloss hasn't improved in 10 rounds.\n",
"[59]\teval-merror:0.284076\ttrain-merror:0.000991\teval-wloss:1.21854\ttrain-wloss:0.091642\n",
"\u001b[32mvalidation loss 1.2185\u001b[0m\n",
"CPU times: user 6min 38s, sys: 3.27 s, total: 6min 41s\n",
"Wall time: 27.1 s\n"
]
}
],
"source": [
"%%time\n",
"start = time.time()\n",
"step = 'training'\n",
"dtrain = xgb.DMatrix(data=X_train, label=y_train)\n",
"dvalid = xgb.DMatrix(data=X_test, label=y_test)\n",
"dtest = xgb.DMatrix(data=Xt)\n",
"# xgboost uses the last entry of evals for early stopping, so this stops on train-wloss\n",
"watchlist = [(dvalid, 'eval'), (dtrain, 'train')]\n",
"clf = xgb.train(cpu_params, dtrain=dtrain,\n",
" num_boost_round=60,evals=watchlist,\n",
" feval=func_loss,early_stopping_rounds=10,\n",
" verbose_eval=1000)\n",
"yp = clf.predict(dvalid)\n",
"cpu_loss = multi_weighted_logloss(y_test, yp, classes, class_weights)\n",
"ysub = clf.predict(dtest)\n",
"line = 'validation loss %.4f'%cpu_loss\n",
"print(colored(line,'green'))\n",
"CPU_RUN_TIME[step] = time.time() - start"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"# GPU\n",
"y = train_final_gd['target'].to_array()\n",
"y = lbl.fit_transform(y)\n",
"cols = [i for i in test_final_gd.columns if i not in ['object_id','target']]\n",
"for col in cols:\n",
" train_final_gd[col] = train_final_gd[col].fillna(0).astype('float32')\n",
"\n",
"for col in cols:\n",
" test_final_gd[col] = test_final_gd[col].fillna(0).astype('float32')"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"X = train_final_gd[cols].as_matrix()\n",
"Xt = test_final_gd[cols].as_matrix()"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.1,stratify=y, random_state=126)"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"# GPU\n",
"gpu_params = cpu_params.copy()\n",
"gpu_params.update({'objective': 'multi:softprob',\n",
" 'tree_method': 'gpu_hist', \n",
" })"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0]\teval-merror:0.340127\ttrain-merror:0.296333\teval-wloss:2.01898\ttrain-wloss:1.89141\n",
"Multiple eval metrics have been passed: 'train-wloss' will be used for early stopping.\n",
"\n",
"Will train until train-wloss hasn't improved in 10 rounds.\n",
"[59]\teval-merror:0.280255\ttrain-merror:0.002407\teval-wloss:1.24705\ttrain-wloss:0.099147\n",
"\u001b[32mvalidation loss 1.2471\u001b[0m\n",
"CPU times: user 1min 21s, sys: 1.96 s, total: 1min 23s\n",
"Wall time: 6.75 s\n"
]
}
],
"source": [
"%%time\n",
"start = time.time()\n",
"dtrain = xgb.DMatrix(data=X_train, label=y_train)\n",
"dvalid = xgb.DMatrix(data=X_test, label=y_test)\n",
"dtest = xgb.DMatrix(data=Xt)\n",
"# xgboost uses the last entry of evals for early stopping, so this stops on train-wloss\n",
"watchlist = [(dvalid, 'eval'), (dtrain, 'train')]\n",
"clf = xgb.train(gpu_params, dtrain=dtrain,\n",
" num_boost_round=60,evals=watchlist,\n",
" feval=func_loss,early_stopping_rounds=10,\n",
" verbose_eval=1000)\n",
"yp = clf.predict(dvalid)\n",
"gpu_loss = multi_weighted_logloss(y_test, yp, classes, class_weights)\n",
"ysub = clf.predict(dtest)\n",
"line = 'validation loss %.4f'%gpu_loss\n",
"print(colored(line,'green'))\n",
"GPU_RUN_TIME[step] = time.time() - start"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[32mwe achieve 4.022 speedup for training.\u001b[0m\n"
]
}
],
"source": [
"speedup = CPU_RUN_TIME[step]/GPU_RUN_TIME[step]\n",
"line = \"we achieve %.3f speedup for %s.\"%(speedup,step)\n",
"print(colored(line,'green'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"conclusions\"></a>\n",
"## 5. Conclusions"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Multiclassification Loss (lower the better):\n",
"CPU: 1.2185 GPU: 1.2471\n"
]
}
],
"source": [
"print(\"Multiclassification Loss (lower the better):\")\n",
"print(\"CPU: %.4f GPU: %.4f\"%(cpu_loss,gpu_loss))"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'load data part1': 229.86804676055908,\n",
" 'ETL part1': 790.7703940868378,\n",
" 'load data part2': 238.17845582962036,\n",
" 'ETL part2': 140.68488955497742,\n",
" 'training': 27.127089023590088}"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"CPU_RUN_TIME"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'load data part1': 21.226726055145264,\n",
" 'ETL part1': 5.412163972854614,\n",
" 'load data part2': 19.614619255065918,\n",
" 'ETL part2': 9.562284231185913,\n",
" 'training': 6.745010137557983}"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"GPU_RUN_TIME"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f854ff70668>"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1440x360 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"steps = ['load data part1','ETL part1','load data part2','ETL part2','training']\n",
"GPU_RUN_TIME['Overall'] = sum([GPU_RUN_TIME[i] for i in steps])\n",
"CPU_RUN_TIME['Overall'] = sum([CPU_RUN_TIME[i] for i in steps])\n",
"steps.append('Overall')\n",
"speedup = [CPU_RUN_TIME[i]/GPU_RUN_TIME[i] for i in steps]\n",
"df = pd.DataFrame({'steps':steps, 'speedup':speedup})\n",
"df.plot.bar(x='steps', y='speedup', rot=0, figsize=(20,5), fontsize=15, title='GPU Speedup')"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f854ff5fb00>"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABKAAAAFGCAYAAABdfOeEAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAAIABJREFUeJzs3XuYHGWZ9/HvTIYEYhIYwygiEYiRG8HDrsoqeADPwBpxOYi6oiyIgiKuoAICmqAgQRHd4C6r6IKuiqh4iBIRRFEEd2Fd912F3KDIaUGMOCHkQMJk5v2jqqFpZjI9sWt6Jv39XNdck656qvruzlM1M79+6qmuoaEhJEmSJEmSpKp0t7sASZIkSZIkbd4MoCRJkiRJklQpAyhJkiRJkiRVygBKkiRJkiRJlTKAkiRJkiRJUqUMoCRJkiRJklQpAyhJktRxImJpRLx1nJ7rgxFxwXg812QSET+JiLe1uw5JkjQ+etpdgCRJ6kwRcRvwRGADsAr4AXBsZq6q+rkzc78q9hsR+wD/npk71D3XmVU8lyRJ0mTiCChJktRO8zNzBvBXwF8DJ7e5HkmSJFXAEVCSJKntMvMPEXE5RRAFFJdoUYwmuqB8fDjwtsx8Ufl4CDgGOAHoA75MMYJqqNYW+AVwJLACeGdmLm3cdxNtdwYuogjI/gNIYOvMfHP9a4iIxwFLgWkRURvFtQvwdmBeZr45InYCfg8cAZwOzKAI3f4L+DzwlLKuY+v2ewTwfmA74D+Bt2fm7aO9pxGxJXABsB8wBbgFeE1m3hsRWwOfBPYHBoF/Az6cmRvKbY8Cjgd2AO4E3pyZv4yIpwP/QvH/9H/AyZn53XKbC4HVwE7AS4AbgTdl5u/K9a8EFgNPAr4EdNXVOq98/X8FPAT8KDMPHe01SpKkycMRUJIkqe0iYgeKoOS3Y9z0NcAewLOA1wOvrlv3fIqwaFvgbODzEdH1mD2M3vYrFMHPbGABcNhwO8jM1eVruDszZ5Rfd2/k+Z4GHAp8CjgFeAWwO/D6iNgbICIOAD4IHEgRsv0M+GptJxHxvYg4aYTneCuwNTCnrP1oYG257kJgAJhHEay9iiKEIyIOKV/nW4BZwGuB+yJiC2AJ8EPgCcC7gS9HRNQ95xuAhUAvxf/lGeU+twUuBU6leI9/B7ywbruPlPvtpQi9Fo/wmiRJ0iTlCChJktRO3y5HMs0ArgI+PMbtz8rMFcCKiPgxxQiaH5Trbs/MzwFExEXAP1PMOfWHYfYzbNuImEoRcL08M9cD10TEd8dY43A+kpkPAj+MiNXAVzPzj+Xz/4wiFLqaIjT6WGbeVK47E/hgROyYmbdn5ms28hwPUQRP8zLz/1GMsiIinkgx8mmbzFwLrI6IcylGav0rRRB1dmZeX+7nt+V2L6b4fzorMweBqyLie8AbKQIrgG9l5n+W7b9MMcqK8vl+k5nfKNd9imLkWn2tOwLbZ+ZdwDXNvpGSJGlycASUJElqp9dl5kxgH2BXitExY1EfJq2hCEgesy4z15T/rF8/7H4a2m4P/LluGRSXpP2l7q3799phHtfq3BH4dESsiIgVwJ8pLl17chPP8SXgcuDiiLg7Is4uRzHtCGwB3FO333+lGNUExYip3w2zv+2BO8vwqeb2hlpG+v/Ynrr3LTOHePT7+IHydf1nRPymvOxQkiRtRhwBJUmS2i4zry7nEPoE8Lpy8Wpgel2z7ca7LuAe4PERMb0uhJqzkfZDLX7+O4EzMvPLY90wMx+iuBxuYTn31GUUlxleBqwDts3MgRGe86nDLL8bmBMR3XUh1FOAm5so5x7q3rfy8saHH2fmH4CjynUvAq6MiJ9m5lgvyZQkSROUI6AkSdJE8SnglRHx7PLxr4ADI2J6OUn1keNdUDnZ9w3AgoiYGhF7AvM3ssm9wOxyku9WOB84OSJ2B4iIrcs5mkYVES+NiGdGxBRgJcVlboOZeQ/FfE
vnRMSsiOiOiKfW5p2imLj8fRHx3Ijoioh5EbEjxQTsa4APRMQWEbEPxXtxcRPlfB/YPSIOjIge4DjqAsWIOKScBwygnyLIG3zsbiRJ0mRlACVJkiaEzFwOfBH4ULnoXGA9RahzEcVd7trh74E9gfuAjwJfoxhB9BiZuYxikvBby8vbtv9LnjgzvwUsoriMbiXwa4qJzgGIiKUR8cERNt8O+AZF+HQTxZxSXyrXvQWYSnGnuv6y3ZPK5/w6xeThXwEeAL4NPL6cA2t++fx/opgn6y3lax7tdfwJOAQ4i+J9fBrw87omewD/Ud498LvAezLz1tH2K0mSJo+uoaFWjxSXJEnafEXE14BlmTnWCdMlSZI6lnNASZIkbURE7EEx+ffvgVcBB1CM5JEkSVKTDKAkSZI2bjvgUmA2cBdwTGb+d3tLkiRJmly8BE+SJEmSJEmVchJySZIkSZIkVWpzvgRvGsUdVe4BNrS5FkmSJEmSpM3BFIq7517PCHcGHs7mHEDtAfys3UVIkiRJkiRthl4MXNNs4805gLoHoL9/NYODznM1XmbPnsF9961qdxlSpezn6gT2c3UC+7k6gf1cncB+Pr66u7vo7X0clLlLszbnAGoDwODgkAHUOPP9Viewn6sT2M/VCezn6gT2c3UC+3lbjGm6IychlyRJkiRJUqUMoCRJkiRJklSpzfkSvGENDQ2xatX9rF27isHBzrk5Xk/PVHp7+5gypeP+yyVJkiRJUpt1XBrR37+crq4uHv/4JzJlSg9dXV3tLqlyQ0NDrF69kv7+5Wy77ZPaXY4kSZIkSeowHXcJ3vr1D7LNNrPp6dmiI8IngK6uLh73uFkMDKxvdymSJEmSJKkDdVwABUN0dXXey+6UsE2SJEmSJE08nZfESJIkSZIkaVx13BxQw5k5ayu2nNb6t+LBdQM8sHLtqO0GBga46KLPc+WVlzNlSg9Tpkxhzpw5HHnk0dx002/4p386h+22256BgYfYccedOPHEU5k1a2sOPng+Z599LnPnznt4X0ceeRjvetd7eM5zntfy1yNJkiRJkrQpDKCALaf1MP+E77R8v0vOOYAHmmh35pkLefDBB/nsZy9i5syZDA0Ncd11P+eOO24H4HnP+xs++tGzGRwc5EMfOomLLvo873738S2vV5IkSZIkqQpNBVARMQ94P7AnsDvws8zcZyPtzwX+ETgnM9/XsG43YHG5rxXABcDCzNxQ16YLOBk4BtgWuB44LjN/1fQrmyTuvPMOfvrTH3PppZcxc+ZMoJivaa+9XgTAZZctebhtd3c3z3nOHlx33TVtqVWSJEmSpIlmcGA9fX0z213GmA2sX0f//Z1zs7BmR0DtDuwP/ALYYmMNy4DpSGDlMOt6gSuBG4EDgKcC51DMRXVqXdOTgNMoQq9lwPHAlRHxjMz8Q5M1Two335zssMNTmDVr1qht169fzzXX/JRdd336OFQmSZIkSdLE190zlVvPOKjdZYzZ3FO+CRhANVqSmd8BiIhvUIxKGsli4NPAYcOsOxrYCjgwM1cCV0TELGBBRJydmSsjYkuKAOpjmXle+ZzXAbcBx/LooGqz8/vf38rChafy4IMP8oIX7MUuuwQ33PCfHH74mwB45jOfzWGH/QMw8p3tvOOdJEmSJEmaSJq6C15mDjbTLiIOBnYFzhqhyX7A5WX4VHMxRSi1d/l4L2AWcEnd868GlpTbb1Z22SW46647eOCBYraonXeey4UXfoVDDjmU1atXAcUcUBde+BUuvPArnHDCiWy11VYAbLPNNtx///2P2t/996+gt/fx4/siJEmSJEmSNqKpAKoZEbEVxeV0J5WB0XB2pbik7mGZeQewplxXa7MBuKVh25vq2mw25sx5Ci960d4sWvRRVq1a9fDytWtHv3ve8573fL73vW+zYUMxfdZ1111Dd3c3O+wwp7J6JUmSJEmSxqqVd8E7GbgH+PeNtOmlmHi8UX+5rtZmVf2k5HVtpkfE1Mxs+i
LJ2bNnPOrxH//YTU/Po3O3B9cNsOScA5rdZdMeXDfwmOcazoc/fDpf+MLnOOqot9DT08PMmbPo6+vjsMMO57e/vYWurq5h93PkkW9j8eJPccQRf093dzezZs1i0aJz2HLLqcM+T3d397hMzDYZJ3+Txsp+rk5gP1cnsJ+rE9jPpYmrk47PlgRQEbEz8D7gpZk51Ip9tsp9961icPCRkgYHBxkYePQVhQ+sXMsD411Yna6uKRx55NEceeTRj1k3b16w776veUzNAD0903jve098zPLh2kLx2pcvr/aV9vXNrPw5pHazn6sT2M/VCezn6gT2c3WCyRziTMbjs7u76zGDfZrarkXPfxawFMiI2CYitin3Pa18XJsVux/Yepjte8t1tTYzImLKMG3WjGX0kyRJkiRJktqvVQFUAAdShEe1rzkUd63rB55ctltGwzxOETEHmM4jc0MtA6YA8xqe4zHzR0mSJEmSJGnia1UA9TbgpQ1f91Lcye6lwPKy3VLg1RFRPz7uUGAtcHX5+FpgJXBIrUFETAfml9tLkiRJkiRpEmlqDqgyANq/fPhkYFZEHFw+viwzbxhmmweBOzPzJ3WLzweOAy6NiEXAXGAB8MnMXAmQmQ9GxFnAaRHRTzHq6XiKsGzx2F6eJEmSJEmS2q3ZScifAHy9YVnt8c7Abc3sJDP7I+LlwHnAEoo74p1LEULVO4sicDoZmA3cALwyM+9tsl5JkiRJkiRNEE0FUJl5G9A1WruGbXYaYfmNwMtG2XYIOKP8kiRJkiRJ0iTW7AiozVrv1lPpmTqt5fsdWL+O/vu9aZ8kSZIkSepsBlBAz9Rp3HrGQS3f79xTvgmMHkANDAxw4YUXcOWVP2TatKl0d3fznOfswQtesCcnnXQCc+bsyIYNA8yevS0nnngqT3rS9hx77Nt54xsP44UvfPHD+zn11A+w114vZv/957f8tUiSJEmSJG0qA6gJ4MwzF7Ju3YN84QtfYvr0xzEwMMD3v/9d1q9/iJ12msvnP/8lABYv/iSLF5/LmWd+vM0VS5IkSZIkNa+73QV0ujvvvIOf/vTHnHjiaUyf/jgAenp6OOCAA9lqq60e1fZ5z/sb7rjj9naUKUmSJEmStMkMoNrs5puTHXZ4CrNmzdpou8HBQX7yk6vYZZcYp8okSZIkSZJaw0vwJrjbbruVww9/E0NDQ8ybN493v/u9AHR1DX9TwpGWS5IkSZIktYsBVJvtsktw1113sHLlymFHQdXPAVVvm216Wbny/kctW7FiBdts01tZrZIkSZIkSZvCS/DabM6cp/DCF76Ej3/8TNasWQ3Ahg0bWLLk26xdu3bE7fbY4/n84AffZ926dQDccsvN3H77bey22+7jUrckSZIkSVKzHAEFDKxfx9xTvlnJfptx6qkL+cIXPssRRxzGFlv0MDQ0xAte8EK22267Ebd5zWsO4N57/8BRR72F7u4pTJs2jYULz2TrrbdpVfmSJEmSJEktYQAF9N+/HljftuffYosteMc73sU73vGux6zbY48XDLtNd3c3Rx11DEcddUzV5UmSJEmSJP1FvARPkiRJkiRJlTKAkiRJkiRJUqU6MIDqYmhosN1FjLuhoaF2lyBJkiRJkjpUxwVQU6duyYoVf2Jg4KGOCWWGhoZYvXolPT1T212KJEmSJEnqQB03CXlvbx+rVt3Pn/98L4ODG9pdzrjp6ZlKb29fu8uQJEmSJEkdqOMCqK6uLmbO3IaZM7dpdymSJEmSJEkdoeMuwZMkSZIkSdL4MoCSJEmSJElSpQygJEmSJEmSVCkDKEmSJEmSJFXKAEqSJEmSJEmVMoCSJEmSJElSpQygJEmSJEmSVKmeZhpFxDzg/cCewO7AzzJzn7r1TwKOB14FPBXoB64CTs7Muxv29WTgPOAVwDrgYuADmbmmod1RwAeAOcBvyjY/GvtLlCRJkiRJUjs1OwJqd2B/IIGbh1n/XODvgK8C8ynCqucD10bEjFqjiNgCuB
zYEXgD8B7gEOCz9TuLiDcC5wNfBPajCKC+FxHPaPaFSZIkSZIkaWJoagQUsCQzvwMQEd8Atm1Yfw2wa2YO1BZExC8pAquDgIvKxQcDTwfmZebvy3YPARdHxMLMvKVstwC4KDM/Ura5Gvhr4CTgzWN6hZIkSZIkSWqrpkZAZebgKOtX1IdP5bKbgTXA9nWL9wOur4VPpW8D64F9ASJiLrALcEnD83+93F6SJEmSJEmTSGWTkEfEs4DpPPqSvV2BZfXtMnM98LtyHXXfH9UOuAl4fET0tb5aSZIkSZIkVaWSACoiuoFPA7cA361b1QusGGaT/nIddd8b2/U3rJckSZIkSdIk0OwcUGP1MYo75u2dmQ9V9BxNmT17xuiN1FJ9fTPbXYJUOfu5OoH9XJ3Afq5OYD+XJq5OOj5bHkBFxDsp7oL3xsz8j4bV/cDWw2zWC/xPXRvKdisa2tSvb8p9961icHBoLJvoL9DXN5Plyx9odxlSpezn6gT2c3UC+7k6gf1cnWAyhziT8fjs7u7apME+Lb0ELyIOAhYDH8jMrw3TZBmPzPFU22YqMJdH5nyqfX9Uu/LxnzNzeesqliRJkiRJUtVaFkBFxD7Al4HFmfmJEZotBfaIiB3rlr0WmAb8ACAzb6WYuPyQun13l4+XtqpeSZIkSZIkjY+mLsGLiOnA/uXDJwOzIuLg8vFlwI7AtylGL30tIl5Qt/nyzPxd+e9vAKcAl0bEaRSX2Z0LfCUzb6nbZgHw7xFxG/Bz4K3A04A3jeXFSZIkSZIkqf2anQPqCcDXG5bVHu8MPJ8iTHo2cG1Du4uAwwEy86GI2Bc4D7gEWAdcTDFn1MMy86sRMQM4ETgN+A3wmsz8dZP1SpIkSZIkaYJoKoDKzNuAro00ubD8amZfdwGva6Ld54DPNbNPSZIkSZIkTVwtnYRckiRJkiRJamQAJUmSJEmSpEoZQEmSJEmSJKlSBlCSJEmSJEmqlAGUJEmSJEmSKmUAJUmSJEmSpEoZQEmSJEmSJKlSBlCSJEmSJEmqlAGUJEmSJEmSKmUAJUmSJEmSpEoZQEmSJEmSJKlSBlCSJEmSJEmqlAGUJEmSJEmSKmUAJUmSJEmSpEoZQEmSJEmSJKlSBlCSJEmSJEmqlAGUJEmSJEmSKmUAJUmSJEmSpEoZQEmSJEmSJKlSBlCSJEmSJEmqlAGUJEmSJEmSKmUAJUmSJEmSpEoZQEmSJEmSJKlSBlCSJEmSJEmqVE8zjSJiHvB+YE9gd+BnmblPQ5su4GTgGGBb4HrguMz8VUO73YDF5b5WABcACzNzw1j3JUmSJEmSpImv2RFQuwP7AwncPEKbk4DTgEXAfGAVcGVEbFdrEBG9wJXAEHAAcDpwArBwrPuSJEmSJEnS5NBsALUkM+dk5iHAbxpXRsSWFKHRxzLzvMy8EjiEImg6tq7p0cBWwIGZeUVmnk8RPh0fEbPGuC9JkiRJkiRNAk0FUJk5OEqTvYBZwCV126wGlgD71bXbD7g8M1fWLbuYIpTae4z7kiRJkiRJ0iTQqknIdwU2ALc0LL+pXFffbll9g8y8A1hT167ZfUmSJEmSJGkSaGoS8ib0AqvqJxIv9QPTI2JqZq4v260YZvv+ct1Y9tWU2bNnNNtULdLXN7PdJUiVs5+rE9jP1Qns5+oE9nNp4uqk47NVAdSEdd99qxgcHGp3GR2jr28my5c/0O4ypErZz9UJ7OfqBPZzdQL7uTrBZA5xJuPx2d3dtUmDfVp1CV4/MCMipjQs7wXW1I1Y6ge2Hmb73nLdWPYlSZIkSZKkSaBVAdQyYAowr2F545xPy2iYxyki5gDT69o1uy9JkiRJkiRNAq0KoK4FVgKH1BZExHRgPrC0rt1S4NURUT8+7lBgLXD1GPclSZIkSZKkSaCpOaDKAGj/8uGTgVkRcXD5+LLMXBMRZwGnRUQ/xUil4ykCrsV1uzofOA64NCIWAXOBBcAnM3MlQGY+2OS+JEmSJEmSNAk0Ow
n5E4CvNyyrPd4ZuA04iyIkOhmYDdwAvDIz761tkJn9EfFy4DxgCcUd8c6lCKHqjbovSZIkSZIkTQ5NBVCZeRvQNUqbIeCM8mtj7W4EXtaKfUmSJEmSJGnia9UcUJIkSZIkSdKwDKAkSZIkSZJUKQMoSZIkSZIkVcoASpIkSZIkSZUygJIkSZIkSVKlDKAkSZIkSZJUKQMoSZIkSZIkVcoASpIkSZIkSZUygJIkSZIkSVKlDKAkSZIkSZJUKQMoSZIkSZIkVcoASpIkSZIkSZUygJIkSZIkSVKlDKAkSZIkSZJUKQMoSZIkSZIkVcoASpIkSZIkSZUygJIkSZIkSVKlDKAkSZIkSZJUKQMoSZIkSZIkVcoASpIkSZIkSZUygJIkSZIkSVKlDKAkSZIkSZJUKQMoSZIkSZIkVaqnlTuLiDcAHwB2Ae4HfgSclJl317XpAk4GjgG2Ba4HjsvMXzXsazdgMbAnsAK4AFiYmRtaWbMkSZIkSZKq1bIRUBHxWuCrwLXAAcCJwEuA70dE/fOcBJwGLALmA6uAKyNiu7p99QJXAkPlvk4HTgAWtqpeSZIkSZIkjY9WjoB6E/DLzDy2tiAiVgLfAQK4KSK2pAigPpaZ55VtrgNuA44FTi03PRrYCjgwM1cCV0TELGBBRJxdLpMkSZIkSdIk0Mo5oLaguOyu3orye1f5fS9gFnBJrUFmrgaWAPvVbbcfcHlD0HQxRSi1dwtrliRJkiRJUsVaGUB9AXhxRLwlImZFxC7AR4GrMvPGss2uwAbgloZtbyrXUdduWX2DzLwDWNPQTpIkSZIkSRNcyy7By8zvR8ThwOeBi8rF1wKvrWvWC6waZiLxfmB6REzNzPVluxU8Vn+5rmmzZ88YS3O1QF/fzHaXIFXOfq5OYD9XJ7CfqxPYz6WJq5OOz5YFUBHxUuB84NPAUuCJwALgWxHxinbdve6++1YxODjUjqfuSH19M1m+/IF2lyFVyn6uTmA/Vyewn6sT2M/VCSZziDMZj8/u7q5NGuzTyknIzwG+m5kn1hZExK8oLqU7ALiUYgTTjIiY0hBI9QJrytFPlO22HuY5est1ktQ2gwPrJ+UPuYH16+i/f/3oDSVJkiSpxVoZQO0KfLV+QWZmRKwFnlouWgZMAeYB2bBt/ZxPy2iY6yki5gDTG9pJ0rjr7pnKrWcc1O4yxmzuKd8EDKAkSZIkjb9WTkJ+O/Cc+gUR8XSKO9fdVi66FlgJHFLXZjown+KyvZqlwKsjon6IwaHAWuDqFtYsSZIkSZKkirVyBNT5wLkRcTePzAH1IYrw6TKAzHwwIs4CTouIforRTMdTBGGLG/Z1HHBpRCwC5lLMJ/XJzFzZwpolSZIkSZJUsVYGUP9EcW3HMcDRFHexuwY4OTNX17U7iyJwOhmYDdwAvDIz7601yMz+iHg5cB6wpNzXuRQhlCRJkiRJkiaRlgVQmTkE/Ev5NVq7M8qvjbW7EXhZq+qTJEmSJElSe7RyDihJkiRJkiTpMQygJEmSJEmSVCkDKEmSJEmSJFXKAEqSJEmSJEmVMoCSJEmSJElSpQygJEmSJEmSVCkDKEmSJEmSJFXKAEqSJEmSJEmVMoCSJEmSJElSpQygJEmSJEmSVCkDKEmSJEmSJFXKAEqSJEmSJEmVMoCSJEmSJElSpQygJEmSJEmSVCkDKEmSJEmSJFXKAEqSJEmSJEmVMoCSJEmSJElSpQygJEmSJEmSVCkDKEmSJEmSJFXKAEqSJEmSJEmVMoCSJEmSJElSpQygJEmSJEmSVCkDKEmSJEmSJFWqp5U7i4ge4H3AkcBTgOXA1zPzvXVtuoCTgWOAbYHrgeMy81cN+9oNWAzsCawALgAWZuaGVtYsSZIkSZKkarV6BNSFwHHAJ4BXAScBaxvanAScBiwC5gOrgCsjYrtag4joBa4EhoADgNOBE4CFLa5XkiRJkiRJFWvZCKiI2Bc4FHh2Zt44QpstKQKoj2XmeeWy64DbgGOBU8
umRwNbAQdm5krgioiYBSyIiLPLZZIkSZIkSZoEWjkC6gjgqpHCp9JewCzgktqCzFwNLAH2q2u3H3B5Q9B0MUUotXfLKpYkSZIkSVLlWhlAPR+4OSLOi4iVEbEmIi6NiO3r2uwKbABuadj2pnJdfbtl9Q0y8w5gTUM7SZIkSZIkTXCtnIR8O+Bw4H+ANwAzgbOBb0XECzJzCOgFVg0zkXg/MD0ipmbm+rLdimGeo79c17TZs2eM6UXoL9fXN7PdJUgagcenxsL+ok5gP1cnsJ9LE1cnHZ+tDKC6yq8DMvM+gIi4B7gaeBnwoxY+V9Puu28Vg4ND7XjqjtTXN5Plyx9odxlSpSbzDwmPTzXL87k6gf1cncB+rk7g7+fjq7u7a5MG+7TyErx+4H9r4VPpGmA9sFtdmxkRMaVh215gTTn6qdZu62Geo7dcJ0mSJEmSpEmilQHUTRQjoBp1AYPlv5cBU4B5DW0a53xaRsNcTxExB5je0E6SJEmSJEkTXCsDqO8Bz4yIbeuWvQTYgmJeKIBrgZXAIbUGETEdmA8srdtuKfDqiKgfR3cosJbikj5JkiRJkiRNEq2cA+qzwHHAkog4k2IS8kXAlZl5DUBmPhgRZwGnRUQ/xWim4ymCsMV1+zq/3NelEbEImAssAD6ZmStbWLMkSZIkSZIq1rIRUGUw9DKKOZouBj5DMfH46xuangWcAZxMMWpqFvDKzLy3bl/9wMspLtdbAiwEzgU+3Kp6JUmSJEmSND5aOQKKzPwtsP8obYYoAqgzRml3I0WgJUmSJEmSpEmslXNASZIkSZIkSY9hACVJkiRJkqRKGUBJkiRJkiSpUgZQkiRJkiRJqpQBlCRJkiRJkiplACVJkiRJkqRKGUBJkiRJkiSpUgZQkiRJkiRJqpQBlCRJkiRJkiplACVJkiRJkqRKGUBJkiRJkiSpUgZQkiRJkiRJqpQBlCRJkiRJkiplACVJkiRJkqRKGUBJkiRJkiSpUgZQkiRJkiRJqpQBlCRJkiRJkiplACVJkiRJkqRKGUBJkiRJkiSpUgZQkiRJkiRJqpQBlCRJkiRJkiplACVJkiRJkqRKGUBJkiRJkiSpUj1V7Tgingwk8DhgZmauKpd3AScDxwDbAtdGOm9tAAAW6ElEQVQDx2Xmrxq23w1YDOwJrAAuABZm5oaqapYkSZIkSVLrVTkC6uPAqmGWnwScBiwC5pdtroyI7WoNIqIXuBIYAg4ATgdOABZWWK8kSZIkSZIqUEkAFREvAfYFPtGwfEuKAOpjmXleZl4JHEIRNB1b1/RoYCvgwMy8IjPPpwifjo+IWVXULEmSJEmSpGq0PICKiCkUl86dDvypYfVewCzgktqCzFwNLAH2q2u3H3B5Zq6sW3YxRSi1d6trliRJkiRJUnWqGAF1NDAN+Mww63YFNgC3NCy/qVxX325ZfYPMvANY09BOkiRJkiRJE1xLA6iImA18BDg+Mx8apkkvsGqYicT7gekRMbWu3Yphtu8v10mSJEmSJGmSaPVd8M4AfpGZl7V4v5ts9uwZ7S6h4/T1zWx3CZJG4PGpsbC/qBPYz9UJ7OfSxNVJx2fLAqiI2B04AnhJRGxTLp5eft86IjZQjGCaERFTGkZB9QJrMnN9+bgf2HqYp+kt1zXtvvtWMTg4NJZN9Bfo65vJ8uUPtLsMqVKT+YeEx6ea5flcncB+rk5gP1cn8Pfz8dXd3bVJg31aOQLqacAWwHXDrLsL+DzwFWAKMA/IuvWNcz4to2Gup4iYQxFoPWpuKEmSJEmSJE1srZwD6hrgpQ1fi8p1+wMfB64FVgKH1DaKiOnAfGBp3b6WAq+OiPoY81BgLXB1C2uWJEmSJElSxVo2Aioz/wT8pH5ZROxU/vNnmbmqXHYWcFpE9FOMZjqeIghbXLfp+cBxwKURsQiYCywAPpmZK1tVsyRJkiRJkqrX6knIm3EWReB0MjAbuAF4ZWbeW2uQmf0R8XLgPGAJxR3xzqUIoSRJki
RJkjSJVBpAZeaFwIUNy4Yo7pZ3xijb3gi8rKraJEmSJEmSND5aOQeUJEmSJEmS9BgGUJIkSZIkSaqUAZQkSZIkSZIqZQAlSZIkSZKkShlASZIkSZIkqVIGUJIkSZIkSaqUAZQkSZIkSZIqZQAlSZIkSZKkShlASZIkSZIkqVIGUJIkSZIkSaqUAZQkSZIkSZIqZQAlSZIkSZKkShlASZIkSZIkqVIGUJIkSZIkSaqUAZQkSZIkSZIqZQAlSZIkSZKkShlASZIkSZIkqVIGUJIkSZIkSaqUAZQkSZIkSZIqZQAlSZIkSZKkShlASZIkSZIkqVIGUJIkSZIkSapUT7sLkCRJE8/gwHr6+ma2u4wxG1i/jv7717e7DEmSJDUwgJIkSY/R3TOVW884qN1ljNncU74JGEBJkiRNNC0LoCLiEOAw4LnA1kACn8jMrza0Owr4ADAH+A3wgcz8UUObJwPnAa8A1gEXl+3WtKpeVcNPzCVJkiRJUqNWjoA6Hvg98F7gT8D+wFciYtvMXAwQEW8EzgcWANcA/wB8LyL2yMxfl222AC6n+PjyDcA2wCfL729uYb2qgJ+YS5IkSZKkRq0MoOZn5p/qHl8VEdtTBFOLy2ULgIsy8yMAEXE18NfASTwSLh0MPB2Yl5m/L9s9BFwcEQsz85YW1ixJkiRJkqSKtewueA3hU81/A9sDRMRcYBfgkrptBoGvA/vVbbMfcH0tfCp9m2J4yr6tqleSJEmSJEnjo2UB1Aj2BG4u/71r+X1ZQ5ubgMdHRF9du0e1ycz1wO/q9iFJkiRJkqRJorK74EXEy4HXAUeUi3rL7ysamvbXrV9efm9sU2vXO8zyjZo9e8ZYN1GHmoyTp0tjZT9XJ7CfayzsL+oE9nNp4uqk47OSACoidgK+AnwnMy+s4jmadd99qxgcHGpnCR1lMh88y5c/0O4SNEnYz9UJ7OfqBH19M+0v2uzZz9UJ/L1lfHV3d23SYJ+WX4IXEY8HlgK3A39ft6o20mnrhk16G9b3D9Om1q5/mOWSJEmSJEmawFoaQEXEdOB7wFTgNZm5pm51bV6nxnmcdgX+nJnL69o9qk1ETAXm8tj5oyRJkiRJkjTBtSyAiogeijvaPQ3YNzP/WL8+M2+lmJD8kLptusvHS+uaLgX2iIgd65a9FpgG/KBV9UqSJEmSJGl8tHIOqH8G9gfeA8yOiNl16/47M9cBC4B/j4jbgJ8Db6UIrN5U1/YbwCnApRFxGsXleOcCX8nMW1pYryRJkiRJksZBKy/Be1X5/dPAdQ1fTwLIzK8CRwOHU4xmehbFpXq/ru0kMx8C9gXuBC4BzgO+Cby9hbVKkiRJkiRpnLRsBFRm7tRku88BnxulzV3A61pQliRJkiRJktqs5XfBkyRJkiRJkuoZQEmSJEmSJKlSBlCSJEmSJEmqVCvvgidJkiRNGoMD6+nrm9nuMsZsYP06+u9f3+4yJEkaEwMoSZIkdaTunqncesZB7S5jzOae8k3AAEqSNLl4CZ4kSZIkSZIqZQAlSZIkSZKkShlASZIkSZIkqVIGUJIkSZIkSaqUAZQkSZIkSZIqZQAlSZIkSZKkShlASZIkSZIkqVIGUJIkSZIkSaqUAZQkSZIkSZIqZQAlSZIkSZKkShlASZIkSZIkqVIGUJIkSZIkSapUT7sLkCRJkiRJ7TVz1lZsOc2IQNWxd0mSJEmS1OG2nNbD/BO+0+4yNsmScw5odwlqgpfgSZIkSZIkqVIGUJIkSZIkSaqUAZQkSZIkSZIqZQAlSZIkSZKkSk3YScgjYjdgMbAnsAK4AFiYmRvaWpgkSZIkTRKDA+vp65vZ7jLGbGD9OvrvX9/uMiS10IQMoCKiF7gSuBE4AHgqcA7FiK1T21jauPD2l5IkSZJaobtnKreecVC7yxizuad8EzCAkjYnEzXlOBrYCjgwM1cCV0TELGBBRJxdLttseftLSdo8+IGCJEmSVJiovxXvB1zeEDRdDCwC9gaWtK
UqSZLGwA8UJEmSpMJEDaB2Ba6qX5CZd0TEmnJdMwHUFIDu7q7WVzcOntC7VbtL2GQ9W/e1u4RNMln7ymQ1Y8aWTJvEI0Ps52qW5/PxZz8fX57P28N+Pr7s5+1hPx9//t4y/iZjP6+recpYtusaGhpqfTV/oYh4CHh/Zn6qYfldwBcz84NN7OZFwM+qqE+SJEmSJKnDvRi4ptnGkzfGH931FG/GPYB3zpMkSZIkSfrLTQGeRJG7NG2iBlD9wNbDLO8t1zVjHWNI4iRJkiRJktSU3411g+4qqmiBZRRzPT0sIuYA08t1kiRJkiRJmiQmagC1FHh1RMysW3YosBa4uj0lSZIkSZIkaVNM1EnIe4EbgV8Di4C5wCeBT2Xmqe2sTZIkSZIkSWMzIUdAZWY/8HKKia2WAAuBc4EPt7MuSZIkSZIkjd2EHAElSZIkSZKkzceEHAElSZIkSZKkzYcBlCRJkiRJkirV0+4CJEmSJEnS5isiuoC3AscAuwODwH8D52Tmd9tZW6OI2An4PTA/M79XLrsN+EZmvq99lU1+BlDjKCIuBJ6Rmc8bp+c7FlicmV1j3G4G8ADwD5l54Ri2ez0wfSzbtNtINUfE84BjgT2BpwFfzMzDx71APSwiFjDyjQgOA+ZtZH3N1Zm5T0T8BPhTZh7cugr/cvbHycPz+cSzkePnHcDBwLOALSnusLswM3847kUK6Pjzuf1RLT/HRsQ+wI+BZ2bmr8ew3eHAvwEzM3NVK2qRNuKfgaPK76dSZBFvAL4TESdl5qJ2FqfxYQClVno9sC1wYZvrGIuRan4h8CLgF8DMca5JI7sf2HeY5b+l+EX+B3XL3g28DPi7umUrqyutJeyPmig2p/P5KRTnhs8Aq4E3Az+IiNdNtE9cO0ynns/tj4LWn2N/SfEh1e/GuN33y+3WtKgOaVgR8TrgaOCYzDy/btXSiPgDcGZEXJGZv6ywhinAlMxcX9VzaHQGUOpIEbFVZq7dSJPFmfnpsu0N41SWRjeQmb/YyPq7av+IiIOBdaO0nxDsj9Kma+L4eU5m/qnu8RUR8TTgvYB/8LdPp57P7Y9qSkRsAQxm5obR2mbmSooPqcYkM5cDyzehPGms3kPxAcPnhll3JvAO4NiIuIMiqNo+MwdrDSLib4HvAU/LzN+Wy95Gce6cB/wB+Exmnl23zYXAM4CPAmcAuwAvi4jflo/3AZ4E3AlcApxuOFU9A6g2i4i/As6h+PRhHXAZcHxm3lvX5izgb4GdgRXA1cAJmfmHujbTyv28meJ62ouAO5qs4SDgY8Ac4Hrg+GHavAV4O7Ab0AX8Cnh/Zt5Qrr8QOKj891C52cLMXFCeMP4ReDbFp5o3Ah8abbh5bVg98EPgg8ATgauAt2fm/43x/bkN+Ga5/h3AEyPiyyPVXH/Ck8D+qNF5Pt9oXT+hfcdP/R/7Nf8NvGJjNWvzZX9UO410jqX4Y7jWL08EdgJ2iojHAQsoRkPPppiX5nPAP9V+PxjuErxy3/9I0b+PAoaAr1P8XFpXtjmcukvw6ua9ORR4OcXlUQ8An6fow/WBwCEUwcEOFOHX8RQjscZ0ybc2fxHRQ/G70T8PF6hm5v0R8WPgJcB8ikuw96bo0zWHAv9VFz69n6L/nQ38BHgu8JGIWJOZ59Vtt1PZ5nSKkOr3FKMP/0zRZ/spgqkFQB/FeVwVMoBqo4joozhgbgLeBMwAzqL4NOx5dQnsEygOsLspDowTgKsi4hl1PwjOAt5GMbT7RoofNIc0UcNzgK8B36JIpp9BkQA32gn4IsXQ3qnAG4GfRcTumXkr8BHgKcA2wDvLbWqfXu4MLAE+QfHH1H4Uwy1fkpk/H6XEPYGgOEFsCSwCvg3sUdemmfcHivf4N2V9PcD/bKRmTVDlD7FHycyBcXp6+6OG5fl80p3P9wRuHqVeVczz+aNqsT92jpHOsftQhExPpQig1lBcqroLkMCXKcKgv6IIrLai+MBhY06gCFffTDHv2MeA2y
n+IN+YsymC1YMpgqgPUfTxS+DhuSkvBr5BcYns0yl+/kjD2RaYRtH3RnI7sG9m3hQR/48icPoxPPzB3AEUxw4RMYsipPpoZi4st78iIqYDp0bEv9QFXbOBV2Tmr+qe6y7g4YnEI+LnFJdEfyEi3u0oqGoZQLXXCeX3V5dDZ4mIWyg+RTgI+CpAZh5R26C8dvU6igPnRcBPI2I2xVDFD2fmOWW7yyn+cBnNSRS/9Lw+M4co/pCYSjFU8WGZeXpdDd3AFcDfUPxAOz0zfxcRfwa6G4fI16fQ5bY/prjzwZHAaH+wPAHYMzPvKLe/HbgmIvbNzB808/407O81mflgXftha9aENRt4qHFhROycmbeNw/PbHzUSz+eT5HweEUcAf80j/2dqD8/n2B870Ujn2IiAIpT6q/qRs8CPyq/aXcSuAaZTfDgxWgB1Wz5y05LLI+KFwIGMHkD9NDNrffKKiNi33K72ocaJFB+4vKH8efOD8pJBJ5FWK3wNeG9EHFt+KLEfxRyotf63J/A44OsNH2RcBZxGMSqvFnb9X0P4VDuO3kMxGnxnig8hap5CcamgKtLd7gI63N8AP6z9sQKQmf8B3Ebxyw0AEbFfRFwbEfcDAzzyKdou5fdnUhw436nbz2D941Fq+G75w6Pm0sZGEfH0iPhWRNwLbKD4pTHqahhRROwQERdFxP+V9T8EvKqZbYFf1n45BCg/Yf9jWXdt/6O9PzU/qv/lUJPS/RSfTjd+3T1Oz29/1Eg8n4+u7cdPRDwXWAx8OjN/PFp7Varjz+f2Rw3jvxrCJyJiy4hYGMW8NesozrtnADsPN4qwQePl0TdS/HE+mtG22wNY0vDzxjnMNJI/UfTdHTfSZkegdgn01yhGTb2sfHwocF3dOXvb8vtvKI6H2lftPDqnbr+POp5K/0gxkvtbFCOr/gZ4V7luy2Haq4UcAdVeT6I4cBrdCzweICL2oDihf4visow/UlzD/QseOUC2K7//sWE/jY+Hs91o20XETIofRPdSDFW/HXgQuIBRDtLyE/LvUqTWH6JIlFdTXIf7hCbqG+41/JHivWv2/akZ7gSkyWWgNk9Nm9gfNRLP56Nr6/ETEXMp7vj0IxxtMhF09Pnc/qgRDNeXFlFclr2QYo6lFRR/NJ9K0RdXbWR/Kxoer6e5P7BH2247Hjt5uZOZa1iZORAR1wF/GxHva5zbtLykbh+K821tlOANwKERcQ3FvFAfrNvkz+X31zD8MZN1/x4aZv0hwDcy85S6GnYb26vSpjKAaq97GP6X9icC/1X+++8oTuiH1j5liIjG9Lg2GeYTeOSArD0ezR+Gadf4eE+KTz1emZnLagsjYusm9j+PYmj5frUh7eW2WzWx7XC11JbdU/67mfenZrgTkDQW9keNxPP56Np2/ETEE4DLKQK3Nww3Cao6jv1RE9FIfywvzkff3etvx6+kYf2BYl60eo2PpXqfpgiY3gZ8tmHdScAsoH7y8Isp5sK8imK+s6/XrbsOWEtxp7zvb0ItW1GMyKr395uwH20CL8Frr/8AXl1+Ig08/InbThTXd0NxgDzUMMS18QD5X4pPsA+o2093/eONuB54bXktbM2BDW1qf1w8fKBGxF5lnfWG+1RluG13pJhksRnPiYin1G37QopfEP+zbv+jvT8b0+wnQRLYHzUyz+eja8vxExEzKO5ICMU8PWvGsE9tvuyPaqex/Lx/1B/L5Xxkb6iiqDG4Hpjf8PPmte0qRhNfZn4bOB/4TER8KiJeERH7RsS/AScDp2TmL+s2uYRiTrSPU8xJdk/dvlZQ3LXu0xHx0Yh4Vbmv4yLiW02UcwXF6Kp3RsSrI+KLFB+yaRw4Aqq9PgkcQzEp4CIeuWvS/1LceQKKA+QfI+JTFHce2otiotiHZeZ9EfFZYGFEDFBcBnJUub/RLKL4w+mSiPg8xV2Tjmxo8wuK4b2fi4izKT49X8Aj1+nWLAMOiIjXUcyTcHe57C7gnIg4jeLSjYXDbD
uS5cD3I+LDPHKXml/Wffo+6vszisfUnJl3R3FHq73LNr3AjhFxMEBmfmMM+1dr9UTEC4ZZfmfW3Tq7SU+u/Z/WG+X/1/6okXg+H11bjh+KebCeBRwOPDUinlrbwAn/26ojz+fYH1UY7hw7kiuAd5VzQP2ZYq6aadWXuFG1nzcXlwHC0yl+VkFxh1RpOO+k6DfHUPSXQYrLSg/IzEfNIZaZd0bEtRQfci1s3FFmnh0RdwPvpbiM+cH/3979ukgVRQEc/65gNuw/IFtOsJkMBg02kyAoGDQYFSxrEtRgW7CoTRTXKhq0aNhkkPkHDgYNFllFEJPBNZw3ML/nKb6Zwf1+YMrc94bLvPvuzD3v3nuoRCxtsjHepmbs9ZO0PAOuUn29OmYAaokyczciTgJbVIakn9RTsWvZpH/MzFcRcZ1KcXqZmnJ4mvF0vZvAQWpfjl/ANjUg2ppTh15EnKOyaDwHetRGb+8GjvkcEWepzdpeAO+pLE2bIx93n1qe8ZAaJN/KzJsRcQa4R6Vq/URtnHiCGhzN8xZ4A9ylOoodKmNBv25tv59pxupMDcaOMDzVc6OpM8Dg0x4t1iHqGo+6wUimrxaOMXyN+2ZdX9ujJrI/X+n+/FRT/nTCOd4/y7Nf+3Pbo2By+5jmCs3MEWrZ0WNqKdPoMqaFaX5vzgN3qBm6PSqo8Br4Putc7V/NjNJHzavN8cfnlG9T/5GmlV+c8v4P4NKEorWBYz4y0idn5uFZ9VE7a3t7bkOi1RQRO8CXzBx7qiktmu1R+nveP1oltkfp34uIC8ATYCMzPyy7PpJWkzOgJEmSJEmtRcQDasbTN+AolZXvpcEnSbMYgJIkSZIk/Yl1ainhOvCV2ntndDm3JA1xCZ4kSZIkSZI6dWDZFZAkSZIkSdL/zQCUJEmSJEmSOmUASpIkSZIkSZ0yACVJkiRJkqROGYCSJEmSJElSp34DPxJAynn73/QAAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 1440x360 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"gpu_time = [GPU_RUN_TIME[i] for i in steps]\n",
"cpu_time = [CPU_RUN_TIME[i] for i in steps]\n",
"df = pd.DataFrame({'GPU': gpu_time,'CPU': cpu_time}, index=steps)\n",
"df.plot.bar(rot=0,figsize=(20,5), fontsize=15, title='Running time: seconds')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**The RAPIDS solution achieves up to 140x speedup for ETL and 25x end-to-end speedup over the CPU solution, with comparable accuracy.**"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
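The speedup figures quoted in the notebook's closing cell are ratios of CPU to GPU running time. Below is a minimal sketch of how such ratios could be derived from two timing dictionaries shaped like the notebook's `GPU_RUN_TIME` and `CPU_RUN_TIME`. The step names and timing values here are hypothetical placeholders, not the notebook's measurements, and the helper `speedups` is an illustrative name that does not appear in the original.

```python
# Hypothetical timings in seconds, keyed by step name.
# These are placeholder values, NOT the measurements from the notebook.
GPU_RUN_TIME = {"load": 2.0, "etl": 1.5, "train": 40.0}
CPU_RUN_TIME = {"load": 10.0, "etl": 150.0, "train": 400.0}


def speedups(cpu, gpu):
    """Return the CPU/GPU time ratio per step, plus an end-to-end ratio."""
    per_step = {step: cpu[step] / gpu[step] for step in gpu}
    # End-to-end speedup compares total time, not the mean of per-step ratios.
    per_step["end_to_end"] = sum(cpu[s] for s in gpu) / sum(gpu.values())
    return per_step


result = speedups(CPU_RUN_TIME, GPU_RUN_TIME)
for step, ratio in result.items():
    print(f"{step}: {ratio:.1f}x")
```

The end-to-end figure is computed from summed times, so a single slow step that benefits little from the GPU can pull it well below the best per-step ratio. That is consistent with a headline ETL speedup being much larger than the end-to-end one.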
@holyglenn
holyglenn commented Aug 5, 2019

Hi, thanks for sharing the solution, and congratulations on reaching the top 1%! I am very interested in RAPIDS. Could you include more details on the hardware spec behind the 140x speedup number? I can infer that the GPU is a single Tesla V100, potentially with 32 GB of GPU DRAM, but what is the CPU? @daxiongshu

@holyglenn
holyglenn commented Aug 12, 2019

@daxiongshu Could you help answer the question? Thanks!
