@oguiza
Created November 21, 2018 10:45
course-v3/my_nbs/dl1/Untitled.ipynb
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "# TSSG Fast Learning Competition: Earthquakes*"
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2018-11-21T08:25:16.022459Z",
"end_time": "2018-11-21T08:25:16.026724Z"
}
},
"cell_type": "markdown",
"source": "*A. Bagnall, J. Lines, W. Vickers and E. Keogh, The UEA & UCR Time Series Classification Repository,\nwww.timeseriesclassification.com"
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2018-11-21T10:25:37.578229Z",
"end_time": "2018-11-21T10:25:37.792549Z"
},
"trusted": true
},
"cell_type": "code",
"source": "%reload_ext autoreload\n%autoreload 2\n%matplotlib inline",
"execution_count": 1,
"outputs": []
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2018-11-21T10:25:38.505040Z",
"end_time": "2018-11-21T10:25:39.606206Z"
},
"trusted": true
},
"cell_type": "code",
"source": "from fastai import *\nimport fastai\nfastai.__version__",
"execution_count": 2,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 2,
"data": {
"text/plain": "'1.0.28'"
},
"metadata": {}
}
]
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2018-11-21T10:25:41.681343Z",
"end_time": "2018-11-21T10:25:41.735200Z"
},
"trusted": true
},
"cell_type": "code",
"source": "import zipfile\nimport tempfile\nimport shutil\nimport os\nimport sys\nimport csv\n\nimport numpy as np\n\n# urlretrieve lives in urllib.request on Python 3\ntry:\n    from urllib import urlretrieve\nexcept ImportError:\n    from urllib.request import urlretrieve\n\n# BadZipfile was renamed to BadZipFile in Python 3.2\ntry:\n    from zipfile import BadZipFile\nexcept ImportError:\n    from zipfile import BadZipfile as BadZipFile\n\nfrom tslearn.utils import to_time_series_dataset",
"execution_count": 3,
"outputs": []
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2018-11-21T10:25:42.833509Z",
"end_time": "2018-11-21T10:25:42.862437Z"
},
"trusted": true
},
"cell_type": "code",
"source": "def extract_from_zip_url(url, target_dir=None, verbose=False):\n    \"\"\"Download a zip file from its URL and unzip it into target_dir.\"\"\"\n    if target_dir is None:\n        # Fall back to the current directory when no target is given\n        target_dir = os.getcwd()\n    fname = os.path.basename(url)\n    tmpdir = tempfile.mkdtemp()\n    local_zip_fname = os.path.join(tmpdir, fname)\n    try:\n        urlretrieve(url, local_zip_fname)\n        if not os.path.exists(target_dir):\n            os.makedirs(target_dir)\n        # The context manager closes the archive before tmpdir is removed\n        with zipfile.ZipFile(local_zip_fname, \"r\") as zf:\n            zf.extractall(path=target_dir)\n        if verbose:\n            print(\"Successfully extracted file %s to path %s\" % (local_zip_fname, target_dir))\n        return target_dir\n    except BadZipFile:\n        if verbose:\n            sys.stderr.write(\"Corrupted zip file encountered, aborting.\\n\")\n        return None\n    finally:\n        # Always remove the temporary download directory\n        shutil.rmtree(tmpdir)",
"execution_count": 4,
"outputs": []
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2018-11-21T10:25:44.161148Z",
"end_time": "2018-11-21T10:25:44.191811Z"
},
"trusted": true
},
"cell_type": "code",
"source": "def prepare_dataset(dataset_name, target_dir):\n    '''\n    Download the selected UCR dataset, unzip its files to the target dir,\n    read the train & test files, and normalize the data.\n    '''\n\n    full_path = os.path.join(target_dir, dataset_name)\n    fname_train = dataset_name + \"_TRAIN.txt\"\n    fname_test = dataset_name + \"_TEST.txt\"\n    # download (again) if either file is missing\n    if not os.path.exists(os.path.join(full_path, fname_train)) or \\\n            not os.path.exists(os.path.join(full_path, fname_test)):\n        url = \"http://www.timeseriesclassification.com/Downloads/%s.zip\" % dataset_name\n        for fname in [fname_train, fname_test]:\n            if os.path.exists(os.path.join(full_path, fname)):\n                os.remove(os.path.join(full_path, fname))\n        extract_from_zip_url(url, target_dir=full_path, verbose=False)\n    try:\n        data_train = np.loadtxt(os.path.join(full_path, fname_train), delimiter=None)\n        data_test = np.loadtxt(os.path.join(full_path, fname_test), delimiter=None)\n    except Exception:\n        return None, None, None, None\n\n    # the first column holds the class label, the rest is the series\n    X_train = to_time_series_dataset(data_train[:, 1:])\n    y_train = data_train[:, 0].astype(int)\n    X_test = to_time_series_dataset(data_test[:, 1:])\n    y_test = data_test[:, 0].astype(int)\n\n    X_train = np.squeeze(X_train)\n    # scale the values\n    X_train_mean = np.mean(X_train)\n    X_train_std = np.std(X_train)\n    X_train = (X_train - X_train_mean) / X_train_std\n\n    # linearly rescale labels to the range 0 .. nb_classes - 1\n    nb_classes = len(np.unique(y_train))\n    y_train = ((y_train - y_train.min()) / (y_train.max() - y_train.min()) * (nb_classes - 1)).astype(int)\n    y_train = y_train.ravel()\n\n    X_test = np.squeeze(X_test)\n    # scale the values with the training set's mean and std\n    X_test = (X_test - X_train_mean) / X_train_std\n\n    y_test = ((y_test - y_test.min()) / (y_test.max() - y_test.min()) * (nb_classes - 1)).astype(int)\n    y_test = y_test.ravel()\n\n    return X_train, y_train, X_test, y_test",
"execution_count": 5,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## Prepare data"
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2018-11-21T10:25:45.849084Z",
"end_time": "2018-11-21T10:25:45.869899Z"
},
"trusted": true
},
"cell_type": "code",
"source": "SEL_DATASET = 'Earthquakes'\nTGT_DIR = Path('my_data/UCR_univariate/' + SEL_DATASET)",
"execution_count": 6,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "prepare_dataset() downloads the zip archive for the selected dataset, extracts it into the target directory, loads the train and test files, and normalizes both sets using the mean and standard deviation of the training set."
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2018-11-21T10:25:47.304723Z",
"end_time": "2018-11-21T10:25:47.470052Z"
},
"trusted": true
},
"cell_type": "code",
"source": "X_train, y_train, X_test, y_test = prepare_dataset(SEL_DATASET, TGT_DIR)",
"execution_count": 7,
"outputs": []
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2018-11-21T10:25:48.680753Z",
"end_time": "2018-11-21T10:25:48.703020Z"
},
"trusted": true
},
"cell_type": "code",
"source": "print(\"Number of train samples: \", X_train.shape[0], \"Number of test samples: \", X_test.shape[0])\nprint(\"Sequence length: \", X_train.shape[-1])",
"execution_count": 8,
"outputs": [
{
"output_type": "stream",
"text": "Number of train samples: 322 Number of test samples: 139\nSequence length: 512\n",
"name": "stdout"
}
]
},
{
"metadata": {
"ExecuteTime": {
"start_time": "2018-11-21T10:39:42.879214Z",
"end_time": "2018-11-21T10:39:43.139137Z"
},
"trusted": true
},
"cell_type": "code",
"source": "f, (ax1, ax2) = plt.subplots(1, 2, sharey=True, figsize=(15,5))\nax1.plot(X_train[1])\nax1.set_title('class ' + str(y_train[1]))\nax2.plot(X_train[0])\nax2.set_title('class ' + str(y_train[0]))\nplt.show()",
"execution_count": 38,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "<Figure size 1080x360 with 2 Axes>",
"image/png": "\n"
},
"metadata": {
"needs_background": "light"
}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "You are now ready to start working on this dataset!\n\nAccording to the UCR website, the current state-of-the-art accuracy on this dataset is 0.835526316, achieved with a non-deep-learning model ('HIVE-COTE'). Will you be able to improve on that?"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "",
"execution_count": null,
"outputs": []
}
],
"metadata": {
"kernelspec": {
"name": "fastai-v1",
"display_name": "fastai-v1",
"language": "python"
},
"toc": {
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"base_numbering": 1,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
},
"language_info": {
"name": "python",
"version": "3.7.0",
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"file_extension": ".py"
},
"gist": {
"id": "",
"data": {
"description": "course-v3/my_nbs/dl1/Untitled.ipynb",
"public": true
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}