@ia35
Created January 13, 2020 13:29
AADL_Dataset_abstraction.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "AADL_Dataset_abstraction.ipynb",
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyPxfaaKx/ODxFV1NMERUkMv",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/ia35/6155832b71ee6b3ed8d60eddac9cf671/aadl_dataset_abstraction.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "qwBsL6H0n8MG",
"colab_type": "text"
},
"source": [
"[![](http://bec552ebfe.url-de-test.ws/ml/buttonBackProp.png)](https://www.backprop.fr)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ORhnqfYwn_wg",
"colab_type": "text"
},
"source": [
"[![](https://raw.githubusercontent.com/BackProp-fr/meetup/master/images/LogoBackPropTranspSmall.png)](https://www.backprop.fr)\n",
"The BackProp logo appears wherever a significant change has been made to the code or a comment deserves attention. "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "AkD0Yvf1oDDL",
"colab_type": "text"
},
"source": [
"## <font color=\"teal\">Objective</font>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VcZQlivuoK0-",
"colab_type": "text"
},
"source": [
"The goal is to test the code from U. Michelucci's book **Advanced Applied Deep Learning**, § Dataset Abstraction.\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "TnkVTu9mnwn8",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 54
},
"outputId": "611e005f-e83b-4f8b-8fa7-667a4ae4be13"
},
"source": [
"%tensorflow_version 2.x \n",
"import tensorflow as tf\n",
"print(tf.__version__)"
],
"execution_count": 2,
"outputs": [
{
"output_type": "stream",
"text": [
"2.1.0-rc1\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Jn3yqZE6oYA9",
"colab_type": "text"
},
"source": [
"## <font color=\"teal\">Inspiration</font>\n",
"\n",
"- § MNIST Classification with Eager Execution, from the book **Advanced Applied Deep Learning** by Umberto Michelucci\n",
"\n",
"- Source [Code](https://github.com/Apress/advanced-applied-deep-learning) for 'Advanced Applied Deep Learning' by Umberto Michelucci (^1)\n",
"\n",
"- Source [Code](https://github.com/Apress/applied-deep-learning) for 'Applied Deep Learning' by Umberto Michelucci (^2)\n",
"- About [Keras](https://colab.research.google.com/drive/1TU-zOLalO2t8I4h-1rRcLIdSQGfIcO9P) (^3)\n",
"\n",
"- tf.data: [Build TensorFlow](https://www.tensorflow.org/guide/data) input pipelines (^4)\n",
"\n",
"- [What is](https://medium.com/@Joocheol_Kim/what-is-tf-data-dataset-4adc3b7f952a) tf.data.Dataset? Part 1. (^5)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EZ8bU_988yI-",
"colab_type": "text"
},
"source": [
"« Basically, a Dataset is simply a sequence of elements, in which each element contains one or more tensors. Typically, each element will be one training example or a batch of them. The basic idea is that first you create a Dataset with some data, and then you chain method calls on it. For example, you apply the Dataset.map() to apply a function to each element. Note that a dataset is made up of elements, each with the same structure. »\n",
"\n",
"Excerpt from: Umberto Michelucci, « Advanced Applied Deep Learning ».""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "FJpP2HtLESpg",
"colab_type": "text"
},
"source": [
"## <font color=\"teal\">Dataset</font>\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "GKXLxVFa87T5",
"colab_type": "text"
},
"source": [
"There are two distinct ways to create a dataset: (^4)\n",
"\n",
"- A data source constructs a Dataset from data stored in memory or in one or more files.\n",
"- A data transformation constructs a dataset from one or more tf.data.Dataset objects."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "pM6aJFnfAzb4",
"colab_type": "text"
},
"source": [
"[![](https://raw.githubusercontent.com/BackProp-fr/meetup/master/images/LogoBackPropTranspSmall.png)](https://www.backprop.fr)\n",
"\n",
"The book's code does not run as-is under TensorFlow 2.0."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "pkW7ulAbECYl",
"colab_type": "text"
},
"source": [
"### <font color=\"orange\">tf.random.uniform</font>\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "chH7Fmk-BfTq",
"colab_type": "text"
},
"source": [
"Use **tf.random.uniform** instead of tf.random_uniform, the TF 1.x name that the book uses."
]
},
{
"cell_type": "code",
"metadata": {
"id": "lQCgsT0qCFaY",
"colab_type": "code",
"colab": {}
},
"source": [
"inp = tf.data.Dataset.from_tensor_slices(tf.random.uniform([4, 3]))"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "YsUnCUOE-bm7",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 54
},
"outputId": "602fd3a4-cc65-4322-efee-28eb6f2fa67f"
},
"source": [
"inp"
],
"execution_count": 22,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<TensorSliceDataset shapes: (3,), types: tf.float32>"
]
},
"metadata": {
"tags": []
},
"execution_count": 22
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4UxmgMhZEMVv",
"colab_type": "text"
},
"source": [
"### <font color=\"orange\">from_tensor_slices</font>\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "ztSXGfbZn9Xo",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 54
},
"outputId": "37451078-d75b-423f-a4ff-b06037eeba66"
},
"source": [
"dataset = tf.data.Dataset.from_tensor_slices([8, 3, 0, 8, 2, 1])\n",
"dataset"
],
"execution_count": 23,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<TensorSliceDataset shapes: (), types: tf.int32>"
]
},
"metadata": {
"tags": []
},
"execution_count": 23
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "ioSOg9Jj9Uge",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 137
},
"outputId": "6dbd4452-9df1-4636-c633-fac4a6e8fe0a"
},
"source": [
"for elem in dataset:\n",
"    print(elem.numpy())"
],
"execution_count": 24,
"outputs": [
{
"output_type": "stream",
"text": [
"8\n",
"3\n",
"0\n",
"8\n",
"2\n",
"1\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "ltL9Gpas9e8q",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 54
},
"outputId": "3acccce9-e8e1-44b5-8af1-3fc3b89bd704"
},
"source": [
"it = iter(dataset)\n",
"print(next(it).numpy())"
],
"execution_count": 25,
"outputs": [
{
"output_type": "stream",
"text": [
"8\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "7MlmQMUaAA-D",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 54
},
"outputId": "da478c91-72c4-4b90-b6b8-920e12e5dcc5"
},
"source": [
"it = iter(inp)\n",
"print(next(it).numpy())"
],
"execution_count": 26,
"outputs": [
{
"output_type": "stream",
"text": [
"[0.46002626 0.80494416 0.03887343]\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "lGIWOjWzAd8y",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 87
},
"outputId": "18958053-f970-452f-8693-5892875dc9c3"
},
"source": [
"for elem in it:\n",
"    print(elem.numpy())"
],
"execution_count": 27,
"outputs": [
{
"output_type": "stream",
"text": [
"[0.76621425 0.9940531 0.71153843]\n",
"[0.7168385 0.42221284 0.1492492 ]\n",
"[0.22085774 0.15073335 0.6647513 ]\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "EWD2jFUSC8UY",
"colab_type": "code",
"colab": {}
},
"source": [
"raui=tf.random.uniform([4, 3])"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "iTJKP7hBEMAD",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 54
},
"outputId": "a0ad8114-46da-4c39-d78e-8162a0898545"
},
"source": [
"tf.rank(raui)"
],
"execution_count": 29,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<tf.Tensor: shape=(), dtype=int32, numpy=2>"
]
},
"metadata": {
"tags": []
},
"execution_count": 29
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "s5EfdFL390gR",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 120
},
"outputId": "910119e9-2802-4f4a-8b9b-3ca2949e65d1"
},
"source": [
"raui"
],
"execution_count": 30,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<tf.Tensor: shape=(4, 3), dtype=float32, numpy=\n",
"array([[0.02307069, 0.95744634, 0.93800557],\n",
" [0.69792676, 0.8236463 , 0.03819358],\n",
" [0.83961713, 0.71659195, 0.4626255 ],\n",
" [0.33622348, 0.26155758, 0.162346 ]], dtype=float32)>"
]
},
"metadata": {
"tags": []
},
"execution_count": 30
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "YMY1Airk9oRf",
"colab_type": "code",
"colab": {}
},
"source": [
"dataset = tf.data.Dataset.from_tensor_slices(raui)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "tapDjMNw-JHI",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 104
},
"outputId": "96f340d2-9410-4f6b-bdb9-6af5f70022c3"
},
"source": [
"for elem in dataset:\n",
"    print(elem.numpy())"
],
"execution_count": 32,
"outputs": [
{
"output_type": "stream",
"text": [
"[0.02307069 0.95744634 0.93800557]\n",
"[0.69792676 0.8236463 0.03819358]\n",
"[0.83961713 0.71659195 0.4626255 ]\n",
"[0.33622348 0.26155758 0.162346 ]\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "G04QXTks_OY9",
"colab_type": "text"
},
"source": [
"To create an input [pipeline](https://www.tensorflow.org/guide/data), you must start with a data source. For example, to construct a Dataset from data in memory, you can use tf.data.Dataset.from_tensors() or tf.data.Dataset.from_tensor_slices(). Alternatively, if your input data is stored in a file in the recommended TFRecord format, you can use tf.data.TFRecordDataset().\n",
"\n",
"Once you have a Dataset object, you can transform it into a new Dataset by chaining method calls on the tf.data.Dataset object. For example, you can apply per-element transformations such as Dataset.map(), and multi-element transformations such as Dataset.batch(). "
]
},
{
"cell_type": "code",
"metadata": {
"id": "kuhnXzUq_mKv",
"colab_type": "code",
"colab": {}
},
"source": [
"import numpy as np"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "9mAcStqM5xRf",
"colab_type": "text"
},
"source": [
"numpy.linspace (start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)\n",
"\n",
"[Return](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html) evenly spaced numbers over a specified interval.\n",
"\n",
"Returns num evenly spaced samples, calculated over the interval [start, stop].\n",
"\n",
"start : array_like\n",
"The starting value of the sequence.\n",
"\n",
"stop : array_like\n",
"The end value of the sequence, unless endpoint is set to False. In that case, the sequence consists of all but the last of num + 1 evenly spaced samples, so that stop is excluded. Note that the step size changes when endpoint is False.\n",
"\n",
"num : int, optional\n",
"Number of samples to generate. Default is 50. Must be non-negative."
]
},
{
"cell_type": "code",
"metadata": {
"id": "Ob8ip66VAQAI",
"colab_type": "code",
"colab": {}
},
"source": [
"matr = np.linspace((1,2),(10,20),10)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "1PR1DG7VARql",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 203
},
"outputId": "a5a48364-857c-46fe-ea62-25a92ff5fd87"
},
"source": [
"matr"
],
"execution_count": 5,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([[ 1., 2.],\n",
" [ 2., 4.],\n",
" [ 3., 6.],\n",
" [ 4., 8.],\n",
" [ 5., 10.],\n",
" [ 6., 12.],\n",
" [ 7., 14.],\n",
" [ 8., 16.],\n",
" [ 9., 18.],\n",
" [10., 20.]])"
]
},
"metadata": {
"tags": []
},
"execution_count": 5
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "83lgbNAtAZPf",
"colab_type": "code",
"colab": {}
},
"source": [
"dataset2 = tf.data.Dataset.from_tensor_slices(matr)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "bUcRK1ukB_ad",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 203
},
"outputId": "74ee3538-d908-423d-8fbb-50a125957ee6"
},
"source": [
"for elem in dataset2:\n",
"    print(elem.numpy())"
],
"execution_count": 8,
"outputs": [
{
"output_type": "stream",
"text": [
"[1. 2.]\n",
"[2. 4.]\n",
"[3. 6.]\n",
"[4. 8.]\n",
"[ 5. 10.]\n",
"[ 6. 12.]\n",
"[ 7. 14.]\n",
"[ 8. 16.]\n",
"[ 9. 18.]\n",
"[10. 20.]\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "f67GPQGMD25O",
"colab_type": "text"
},
"source": [
"### <font color=\"orange\">Map</font>\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cszFRQ8P7NFV",
"colab_type": "text"
},
"source": [
"A function can be applied to every element of the dataset."
]
},
{
"cell_type": "code",
"metadata": {
"id": "S5dM8uW--Qia",
"colab_type": "code",
"colab": {}
},
"source": [
"dataset2 = dataset2.map(lambda x: x*2)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "kLnVouX3Dysx",
"colab_type": "text"
},
"source": [
"### <font color=\"orange\">Iterator</font>\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "cXt5LQ_2_gQZ",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 203
},
"outputId": "41b89568-19d4-4bf9-80a3-b0142d2646af"
},
"source": [
"for elem in dataset2:\n",
"    print(elem.numpy())"
],
"execution_count": 10,
"outputs": [
{
"output_type": "stream",
"text": [
"[2. 4.]\n",
"[4. 8.]\n",
"[ 6. 12.]\n",
"[ 8. 16.]\n",
"[10. 20.]\n",
"[12. 24.]\n",
"[14. 28.]\n",
"[16. 32.]\n",
"[18. 36.]\n",
"[20. 40.]\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "mWXA8vaR-lDE",
"colab_type": "text"
},
"source": [
"An iterator is an object that contains a countable number of values.\n",
"\n",
"An iterator is an object that can be iterated upon, meaning that you can traverse through all the values.\n",
"\n",
"Technically, in Python, an iterator is an object that implements the iterator protocol, which consists of the methods `__iter__()` and `__next__()`."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "86bKi4tAA-6S",
"colab_type": "text"
},
"source": [
"Iterators are not created the way the book describes, because eager mode is used throughout this notebook. The book says:\n",
"\n",
"In order to check what happened, we could print the first element in each dataset. This can be easily done (more on that later) with:\n",
"\n",
"dataset.make_one_shot_iterator().get_next() \n",
"\n",
"That is a TF 1.x API; here the standard Python `iter()` and `next()` are used instead.\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "Yqzc2K7F7mAH",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 71
},
"outputId": "7423063b-eb8e-4690-bfbc-c7425060fefd"
},
"source": [
"it = iter(dataset2)\n",
"print(next(it).numpy())\n",
"print(next(it).numpy())"
],
"execution_count": 17,
"outputs": [
{
"output_type": "stream",
"text": [
"[2. 4.]\n",
"[4. 8.]\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "Xdao5wJq9w2e",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 54
},
"outputId": "119a2dec-c47e-4653-b945-7cec5687f541"
},
"source": [
"print(next(it))"
],
"execution_count": 19,
"outputs": [
{
"output_type": "stream",
"text": [
"tf.Tensor([ 6. 12.], shape=(2,), dtype=float64)\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "lVQqlG2KDqiE",
"colab_type": "text"
},
"source": [
"### <font color=\"orange\">Batching</font>\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "8OD9UJNe-skK",
"colab_type": "code",
"colab": {}
},
"source": [
"batched_dataset = dataset2.batch(2)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "r4Ue9TBsCNqX",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 87
},
"outputId": "be1ad63b-f605-45ba-adcd-e7669c584462"
},
"source": [
"it2 = iter(batched_dataset)\n",
"print(next(it2))"
],
"execution_count": 23,
"outputs": [
{
"output_type": "stream",
"text": [
"tf.Tensor(\n",
"[[2. 4.]\n",
" [4. 8.]], shape=(2, 2), dtype=float64)\n"
],
"name": "stdout"
}
]
}
]
}
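
The notebook above relies on Python's iterator protocol (`iter()` / `next()`) to walk through datasets, and chains `map` and `batch` on them. The following is a minimal pure-Python sketch of that idea, not TensorFlow's implementation: `SliceDataset` is a hypothetical class invented for illustration, it works eagerly on plain tuples instead of tensors, and it only mimics the slice, map, and batch steps used in the notebook.

```python
from itertools import islice


class SliceDataset:
    """Hypothetical stand-in for a dataset: yields one row at a time."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __iter__(self):
        # Each call returns a fresh iterator, so the dataset can be re-read.
        return iter(self._rows)

    def map(self, fn):
        # Apply fn to every element and return a new dataset.
        return SliceDataset(fn(row) for row in self._rows)

    def batch(self, size):
        # Group consecutive elements into lists of at most `size` items.
        it = iter(self._rows)
        batches = []
        while True:
            chunk = list(islice(it, size))
            if not chunk:
                break
            batches.append(chunk)
        return SliceDataset(batches)


# Same values as np.linspace((1, 2), (10, 20), 10), written out by hand.
rows = [(float(i), float(2 * i)) for i in range(1, 11)]
doubled = SliceDataset(rows).map(lambda r: tuple(2 * v for v in r))

it = iter(doubled)
first = next(it)        # first element of the mapped dataset
second = next(it)       # second element
remaining = list(it)    # consumes the rest; the iterator is now exhausted

first_batch = next(iter(doubled.batch(2)))
print(first, second, len(remaining))
print(first_batch)
```

As in the notebook, calling `next()` twice and then looping over the same iterator only yields the elements that have not been consumed yet, while calling `iter()` on the dataset again starts from the beginning.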