@devbkhadka
Created November 10, 2019 02:16
Numpy Tutorial - Vectorization, Broadcasting, Fancy Indexing
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "Ulvycx2N5TQz"
},
"source": [
"This blog is meant to be a **playground** for beginners to learn NumPy by trying real code. I have kept the text as short as possible and included as many code examples as possible.\n",
"\n",
"It is also meant to be a **quick reference guide** to NumPy features you have already learned. The output of each cell includes text describing the results."
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "9aLnIOLK-DA1"
},
"source": [
"### Prerequisite\n",
" - Basic programming knowledge \n",
" - Some familiarity with Python (loops, lists, etc.)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "pmbT72th-ipb"
},
"source": [
"### What Will We Cover\n",
"Basics\n",
" - Creating Array\n",
" - Understanding Structure of Numpy Array (dimension, shape and strides)\n",
" - Data Types and Casting\n",
" - Indexing Methods\n",
" - Array Operations\n",
"\n",
"Advanced\n",
" - Broadcasting\n",
" - Vectorization\n",
" - Ufunc and Numba"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "-vXnM1cYpdmF"
},
"source": [
"### What is Numpy?\n",
"NumPy is the fundamental numerical computing library for Python. It supports N-dimensional arrays and provides simple and efficient array operations.\n",
"\n",
"NumPy is a library of algorithms written in C. It stores data in a contiguous block of memory, independent of other built-in Python objects, and can operate on that memory without per-element type checking or other Python overhead. NumPy arrays also typically use much less memory than built-in Python sequences."
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "1AMxCdwQrcB6"
},
"source": [
"### Why Use Numpy?\n",
"Python was not originally designed for numerical computation. Because Python is an interpreted language, it is inherently slower than compiled languages like C. NumPy fills that gap. Some advantages of using NumPy:\n",
"\n",
" - It provides multidimensional array operations that are efficient in both memory and computation\n",
" - It provides fast mathematical operations on entire arrays without explicit loops\n",
" - It also provides scientific operations for linear algebra, statistics, Fourier transforms and more\n",
" - It provides tools for interoperability with C and C++"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "wDyfEeDnvYGc"
},
"source": [
"### How to Play With Numpy?\n",
"I would recommend two ways to play around with NumPy:\n",
" - [Kaggle](https://www.kaggle.com/notebooks) or [Google Colab](https://colab.research.google.com/notebooks/intro.ipynb#): you can jump right into coding without any setup\n",
" - [Jupyter Notebook](https://jupyter.org/install): you need to install Jupyter Notebook and then [install numpy](https://pypi.org/project/numpy/) using pip (NumPy is already included if you use Anaconda)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "dXO3hwSxe4YI"
},
"source": [
"### Some Tips On Jupyter Notebook\n",
" - To see auto-completion suggestions, **press `tab`**\n",
" - To see the parameters of a function, **press `shift + tab`** after typing the function name and '('. e.g. type `np.asarray(` then press `shift + tab`\n",
" - To view a docstring, append '?' as in **`np.asarray?`**, then press **`shift + enter`**"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "dcTw2TldysVQ"
},
"source": [
"**Enough reading!!**\n",
"***Let's get our hands dirty***"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "0Llyu8aq9xj_"
},
"source": [
"# Creating Array"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "DGGbioxX9lCf"
},
"source": [
"## From python list"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 151
},
"colab_type": "code",
"id": "F0CJDWAdvhks",
"outputId": "22acb356-d89f-42ac-eb44-f611a042f792"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1 2 3 4]\n",
"\n",
" array of 16 bit integers\n",
"[1 2 3 4]\n",
"\n",
" 2 dimensional array\n",
"[[1 2 3]\n",
" [4 5 6]]\n"
]
}
],
"source": [
"import numpy as np\n",
"print(np.array([1,2,3,4]))\n",
"\n",
"print('\\n', 'array of 16 bit integers')\n",
"print(np.array([1,2,3,4], dtype=np.int16))\n",
"\n",
"print('\\n', '2 dimensional array')\n",
"print(np.array([[1,2,3], [4,5,6]]))"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "rcEwibuyZUr2"
},
"source": [
"## Numpy's Methods"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 470
},
"colab_type": "code",
"id": "RMz-pgrUX504",
"outputId": "9df6ca73-2f95-4d90-be60-5dc46cf62ec6"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Numpy array from range\n",
"[3 4 5 6 7]\n",
"\n",
" 2D 3X3 array of zeros\n",
"[[0. 0. 0.]\n",
" [0. 0. 0.]\n",
" [0. 0. 0.]]\n",
"\n",
" 2D 2X3 array of ones\n",
"[[1. 1. 1.]\n",
" [1. 1. 1.]]\n",
"\n",
" Triangular array with ones at and below diagonal\n",
"[[1. 0. 0. 0.]\n",
" [1. 1. 0. 0.]\n",
" [1. 1. 1. 0.]]\n",
"\n",
" Identity matrix with ones on the diagonal\n",
"[[1. 0. 0.]\n",
" [0. 1. 0.]\n",
" [0. 0. 1.]]\n",
"\n",
" 20 equally spaced values between 1 and 5\n",
"[1. 1.21052632 1.42105263 1.63157895 1.84210526 2.05263158\n",
" 2.26315789 2.47368421 2.68421053 2.89473684 3.10526316 3.31578947\n",
" 3.52631579 3.73684211 3.94736842 4.15789474 4.36842105 4.57894737\n",
" 4.78947368 5. ]\n"
]
}
],
"source": [
"print('Numpy array from range')\n",
"print(np.arange(3,8))\n",
"\n",
"print('\\n', '2D 3X3 array of zeros')\n",
"print(np.zeros((3,3)))\n",
"\n",
"print('\\n', '2D 2X3 array of ones')\n",
"print(np.ones((2,3)))\n",
"\n",
"print('\\n', 'Triangular array with ones at and below diagonal')\n",
"print(np.tri(3, 4))\n",
"\n",
"print('\\n', 'Identity matrix with ones on the diagonal')\n",
"print(np.eye(3))\n",
"\n",
"print('\\n', '20 equally spaced values between 1 and 5')\n",
"print(np.linspace(1, 5, 20))"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "5vLI16SYczMj"
},
"source": [
"## Using ```np.random```"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 336
},
"colab_type": "code",
"id": "ZhqcBHldZPa9",
"outputId": "80cc8d6a-6670-40ce-ad62-d61ab921384f"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"3X2 array of uniformly distributed number between 0 and 1\n",
"[[0.32564739 0.97679242]\n",
" [0.26588925 0.89020385]\n",
" [0.18024366 0.90879681]]\n",
"\n",
" Normally distributed random numbers with mean=0 and std=1\n",
"[[-1.18661884 -0.43561077 1.21316858]\n",
" [-1.47847545 0.69296328 1.01348937]\n",
" [-0.03562709 -1.90675623 0.44639003]]\n",
"\n",
" Randomly choose integers from a range (>=5, <11)\n",
"[[8 7]\n",
" [8 9]]\n",
"\n",
" Randomly selects a permutation from array\n",
"[6 4 3 2 5]\n",
"\n",
" This is equivalent to rolling a die 10 times and counting the occurrences of each side\n",
"[3 2 2 2 0 1]\n"
]
}
],
"source": [
"print('3X2 array of uniformly distributed number between 0 and 1')\n",
"print(np.random.rand(3,2))\n",
"\n",
"print('\\n', 'Normally distributed random numbers with mean=0 and std=1')\n",
"print(np.random.randn(3,3))\n",
"\n",
"print('\\n', 'Randomly choose integers from a range (>=5, <11)')\n",
"print(np.random.randint(5, 11, size=(2,2)))\n",
"\n",
"print('\\n', \"Randomly selects a permutation from array\")\n",
"print(np.random.permutation([2,3,4,5,6]))\n",
"\n",
"print('\\n', \"This is equivalent to rolling a die 10 times and counting \\\n",
"the occurrences of each side\")\n",
"print(np.random.multinomial(10, [1/6]*6))"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "y8Aa7zv3k8pC"
},
"source": [
"# Understanding Structure of Numpy Array (dimension, shape and strides)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 202
},
"colab_type": "code",
"id": "omxFSedZdOpQ",
"outputId": "096c53a0-89f9-484c-abdf-6ae9e2654758"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of array dimensions\n",
"2\n",
"\n",
"Shape of array is tuple giving size of each dimension\n",
"(3, 3)\n",
"\n",
"strides gives byte steps to be moved in memory to get to next index in each dimension\n",
"(24, 8)\n",
"\n",
"Byte size of each item\n",
"8\n"
]
}
],
"source": [
"import numpy as np\n",
"arr = np.array([[1,2,3], [2,3,1], [3,3,3]])\n",
"\n",
"print('Number of array dimensions')\n",
"print(arr.ndim)\n",
"\n",
"print('\\nShape of array is tuple giving size of each dimension')\n",
"print(arr.shape)\n",
"\n",
"print('\\nstrides gives byte steps to be moved in memory to get to next \\\n",
"index in each dimension')\n",
"print(arr.strides)\n",
"\n",
"print('\\nByte size of each item')\n",
"print(arr.itemsize)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "eW9-sPQkopHy"
},
"source": [
"## More on Strides"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 218
},
"colab_type": "code",
"id": "YRf-1n7giJRM",
"outputId": "a9766c54-8c25-4a23-9ed4-176672e83f40"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Slice indexing is done by changing strides, as in examples below\n",
"Strides of original array\n",
"(24, 8)\n",
"\n",
" Slice with step of 2 is done by multiplying stride(byte step size) by 2 in that dimension\n",
"(48, 8)\n",
"\n",
" Reverse index will negate the stride\n",
"(-24, 8)\n",
"\n",
" Transpose will swap the stride of the dimensions\n",
"(8, 24)\n"
]
}
],
"source": [
"print('Slice indexing is done by changing strides, as in examples below')\n",
"\n",
"print('Strides of original array')\n",
"print(arr.strides)\n",
"\n",
"print('\\n', 'Slice with step of 2 is done by multiplying stride(byte step size) by 2 in that dimension')\n",
"print(arr[::2].strides)\n",
"\n",
"print('\\n', 'Reverse index will negate the stride')\n",
"print(arr[::-1].strides)\n",
"\n",
"print('\\n', 'Transpose will swap the stride of the dimensions')\n",
"print(arr.T.strides)"
]
},
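{
"cell_type": "markdown",
"metadata": {},
"source": [
"The strides can also be read as a recipe for locating an element: the byte offset of `a[i, j]` from the start of the data buffer is `i * strides[0] + j * strides[1]`. The cell below is a small sketch of that arithmetic, assuming a C-contiguous array of 8-byte integers. The names `a`, `i`, `j`, `offset` and `flat_index` are ours for illustration and are not part of the original notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Same values as the array used above, with an explicit 8-byte dtype\n",
"a = np.array([[1, 2, 3], [2, 3, 1], [3, 3, 3]], dtype=np.int64)\n",
"i, j = 2, 1\n",
"\n",
"# Byte offset of a[i, j] from the start of the data buffer\n",
"offset = i * a.strides[0] + j * a.strides[1]\n",
"\n",
"# Dividing by the item size gives the position in the flattened buffer\n",
"flat_index = offset // a.itemsize\n",
"print('byte offset:', offset)\n",
"print('same element:', a.ravel()[flat_index] == a[i, j])"
]
},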
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "u8_v0SEeoM-A"
},
"source": [
"### Some Stride Tricks: Outer product by changing strides\n",
"You will rarely need these tricks, but they help show how indexing in NumPy works.\n",
"\n",
"The `as_strided` function returns a view of an array with the given strides and shape. Multiplying every value of one array with every value of another, as done below, is the outer product."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 302
},
"colab_type": "code",
"id": "GI7hX2RoiOzA",
"outputId": "b53ebd0f-a690-4692-934b-eff510acf73f"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"arr1: [0 1 2 3 4]\n",
"arr2: [0 1 2]\n",
"\n",
" Adding a dimension with stride 0 allows us to repeat array in that dimension without making copy\n",
"\n",
" Making stride 0 for rows repeats rows.\n",
"As step size is zero to move to next row it will give same row repeatedly\n",
"[[0 1 2 3 4]\n",
" [0 1 2 3 4]\n",
" [0 1 2 3 4]]\n",
"\n",
" Making stride 0 for columns repeats columns.\n",
"[[0 0 0 0 0]\n",
" [1 1 1 1 1]\n",
" [2 2 2 2 2]] \n",
"\n",
"Outer product: product of every value of arr1 with every value of arr2\n",
"[[0 0 0 0 0]\n",
" [0 1 2 3 4]\n",
" [0 2 4 6 8]]\n"
]
}
],
"source": [
"from numpy.lib.stride_tricks import as_strided\n",
"\n",
"arr1 = np.arange(5)\n",
"print('arr1: ', arr1)\n",
"\n",
"arr2 = np.arange(3)\n",
"print('arr2: ', arr2)\n",
"\n",
"print('\\n', 'Adding a dimension with stride 0 allows us to repeat array in that dimension without making copy')\n",
"\n",
"print('\\n', 'Making stride 0 for rows repeats rows.')\n",
"print('As step size is zero to move to next row it will give same row repeatedly')\n",
"r_arr1 = as_strided(arr1, strides=(0,arr1.itemsize), shape=(len(arr2),len(arr1)))\n",
"print(r_arr1)\n",
"\n",
"print('\\n', 'Making stride 0 for columns repeats columns.')\n",
"# Build the view from arr2 (not arr1) so each row repeats one value of arr2\n",
"r_arr2 = as_strided(arr2, strides=(arr2.itemsize, 0), shape=(len(arr2),len(arr1)))\n",
"print(r_arr2, '\\n')\n",
"\n",
"print('Outer product: product of every value of arr1 with every value of arr2')\n",
"print(r_arr1 * r_arr2)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Using Broadcast"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 118
},
"colab_type": "code",
"id": "wDXLL_KbMwYF",
"outputId": "d984b02d-167a-4012-8668-4fc30dcdf3c7",
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The above example is equivalent to using broadcasting to compute the outer product\n",
"[[0 0 0 0 0]\n",
" [0 1 2 3 4]\n",
" [0 2 4 6 8]]\n",
"arr1[np.newaxis, :].strides => (0, 8)\n",
"arr2[:, np.newaxis].strides => (8, 0)\n"
]
}
],
"source": [
"print('The above example is equivalent to using broadcasting to compute the outer product')\n",
"print(arr1[np.newaxis, :] * arr2[:, np.newaxis])\n",
"\n",
"print('arr1[np.newaxis, :].strides => ', arr1[np.newaxis, :].strides)\n",
"print('arr2[:, np.newaxis].strides => ', arr2[:, np.newaxis].strides)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "GBMsW8DJr8JM"
},
"source": [
"# Data Types and Casting"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "a1YMYjIESfMI"
},
"source": [
"**Notes**\n",
"- A NumPy array can store items of only one data type\n",
"- The ```np_array.dtype``` attribute gives the dtype of the array\n",
"- The following table shows some common data types with their string names\n",
"\n",
"```\n",
"\n",
"Numpy Attribute                                 | String Name                 | Description\n",
"------------------------------------------------|-----------------------------|------------------------------------------\n",
"np.int8, np.int16, np.int32, np.int64           | '<i1', '<i2', '<i4', '<i8'  | signed int\n",
"np.uint8, np.uint16, np.uint32, np.uint64       | '<u1', '<u2', '<u4', '<u8'  | unsigned int\n",
"np.float16, np.float32, np.float64, np.float128 | '<f2', '<f4', '<f8', '<f16' | floats (np.float128 is platform dependent)\n",
"np.bytes_ (np.string_ in older versions)        | 'S1', 'S10', 'S255'         | string of bytes (ascii)\n",
"np.str_                                         | 'U1', 'U255'                | string of unicode characters\n",
"np.datetime64                                   | 'M8'                        | date time\n",
"np.object_                                      | 'O'                         | python object\n",
"np.bool_                                        | '?'                         | boolean\n",
"\n",
"```\n",
"\n",
"- **Breakdown of the string name '<u8':** here '<' means little-endian byte order, 'u' means unsigned int and '8' means 8 bytes. Other options for byte order are '>' (big-endian) and '=' (system default)\n",
"- Most of the array creation functions discussed above take a `dtype` parameter to set the data type of the array, e.g. ```np.random.randint(5, 11, size=(2,2), dtype=np.int8)```"
]
},
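{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of the string names in the table, `np.dtype` can parse a name and report its parts. This is a minimal sketch, not part of the original notebook; the variable name `dt` is ours for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"dt = np.dtype('>u8')   # big-endian, unsigned int, 8 bytes\n",
"print(dt.byteorder)    # '>' on a little-endian machine ('=' where big-endian is native)\n",
"print(dt.kind)         # 'u' for unsigned int\n",
"print(dt.itemsize)     # 8 bytes per item\n",
"\n",
"# A dtype can be given as a NumPy attribute or as a string name\n",
"print(np.dtype(np.int16) == np.dtype('i2'))\n",
"print(np.zeros(3, dtype='f4').dtype)"
]
},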
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "qBjn3kuNo9rg"
},
"source": [
"## Casting "
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 414
},
"colab_type": "code",
"id": "-mQVrS60SBV6",
"outputId": "cb8e3d34-0b1b-45ef-82db-48373979ef92"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"arr: [0. 1. 2. 3. 4.]\n",
"\n",
" Cast to integer using astype function which will make copy of the array\n"
]
},
{
"data": {
"text/plain": [
"array([0, 1, 2, 3, 4], dtype=int8)"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" By default casting is unsafe which will ignore the overflow. e.g. `2e10` is converted to 0\n",
"[0 1 2 0 4]\n",
"\n",
" Casting from string to float\n",
"['1' '2' '3' '4' '5.0']\n",
"[1. 2. 3. 4. 5.]\n",
"\n",
" casting=\"safe\" allows only casts that preserve values; float32 to int8 is refused with a TypeError regardless of the values\n"
]
}
],
"source": [
"import numpy as np\n",
"arr = np.arange(5, dtype='<f4')\n",
"print('arr: ', arr)\n",
"\n",
"print('\\n', 'Cast to integer using astype function which will make copy of the array')\n",
"display(arr.astype(np.int8))\n",
"\n",
"print('\\n', 'By default casting is unsafe which will ignore the overflow. e.g. `2e10` is converted to 0')\n",
"arr[3] = 2e10\n",
"print(arr.astype('<i1'))\n",
"\n",
"print('\\n', 'Casting from string to float')\n",
"sarr = np.array(\"1 2 3 4 5.0\".split())\n",
"print(sarr)\n",
"print(sarr.astype('<f4'))\n",
"\n",
"print('\\n', 'casting=\"safe\" allows only casts that preserve values; float32 to int8 is refused with a TypeError regardless of the values')\n",
"# Uncommenting the next line raises TypeError\n",
"# print(arr.astype('<i1', casting='safe'))\n"
]
},
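{
"cell_type": "markdown",
"metadata": {},
"source": [
"The commented-out `astype(..., casting='safe')` line in the cell above would stop the notebook with an error. The cell below is a hedged sketch of the same call wrapped in `try`/`except` so that it still runs. With `casting='safe'` NumPy decides from the two dtypes alone, so a `float32` to `int8` cast is refused whatever the values are. The names `farr` and `err` are ours for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"farr = np.arange(5, dtype='<f4')\n",
"\n",
"try:\n",
"    farr.astype('<i1', casting='safe')\n",
"except TypeError as err:\n",
"    print('Refused:', err)\n",
"\n",
"# Widening casts are safe and succeed\n",
"print(farr.astype('<f8', casting='safe').dtype)\n",
"\n",
"# np.can_cast answers the same question without attempting the cast\n",
"print(np.can_cast('<f4', '<i1', casting='safe'))"
]
},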
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "EByx1_2oP0Op"
},
"source": [
"## Reshaping\n",
"- An ndarray can be reshaped to any shape as long as the total number of elements stays the same"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 269
},
"colab_type": "code",
"id": "jyCKP4hPP3Rg",
"outputId": "92edead9-b04f-4c22-f4ea-4050470883a3"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"arr: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]\n",
"\n",
" reshape 1D arr of length 20 to shape (4,5)\n",
"[[ 0 1 2 3 4]\n",
" [ 5 6 7 8 9]\n",
" [10 11 12 13 14]\n",
" [15 16 17 18 19]]\n",
"\n",
" One item of shape tuple can be -1 in which case the item will be calculated by numpy\n",
"For total size to be 20 missing value must be 5\n",
"[[[ 0 1 2 3 4]\n",
" [ 5 6 7 8 9]]\n",
"\n",
" [[10 11 12 13 14]\n",
" [15 16 17 18 19]]]\n"
]
}
],
"source": [
"arr = np.arange(20)\n",
"print('arr: ', arr)\n",
"\n",
"print('\\n', 'reshape 1D arr of length 20 to shape (4,5)')\n",
"print(arr.reshape(4,5))\n",
"\n",
"print('\\n', 'One item of shape tuple can be -1 in which case the item will be calculated by numpy')\n",
"print('For total size to be 20 missing value must be 5')\n",
"print(arr.reshape(2,2,-1))\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "jspHS4sYuvzR"
},
"source": [
"## Array View With Different dtype\n",
"- The ```arr.view()``` method gives a new view of the same data with a new dtype. Creating a view with a different dtype is not the same as casting. e.g. if we have an ndarray of np.float32 ('<f4'), creating a view with a one-byte dtype ('<i1') will read each 4-byte float as four individual bytes"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 202
},
"colab_type": "code",
"id": "9KmbsSPcrnUk",
"outputId": "8ed976c1-5213-4c9f-9b54-3dc148760725"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"arr: [0 1 2 3 4]\n",
"\n",
" View with dtype \"<i1\" for array of dtype \"<i2\" will break down items into bytes\n",
"[0 0 1 0 2 0 3 0 4 0]\n",
"\n",
" Changing little-endian to big-endian will change value as they use different byte order\n",
"[ 0 256 512 768 1024]\n",
"\n",
" Following will give individual bytes in memory of each item\n",
"[0. 1. 2. 3. 4.]\n",
"[ 0 0 0 60 0 64 0 66 0 68]\n"
]
}
],
"source": [
"arr = np.arange(5, dtype='<i2')\n",
"print('arr: ', arr)\n",
"\n",
"print('\\n', 'View with dtype \"<i1\" for array of dtype \"<i2\" will break down items into bytes')\n",
"print(arr.view('<i1'))\n",
"\n",
"print('\\n', 'Changing little-endian to big-endian will change value as they use different byte order')\n",
"print(arr.view('>i2'))\n",
"\n",
"print('\\n', 'Following will give individual bytes in memory of each item')\n",
"arr = np.arange(5, dtype='<f2')\n",
"print(arr)\n",
"print(arr.view('<i1'))"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "DChjsRw20XUn"
},
"source": [
"# Indexing Methods"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "O2GtGpihJi5y"
},
"source": [
"## Integer and Slice Indexing\n",
"- This method of indexing is similar to the indexing used in Python lists\n",
"- Slicing always creates a view of the array, i.e. it does not copy the data"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 235
},
"colab_type": "code",
"id": "yb4UtJGIJ42x",
"outputId": "6d601934-193a-43f6-bac4-21e08f195765"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"arr: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]\n",
"\n",
" Get item at index 4(5th item) of the array\n",
"4\n",
"\n",
" Assign 0 to index 4 of array\n",
"[ 0 1 2 3 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]\n",
"\n",
" Get items in the index range 4 to 10 not including 10\n",
"[0 5 6 7 8 9]\n",
"\n",
" Set 1 to alternate items starting at index 4 to 10 \n",
"[ 0 1 2 3 1 5 1 7 1 9 10 11 12 13 14 15 16 17 18 19]\n"
]
}
],
"source": [
"import numpy as np\n",
"arr = np.arange(20)\n",
"print(\"arr: \", arr)\n",
"\n",
"print('\\n', 'Get item at index 4(5th item) of the array')\n",
"print(arr[4])\n",
"\n",
"print('\\n', 'Assign 0 to index 4 of array')\n",
"arr[4] = 0\n",
"print(arr)\n",
"\n",
"print('\\n', 'Get items in the index range 4 to 10 not including 10')\n",
"print(arr[4:10])\n",
"\n",
"print('\\n', 'Set 1 to alternate items starting at index 4 to 10 ')\n",
"arr[4:10:2] = 1\n",
"print(arr)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "tmi7zDcoJisx"
},
"source": [
"### Slice Indexing in 2D Array\n",
"- For multidimensional array slice index can be separated using comma"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 202
},
"colab_type": "code",
"id": "mpKVp7b1v4Ve",
"outputId": "cdbaf36e-9f25-4a58-f96a-105deaa28193"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"arr:\n",
" [[ 0 1 2 3 4]\n",
" [ 5 6 7 8 9]\n",
" [10 11 12 13 14]\n",
" [15 16 17 18 19]]\n",
"\n",
" Set 1 in first 3 rows and last 2 columns\n",
"[[ 0 1 2 1 1]\n",
" [ 5 6 7 1 1]\n",
" [10 11 12 1 1]\n",
" [15 16 17 18 19]]\n"
]
}
],
"source": [
"arr = np.arange(20).reshape(4,5)\n",
"\n",
"print('arr:\\n', arr)\n",
"\n",
"print('\\n', 'Set 1 in first 3 rows and last 2 columns')\n",
"arr[:3, -2:] = 1\n",
"print(arr)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "gF3jmEEqYcnh"
},
"source": [
"## Boolean Indexing\n",
"- A boolean array of the same shape as the original array (or of its leading dimensions) can be used as an index; it selects the items where the index value is True\n",
"- Boolean arrays can also be used to filter an array by a condition\n",
"- Boolean indexing **will return a copy** instead of a view of the array"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 235
},
"colab_type": "code",
"id": "1SIAjk96W1td",
"outputId": "64ea2832-9c9b-4cc7-b76c-c7d5a022ce03"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"arr:\n",
" [[0 1 2]\n",
" [3 4 5]]\n",
"\n",
" Following index gives last two items of 1st row and 1st element of 2nd row\n",
"[1 2 3]\n",
"\n",
" Boolean index to filter values greater than 3 from arr\n",
"Filter Index:\n",
" [[False False False]\n",
" [False True True]]\n",
"\n",
" Set 3 to values greater than 3 in arr\n",
"[[0 1 2]\n",
" [3 3 3]]\n"
]
}
],
"source": [
"arr = np.arange(6).reshape(2,3)\n",
"print('arr:\\n', arr)\n",
"\n",
"print('\\n', 'Following index gives last two items of 1st row and 1st element of 2nd row')\n",
"indx = np.array([[False, True, True], [True, False,False]])\n",
"print(arr[indx])\n",
"\n",
"print('\\n', 'Boolean index to filter values greater than 3 from arr')\n",
"filter_indx = arr>3\n",
"print('Filter Index:\\n', filter_indx)\n",
"\n",
"print('\\n', 'Set 3 to values greater than 3 in arr')\n",
"arr[filter_indx] = 3\n",
"print(arr)\n"
]
},
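{
"cell_type": "markdown",
"metadata": {},
"source": [
"To check the copy-versus-view claim, `np.shares_memory` reports whether two arrays overlap in memory. This is a small sketch, not part of the original notebook; the names `base`, `sliced` and `masked` are ours for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"base = np.arange(6).reshape(2, 3)\n",
"\n",
"sliced = base[:, 1:]      # slicing returns a view\n",
"masked = base[base > 3]   # boolean indexing returns a copy\n",
"\n",
"print(np.shares_memory(base, sliced))\n",
"print(np.shares_memory(base, masked))\n",
"\n",
"# Changing the copy leaves the original untouched\n",
"masked[:] = -1\n",
"print(base)"
]
},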
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "RY1KdW6ZdTgI"
},
"source": [
"## Fancy Indexing\n",
"- Fancy indexing means using an array of integer indexes as the index to get several items at once\n",
"- Fancy indexing **will also return a copy** instead of a view of the array"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 336
},
"colab_type": "code",
"id": "k09lpFh5cc0n",
"outputId": "85501d16-d548-4022-aed0-f774c4e9de90"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"arr:\n",
" [0 1 2 3 4 5 6 7 8 9]\n",
"\n",
" Get items at indexes 3,5 and 7 at once\n",
"[3 5 7]\n",
"\n",
" Sorting arr based on another array \"values\"\n",
"values:\n",
" [0.22199317 0.87073231 0.20671916 0.91861091 0.48841119 0.61174386\n",
" 0.76590786 0.51841799 0.2968005 0.18772123]\n",
"\n",
" np.argsort instead of returning sorted values will return array of indexes which will sort the array\n",
"indexes:\n",
" [9 2 0 8 4 7 5 6 1 3]\n",
"Sorted array:\n",
" [9 2 0 8 4 7 5 6 1 3]\n",
"\n",
" You can also use fancy indexing to get same item multiple times\n",
"[0 1 1 2 2 2 3 3 3 3]\n"
]
}
],
"source": [
"import numpy as np\n",
"\n",
"arr = np.arange(10)\n",
"print('arr:\\n', arr)\n",
"\n",
"print('\\n', 'Get items at indexes 3,5 and 7 at once')\n",
"print(arr[[3,5,7]])\n",
"\n",
"print('\\n', 'Sorting arr based on another array \"values\"')\n",
"np.random.seed(5)\n",
"values = np.random.rand(10)\n",
"print('values:\\n', values)\n",
"print('\\n', 'np.argsort instead of returning sorted values will return array of indexes which will sort the array')\n",
"indexes = np.argsort(values) \n",
"print('indexes:\\n', indexes)\n",
"print('Sorted array:\\n', arr[indexes])\n",
"\n",
"print('\\n', 'You can also use fancy indexing to get same item multiple times')\n",
"print(arr[[0,1,1,2,2,2,3,3,3,3]])\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "mAP1jvo4ltMI"
},
"source": [
"### Tuple Indexing\n",
"- A multi-dimensional array can be indexed using a tuple of integer arrays of equal length, where each array in the tuple indexes the corresponding dimension\n",
"- If the tuple has fewer index arrays than the array has dimensions, they index the leading dimensions (i.e. dimensions 0 up to the length of the tuple)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 235
},
"colab_type": "code",
"id": "2V4DLlD1U2m0",
"outputId": "e890da87-35b8-4871-a18b-0bb9ecd5ecaf"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"arr2:\n",
" [[ 0 1 2]\n",
" [ 3 4 5]\n",
" [ 6 7 8]\n",
" [ 9 10 11]\n",
" [12 13 14]]\n",
"\n",
" Will give items at index (4,0) and (1,2)\n",
"[12 5]\n",
"\n",
" Tuple of length one will return rows\n",
"[[12 13 14]\n",
" [ 3 4 5]]\n"
]
}
],
"source": [
"arr2 = np.arange(15).reshape(5,3)\n",
"print('arr2:\\n', arr2)\n",
"\n",
"print('\\n', 'Will give items at index (4,0) and (1,2)')\n",
"indx = ([4,1],[0,2])\n",
"print(arr2[indx])\n",
"\n",
"print('\\n', 'Tuple of length one will return rows')\n",
"indx = ([4,1],)\n",
"print(arr2[indx])\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "HqBun5eJsqUq"
},
"source": [
"### Assignment With Advanced Indexing\n",
"- Advanced indexing (i.e. boolean, fancy and tuple) returns a copy instead of a view of the indexed array. Direct assignment through such an index still changes the original array, because the assignment is applied to the original array itself. But if we chain the indexing, the first index creates a copy and the assignment only changes that temporary copy, which can seem unexpected"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 185
},
"colab_type": "code",
"id": "3SqK-mCIoOdb",
"outputId": "cbf8a341-6395-41b9-fa46-b06fb37974cb"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"arr: [0 1 2 3 4 5 6 7 8 9]\n",
"\n",
" Direct assignment will change the original array\n",
"[ 0 1 2 -1 4 -1 6 -1 8 9]\n",
"\n",
" When we chain the indexing it will not work\n",
"[ 0 1 2 -1 4 -1 6 -1 8 9]\n",
"\n",
" But chaining index will work with slicing indexing\n",
"[ 0 1 2 -2 4 -1 6 -1 8 9]\n"
]
}
],
"source": [
"import numpy as np\n",
"\n",
"arr = np.arange(10)\n",
"print('arr: ', arr)\n",
"\n",
"print('\\n', 'Direct assignment will change the original array')\n",
"arr[[3,5,7]] = -1\n",
"print(arr)\n",
"\n",
"print('\\n', 'When we chain the indexing it will not work')\n",
"arr[[3,5,7]][0] = -2\n",
"print(arr)\n",
"\n",
"print('\\n', 'But chaining index will work with slicing indexing')\n",
"arr[3:8:2][0] = -2\n",
"print(arr)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "OMh2K3gW3jUV"
},
"source": [
"## Mixed Indexing\n",
"- In a multi-dimensional array we can use a different indexing method (slicing, boolean and fancy) for each dimension at the same time\n",
"- For a mixture of boolean and fancy indexes to work, the number of True values in the boolean index must equal the length of the fancy index (or the two must be broadcastable)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 420
},
"colab_type": "code",
"id": "mqpYOao93mLW",
"outputId": "89fd63f9-7575-448a-f2e4-ce006089e128"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"arr: [[[ 0 1 2 3]\n",
" [ 4 5 6 7]\n",
" [ 8 9 10 11]\n",
" [12 13 14 15]]\n",
"\n",
" [[16 17 18 19]\n",
" [20 21 22 23]\n",
" [24 25 26 27]\n",
" [28 29 30 31]]\n",
"\n",
" [[32 33 34 35]\n",
" [36 37 38 39]\n",
" [40 41 42 43]\n",
" [44 45 46 47]]\n",
"\n",
" [[48 49 50 51]\n",
" [52 53 54 55]\n",
" [56 57 58 59]\n",
" [60 61 62 63]]]\n",
"\n",
" Following mixed indexing pairs items 0 and 2 of the 0th dimension with items 0 and 2 of the 1st dimension\n",
"and takes items at index >=2 of the last dimension\n",
"[[ 2 3]\n",
" [42 43]]\n"
]
}
],
"source": [
"arr = np.arange(64).reshape(4,4,4)\n",
"print('arr: ', arr)\n",
"print('\\n', 'Following mixed indexing pairs items 0 and 2 of the 0th dimension with items 0 and 2 of the 1st dimension')\n",
"print('and takes items at index >=2 of the last dimension')\n",
"print(arr[[True, False, True, False], [0,2], 2:])"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "FvxxsvoavgA9"
},
"source": [
"# Array Operations"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "VTznIiJ00rMA"
},
"source": [
"## Simple Array Operations\n",
"- Numpy provides a simple syntax for mathematical and logical operations between arrays of compatible shapes. Here, compatible means that the shape of one array can be expanded to match the shape of the other using the broadcasting rules, which we'll discuss below\n",
"- In this section we'll only look at two cases\n",
"  - The arrays have the same shape, in which case the operation is element wise\n",
"  - One of the operands is a scalar, in which case the operation is applied between the scalar and each element of the array\n",
"- Expressing operations on whole arrays like this is called vectorization, and it is much faster than doing the same operation in a Python loop\n",
"- Vectorized operations are faster because the loop over elements runs in compiled C code and avoids per-element Python overheads such as type checking"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 672
},
"colab_type": "code",
"id": "ZFKvLV-SuCHR",
"outputId": "dbf9c71a-535d-4bfb-cc2d-6601ee88d15c"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Evaluate expression (x1*x2 - 3*x1 + 30) for x1 and x2 in range 0 to 10\n",
"[ 30. 28.69806094 27.9501385 27.75623269 28.11634349\n",
" 29.03047091 30.49861496 32.52077562 35.09695291 38.22714681\n",
" 41.91135734 46.14958449 50.94182825 56.28808864 62.18836565\n",
" 68.64265928 75.65096953 83.2132964 91.32963989 100. ]\n",
"\n",
" Spatial distance between corresponding points in two array\n",
"[0.54052263 0.17505988 0.59108818 0.41593393 0.03548522 0.29946201\n",
" 0.84649163 0.24975051 0.90016153 0.54062043 0.00097261 0.39826495\n",
" 0.64710327 0.40655563 0.00531519 0.94567232 0.33333277 0.01713418\n",
" 0.53797027 0.48080742]\n",
"\n",
" Element wise comparison, \">=\" will give boolean array with True where element\n",
"of p2 is greater than or equal to p1\n",
"[[ True False]\n",
" [False False]\n",
" [False False]\n",
" [ True True]\n",
" [ True False]\n",
" [ True True]\n",
" [ True True]\n",
" [ True False]\n",
" [ True False]\n",
" [False True]\n",
" [ True False]\n",
" [False False]\n",
" [ True False]\n",
" [ True False]\n",
" [False False]\n",
" [ True False]\n",
" [False False]\n",
" [ True False]\n",
" [False True]\n",
" [False False]]\n",
"\n",
" Element wise logical operation, \"&\" will give True where point of p2 is ahead\n",
"in both x and y direction from corresponding point in p1\n",
"[False False False True False True True False False False False False\n",
" False False False False False False False False]\n"
]
}
],
"source": [
"import numpy as np\n",
"\n",
"print('Evaluate expression (x1*x2 - 3*x1 + 30) for x1 and x2 in range 0 to 10')\n",
"x1 = np.linspace(0,10,20)\n",
"x2 = np.linspace(0, 10, 20)\n",
"z = x1*x2 - 3*x1 + 30\n",
"print(z)\n",
"\n",
"print('\\n', 'Spatial distance between corresponding points in two array')\n",
"p1 = np.random.rand(20,2)\n",
"p2 = np.random.rand(20,2)\n",
"\n",
"# np.sum adds values along the given axis (dimension). If the shape of an array is (3,4,5)\n",
"# then axes 0, 1 and 2 correspond to the dimensions with length 3, 4 and 5 respectively.\n",
"# Note: this is the squared Euclidean distance; wrap it in np.sqrt to get the actual distance\n",
"d = np.sum((p1-p2)**2, axis=1)\n",
"print(d)\n",
"\n",
"print('\\n', 'Element wise comparison, \">=\" will give boolean array with True where element')\n",
"print('of p2 is greater than or equal to p1')\n",
"r = p2>=p1\n",
"print(r)\n",
"\n",
"print('\\n', 'Element wise logical operation, \"&\" will give True where point of p2 is ahead')\n",
"print('in both x and y direction from corresponding point in p1')\n",
"print(r[:,0] & r[:,1])\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "bRoSmSanz_jN"
},
"source": [
"## Functions for Array Operations\n",
" - Numpy also has function versions of the above operations, like `np.add, np.subtract, np.divide, np.greater_equal, np.logical_and` and more\n",
" - The operators we used in the section above, like `+` and `*`, are overloaded to call these functions\n",
" - The function versions accept extra parameters; a commonly used one is `out`. It is `None` by default, which creates a new array for the result\n",
" - If we pass an array whose shape and dtype match the expected result as `out`, the result is written into that array. This saves memory when chaining multiple operations, because no new result array is allocated for each step"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 101
},
"colab_type": "code",
"id": "EzflZHCf10Cb",
"outputId": "7a52f0c2-a9c7-4da6-b519-7a20c670ee91"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Evaluate expression (x1*x2 - 3*x1 + 30) using functions\n",
"[ 30. 28.69806094 27.9501385 27.75623269 28.11634349\n",
" 29.03047091 30.49861496 32.52077562 35.09695291 38.22714681\n",
" 41.91135734 46.14958449 50.94182825 56.28808864 62.18836565\n",
" 68.64265928 75.65096953 83.2132964 91.32963989 100. ]\n"
]
}
],
"source": [
"import numpy as np\n",
"\n",
"print('Evaluate expression (x1*x2 - 3*x1 + 30) using functions')\n",
"x1 = np.linspace(0,10,20)\n",
"x2 = np.linspace(0, 10, 20)\n",
"\n",
"# Create an empty output array with the expected shape and dtype\n",
"z = np.empty_like(x1)\n",
"\n",
"# Less readable than the operator version, but each result is written into z\n",
"# instead of a new array (note that 3*x1 still allocates one temporary array)\n",
"np.multiply(x1, x2, out=z)\n",
"np.subtract(z, 3*x1, out=z)\n",
"np.add(z, 30, out=z)\n",
"print(z)\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "4BL27ffa8fXl"
},
"source": [
"# Broadcasting"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "x5pdZG2K8k2j"
},
"source": [
"## Rules for broadcasting"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "U0NQz3GkoJZb"
},
"source": [
"When an operation involves two arrays whose shapes don't match exactly, numpy follows a few simple steps to make the shapes match, if they are compatible.\n",
"\n",
" 1. **Check Array Dimensions**: If the arrays have different numbers of dimensions, add dimensions of length 1 to the left of the shape of the array with fewer dimensions\n",
" 2. **Match Shape On Each Dimension**: If the lengths differ in any dimension and one of the arrays has length 1 there, repeat that array along the dimension to match the other array\n",
" 3. **Raise Error if Dimension and Shape Not Matched**: If the shapes still don't match after these steps, raise an error"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "kQk_A8Q_raAv"
},
"source": [
"### Let's Visualize the Broadcasting Rule With Custom Implementation\n",
"Let's write a custom implementation to see in code how the broadcasting rule works."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 622
},
"colab_type": "code",
"id": "beHd_CsA2lgd",
"outputId": "4e418e26-a15c-444c-d1bb-3fe9ce0c9225"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"arr1.shape: (10, 2)\n",
"arr2.shape: (2,)\n",
"\n",
" arr1 has dimension 2 and arr2 has dimension 1, so add 1 dimension to left side of arr2\n",
"arr1.shape: (10, 2)\n",
"arr2.shape: (1, 2)\n",
"\n",
" Now in axis=0 arr1 has 10 items and arr2 has one item, so repeat it 10 times to match arr1\n",
"arr1.shape: (10, 2)\n",
"arr2.shape: (10, 2)\n",
"\n",
" Now both array has same dimension and shape, we can multiply them\n",
"arr1*arr2:\n",
" [[ 0. 0.11111075]\n",
" [ 1.71941377 0.33333225]\n",
" [ 3.43882755 0.55555375]\n",
" [ 5.15824132 0.77777525]\n",
" [ 6.8776551 0.99999675]\n",
" [ 8.59706887 1.22221824]\n",
" [10.31648264 1.44443974]\n",
" [12.03589642 1.66666124]\n",
" [13.75531019 1.88888274]\n",
" [15.47472397 2.11110424]]\n",
"\n",
" Lets see if broadcasting also produce same result\n",
"arr1*arr3:\n",
" [[ 0. 0.11111075]\n",
" [ 1.71941377 0.33333225]\n",
" [ 3.43882755 0.55555375]\n",
" [ 5.15824132 0.77777525]\n",
" [ 6.8776551 0.99999675]\n",
" [ 8.59706887 1.22221824]\n",
" [10.31648264 1.44443974]\n",
" [12.03589642 1.66666124]\n",
" [13.75531019 1.88888274]\n",
" [15.47472397 2.11110424]]\n"
]
}
],
"source": [
"import numpy as np\n",
"\n",
"arr1 = np.arange(20).reshape(10,2)\n",
"arr2 = np.random.rand(2)\n",
"arr3 = arr2.copy()\n",
"\n",
"print('arr1.shape: ', arr1.shape)\n",
"print('arr2.shape: ', arr2.shape)\n",
"\n",
"# Step 1: Check Array Dimensions\n",
"print('\\n', 'arr1 has dimension 2 and arr2 has dimension 1, so add 1 dimension to '\n",
"      'left side of arr2')\n",
"# np.newaxis is convenient way of adding new dimension\n",
"arr2 = arr2[np.newaxis, :]\n",
"print('arr1.shape: ', arr1.shape)\n",
"print('arr2.shape: ', arr2.shape)\n",
"\n",
"# Step 2: Match Shape On Each Dimension\n",
"print('\\n', 'Now in axis=0 arr1 has 10 items and arr2 has one item, so repeat it 10 '\n",
"      'times to match arr1')\n",
"arr2 = np.repeat(arr2, 10, axis=0)\n",
"print('arr1.shape: ', arr1.shape)\n",
"print('arr2.shape: ', arr2.shape)\n",
"\n",
"print('\\n', 'Now both array has same dimension and shape, we can multiply them')\n",
"print('arr1*arr2:\\n', arr1*arr2)\n",
"\n",
"print('\\n', 'Lets see if broadcasting also produce same result')\n",
"print('arr1*arr3:\\n', arr1*arr3)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "dCZ8tNa3DImk"
},
"source": [
"### Let's Try a Few Examples With Shapes\n",
"You can try these by creating arrays of the given shapes and doing some operation between them.\n",
"\n",
"```\n",
"Before Broadcast         | Step 1                       | Step 2 and 3\n",
"Shapes of arr1 and arr2  | Shapes after padding         | Shape of result\n",
"-------------------------------------------------------------------------\n",
"(3, 1, 5); (4, 1)        | (3, 1, 5); (1, 4, 1)         | (3, 4, 5)\n",
"(10,); (1, 10)           | (1, 10); (1, 10)             | (1, 10)\n",
"(2, 2, 2); (2, 3)        | (2, 2, 2); (1, 2, 3)         | Not Broadcastable\n",
"(2, 2, 2, 1); (2, 3)     | (2, 2, 2, 1); (1, 1, 2, 3)   | (2, 2, 2, 3)\n",
"\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Some Uses of Broadcasting"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "VB1Vvg_dO4-l"
},
"source": [
"### Evaluate Linear Equation Using Broadcast"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 101
},
"colab_type": "code",
"id": "R7YTerz-r86c",
"outputId": "5da80c04-91b4-4669-84d2-ba01bfd41123"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Let's evaluate equation c1*x1 + c2*x2 + c3*x3 for 100 points at once\n",
"results first 10:\n",
" [ 6.35385279 0.85639146 12.87683079 5.99433896 4.50873972 10.44691041\n",
" 3.87407211 6.62954602 11.00386582 10.09247866]\n",
"results.shape: (100,)\n"
]
}
],
"source": [
"print(\"Let's evaluate equation c1*x1 + c2*x2 + c3*x3 for 100 points at once\")\n",
"points = np.random.rand(100,3)\n",
"coefficients = np.array([5, -2, 11])\n",
"results = np.sum(points*coefficients, axis=1)\n",
"print('results first 10:\\n', results[:10])\n",
"print('results.shape: ', results.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "wEc9NMVNTfMU"
},
"source": [
"### Find Common Elements Between Arrays"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 118
},
"colab_type": "code",
"id": "kCGUjN46TeSq",
"outputId": "f01a044f-06d0-4dc7-aaf9-90c4bd10755f"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"arr1: [42 29 6 19 28 17 2 43 3 21 31 4 32 0 23 5 48 34 37 26]\n",
"arr2: [40 37 41 48 4 20 10 18 34 28 19 32 17 22 23]\n",
"\n",
" arr1 and arr2 are 1D arrays of length 20, 15 respectively.\n",
"To make them broadcastable Change shape of arr2 to (15, 1)\n",
"\n",
" Then both arrays will be broadcasted to (15, 20) matrix with all possible pairs\n",
"\n",
" comparison.shape: (15, 20)\n",
"\n",
" Elements of arr1 also in arr2: [19 28 17 4 32 23 48 34 37]\n"
]
}
],
"source": [
"np.random.seed(5)\n",
"## Get 20 and 15 distinct random values from 0 to 49\n",
"arr1 = np.random.choice(50, 20, replace=False)\n",
"arr2 = np.random.choice(50, 15, replace=False)\n",
"print(\"arr1: \", arr1)\n",
"print(\"arr2: \", arr2)\n",
"print('\\n', 'arr1 and arr2 are 1D arrays of length 20, 15 respectively.')\n",
"print('To make them broadcastable Change shape of arr2 to (15, 1)')\n",
"arr2 = arr2.reshape(15, 1)\n",
"print('\\n', 'Then both arrays will be broadcasted to (15, 20) matrix with all possible pairs')\n",
"comparison = (arr1 == arr2)\n",
"print('\\n', 'comparison.shape: ', comparison.shape)\n",
"print('\\n', 'Elements of arr1 also in arr2: ', arr1[comparison.any(axis=0)])\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "rVySeIjzbNoT"
},
"source": [
"### Find k-nearest Neighbors"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"colab_type": "code",
"id": "Wu8642kUQEVe",
"outputId": "5a08bb7b-19ec-41b1-9f73-8b8a904f6e44"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"To calculate distance between every pair of points make copy of points \n",
"with shape (20, 1, 2) which will broadcast both array to shape (20, 20, 2) \n",
"\n",
"diff.shape: (20, 20, 2)\n",
"distance_matrix.shape: (20, 20) \n",
"\n",
"Get the points with it's 3 nearest neighbors\n"
]
},
{
"data": {
"text/plain": [
"array([[[0.22199317, 0.87073231],\n",
" [0.20671916, 0.91861091],\n",
" [0.16561286, 0.96393053],\n",
" [0.08074127, 0.7384403 ]],\n",
"\n",
" [[0.20671916, 0.91861091],\n",
" [0.22199317, 0.87073231],\n",
" [0.16561286, 0.96393053],\n",
" [0.08074127, 0.7384403 ]],\n",
"\n",
" [[0.48841119, 0.61174386],\n",
" [0.62878791, 0.57983781],\n",
" [0.69984361, 0.77951459],\n",
" [0.76590786, 0.51841799]],\n",
"\n",
" [[0.76590786, 0.51841799],\n",
" [0.62878791, 0.57983781],\n",
" [0.69984361, 0.77951459],\n",
" [0.87993703, 0.27408646]],\n",
"\n",
" [[0.2968005 , 0.18772123],\n",
" [0.32756395, 0.1441643 ],\n",
" [0.28468588, 0.25358821],\n",
" [0.44130922, 0.15830987]],\n",
"\n",
" [[0.08074127, 0.7384403 ],\n",
" [0.02293309, 0.57766286],\n",
" [0.22199317, 0.87073231],\n",
" [0.20671916, 0.91861091]],\n",
"\n",
" [[0.44130922, 0.15830987],\n",
" [0.32756395, 0.1441643 ],\n",
" [0.41423502, 0.29607993],\n",
" [0.2968005 , 0.18772123]],\n",
"\n",
" [[0.87993703, 0.27408646],\n",
" [0.96022672, 0.18841466],\n",
" [0.76590786, 0.51841799],\n",
" [0.5999292 , 0.26581912]],\n",
"\n",
" [[0.41423502, 0.29607993],\n",
" [0.28468588, 0.25358821],\n",
" [0.44130922, 0.15830987],\n",
" [0.2968005 , 0.18772123]],\n",
"\n",
" [[0.62878791, 0.57983781],\n",
" [0.48841119, 0.61174386],\n",
" [0.76590786, 0.51841799],\n",
" [0.69984361, 0.77951459]],\n",
"\n",
" [[0.5999292 , 0.26581912],\n",
" [0.41423502, 0.29607993],\n",
" [0.44130922, 0.15830987],\n",
" [0.87993703, 0.27408646]],\n",
"\n",
" [[0.28468588, 0.25358821],\n",
" [0.2968005 , 0.18772123],\n",
" [0.32756395, 0.1441643 ],\n",
" [0.41423502, 0.29607993]],\n",
"\n",
" [[0.32756395, 0.1441643 ],\n",
" [0.2968005 , 0.18772123],\n",
" [0.44130922, 0.15830987],\n",
" [0.28468588, 0.25358821]],\n",
"\n",
" [[0.16561286, 0.96393053],\n",
" [0.20671916, 0.91861091],\n",
" [0.22199317, 0.87073231],\n",
" [0.08074127, 0.7384403 ]],\n",
"\n",
" [[0.96022672, 0.18841466],\n",
" [0.87993703, 0.27408646],\n",
" [0.5999292 , 0.26581912],\n",
" [0.76590786, 0.51841799]],\n",
"\n",
" [[0.02430656, 0.20455555],\n",
" [0.28468588, 0.25358821],\n",
" [0.2968005 , 0.18772123],\n",
" [0.32756395, 0.1441643 ]],\n",
"\n",
" [[0.69984361, 0.77951459],\n",
" [0.62878791, 0.57983781],\n",
" [0.63979518, 0.9856244 ],\n",
" [0.76590786, 0.51841799]],\n",
"\n",
" [[0.02293309, 0.57766286],\n",
" [0.00164217, 0.51547261],\n",
" [0.08074127, 0.7384403 ],\n",
" [0.22199317, 0.87073231]],\n",
"\n",
" [[0.00164217, 0.51547261],\n",
" [0.02293309, 0.57766286],\n",
" [0.08074127, 0.7384403 ],\n",
" [0.02430656, 0.20455555]],\n",
"\n",
" [[0.63979518, 0.9856244 ],\n",
" [0.69984361, 0.77951459],\n",
" [0.48841119, 0.61174386],\n",
" [0.62878791, 0.57983781]]])"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"\n",
"np.random.seed(5)\n",
"\n",
"points = np.random.rand(20, 2)\n",
"print('To calculate distance between every pair of points make copy of points ')\n",
"print('with shape (20, 1, 2) which will broadcast both array to shape (20, 20, 2)', '\\n')\n",
"cp_points = points.reshape(20, 1, 2)\n",
"\n",
"## calculate x2-x1, y2-y1\n",
"diff = (cp_points - points)\n",
"print('diff.shape: ', diff.shape)\n",
"\n",
"## calculate the squared distance (x2-x1)**2 + (y2-y1)**2; the square root is not needed for ranking\n",
"distance_matrix = np.sum(diff**2, axis=2)\n",
"print('distance_matrix.shape: ', distance_matrix.shape, '\\n')\n",
"\n",
"## sort by distance along axis 1 and take the first 4 indices: the point itself (distance 0) plus its 3 nearest neighbors\n",
"top_3 = np.argsort(distance_matrix, axis=1)[:,:4]\n",
"print(\"Get the points with it's 3 nearest neighbors\")\n",
"points[top_3]"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "s6_vQ4kbV-oQ"
},
"source": [
"# Vectorization\n",
" - In numpy, vectorization means performing optimized operations on whole sequences of data of the same type, instead of looping over elements in Python.\n",
" - Besides keeping code clean, vectorized operations are also very fast, because they run in compiled code and avoid Python overheads such as type checking and memory management.\n",
" - The examples in the **Broadcasting section above are also good examples of vectorization**"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "Um9JYgwpb8Yb"
},
"source": [
"### Vectorization Vs Loop\n",
"Let's say we have a polynomial of degree 10 in a single variable, like `a1*x + a2*x^2 + a3*x^3 ... + a10*x^10`. Let's evaluate it for a large number of x values, once using plain Python loops and once using numpy vectorization, and see how they compare."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "uRpoyDPRam-m"
},
"outputs": [],
"source": [
"import numpy as np\n",
"np.random.seed(32)\n",
"X = np.random.rand(10000)\n",
"coefficients = np.random.randn(10)*20 + 50\n"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 134
},
"colab_type": "code",
"id": "BtWYBYqyg6-n",
"outputId": "530e23c5-b927-4e28-9617-e806b7326658"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Verify that both gives same result\n",
"Loop:\n",
" [222.57782534 30.62439847 59.69953776 373.52687079 123.89007218\n",
" 179.70369976 6.49315699 321.685257 73.14575517 69.71437596]\n",
"Vectorization:\n",
" [222.57782534 30.62439847 59.69953776 373.52687079 123.89007218\n",
" 179.70369976 6.49315699 321.685257 73.14575517 69.71437596]\n"
]
}
],
"source": [
"def evaluate_polynomial_loop():\n",
"    result_loop = np.empty_like(X)\n",
"    for i in range(X.shape[0]):\n",
"        exp_part = 1\n",
"        total = 0\n",
"        for j in range(coefficients.shape[0]):\n",
"            # exp_part holds x**(j+1) after this multiplication\n",
"            exp_part *= X[i]\n",
"            total+=coefficients[j]*exp_part\n",
"        result_loop[i] = total\n",
"    return result_loop\n",
"\n",
"\n",
"def evaluate_polynomial_vect():\n",
"    ## repeat each x across 10 columns (one column per coefficient)\n",
"    exponents = X[:, np.newaxis] + np.zeros((1, coefficients.shape[0]))\n",
"    ## cumulative product along each row turns [x, x, x, ...] into [x, x**2, x**3, ...]\n",
"    exponents.cumprod(axis=1, out=exponents)\n",
"    result_vect = np.sum(exponents * coefficients, axis=1)\n",
"    return result_vect\n",
" \n",
"print('Verify that both gives same result')\n",
"print('Loop:\\n', evaluate_polynomial_loop()[:10])\n",
"print('Vectorization:\\n', evaluate_polynomial_vect()[:10])"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "Or53PIVfiXhS"
},
"source": [
"### Compare\n",
"For a fair comparison, both versions work on the same numpy arrays (note that indexing a numpy array one element at a time from Python is itself relatively slow). In the timings recorded below (114 ms vs 1.19 ms), vectorization is **roughly 100 times faster**."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"colab_type": "code",
"id": "H5yNyPjvdS2F",
"outputId": "16aa13e8-28ef-43a8-ad42-401e4e2001de"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"114 ms ± 5.02 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n"
]
}
],
"source": [
"%timeit evaluate_polynomial_loop()"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"colab_type": "code",
"id": "EkvkaPnPfM1-",
"outputId": "43c01ec6-1e92-4b07-ad3a-c20389b6cd51"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.19 ms ± 79 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
]
}
],
"source": [
"%timeit evaluate_polynomial_vect()"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "whtV9esN4cX-"
},
"source": [
"## Ufunc and Numba\n",
"**Ufunc:** A universal function (ufunc) is a vectorized wrapper around a function that works on individual elements. Ufuncs operate on ndarrays element by element and support broadcasting and type casting. `np.add, np.multiply` etc. are examples of ufuncs implemented in C. We can create custom ufuncs using `np.frompyfunc` or using numba.\n",
"\n",
"**Numba**: Numba is a just-in-time compiler that generates optimized machine code from pure Python functions working on arrays and numbers. You can use the `numba.jit` decorator on a function so that it is compiled the first time it is called. You can use the `numba.vectorize` decorator to convert a Python function into a ufunc."
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "q149Tutc4fhl"
},
"source": [
"Let's compare different implementations of adding two big arrays, as follows."
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "5O6zklZu3Uy1"
},
"source": [
"### Create Big Arrays"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "UUUP2Buh3ZDh"
},
"outputs": [],
"source": [
"arr1 = np.arange(1000000, dtype='int64')\n",
"arr2 = np.arange(1000000, dtype='int64')"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "gVT1uazl3dU6"
},
"source": [
"### Implementation Using Python Loop"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"colab_type": "code",
"id": "DM_ANvPJoMxM",
"outputId": "7eabcce0-fcb5-42c9-9a4d-4b0cf4db10bb"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"596 ms ± 24.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"def add_arr(arr1, arr2):\n",
"    assert len(arr1)==len(arr2), \"array must have same length\"\n",
"    result = np.empty_like(arr1)\n",
"    for i in range(len(arr1)):\n",
"        result[i] = arr1[i] + arr2[i]\n",
"    return result\n",
"\n",
"%timeit _ = add_arr(arr1, arr2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "CrD4bcGn3lU0"
},
"source": [
"### Creating Ufunc Using `np.frompyfunc`"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"colab_type": "code",
"id": "JfulzsMhkCyk",
"outputId": "8cf1986c-2fb3-40ec-ab6d-f058cc2f46ad"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"195 ms ± 12.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
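],
"source": [
"def add(a, b):\n",
"    return a+b\n",
"\n",
"# np.frompyfunc(func, nin, nout): wrap add as a ufunc with 2 inputs and 1 output.\n",
"# The wrapped function is still called once per element in Python and the result has dtype object\n",
"vect_add = np.frompyfunc(add,2,1)\n",
"\n",
"%timeit _ = vect_add(arr1, arr2)"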
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "EHUBj9mu34cW"
},
"source": [
"### Using Numba JIT\n",
"- `nopython=True` means the whole function must be compiled to machine code; if numba can't do that, it raises an error instead of falling back to slower Python object mode"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"colab_type": "code",
"id": "HgBKDleRpM2U",
"outputId": "8927d7e5-a9ee-4f13-ec1c-6e46ac287dbc"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"3.66 ms ± 467 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
]
}
],
"source": [
"import numba\n",
"@numba.jit(nopython=True)\n",
"def add_arr_jit(arr1, arr2):\n",
"    assert len(arr1)==len(arr2), \"array must have same length\"\n",
"    result = np.empty_like(arr1)\n",
"    for i in range(len(arr1)):\n",
"        result[i] = arr1[i] + arr2[i]\n",
"    return result\n",
"\n",
"# Call once before timing so that compilation time is not included in the measurement\n",
"_ = add_arr_jit(arr1, arr2)\n",
"%timeit _ = add_arr_jit(arr1, arr2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "bsVkm0n43-zR"
},
"source": [
"### Creating Ufunc Using `numba.vectorize`\n",
"- `numba.vectorize` takes the signature of the function being converted as a parameter. `int64(int64,int64)` means the function takes 2 `int64` parameters and returns an `int64`"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
},
"colab_type": "code",
"id": "ICp1ukSZiXHD",
"outputId": "5255a156-891f-4498-abd4-975c7423e6d2"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2.93 ms ± 569 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
]
}
],
"source": [
"import numba\n",
"\n",
"@numba.vectorize(['int64(int64,int64)'], nopython=True)\n",
"def vect_add(a, b):\n",
"    return a+b\n",
"\n",
"%timeit _ = vect_add(arr1, arr2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "7jz84K_F4l91"
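},
"source": [
"**Conclusion:** The solutions using `numba.jit` and `numba.vectorize` perform much better than the Python loop and `np.frompyfunc` (about 3 ms versus about 600 ms and 200 ms in the runs above). You can also check how the built-in numpy vectorization (`arr1 + arr2`) compares with these."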
]
},
{
"cell_type": "markdown",
"metadata": {
"colab": {},
"colab_type": "code",
"id": "xCeXzySTnuyQ"
},
"source": [
"# More for Exploration"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Some Useful Functions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- **`np.where`:** element wise `if .. then .. else`\n",
"- **`np.select`:** select values from multiple arrays based on multiple conditions\n",
"- **`np.concatenate, np.vstack, np.r_, np.hstack, np.c_`:** join multiple ndarrays row wise, column wise or along a given axis\n",
"- **`np.ravel, ndarray.flatten`:** convert a multidimensional array to a 1D array (`flatten` is an array method and always returns a copy)\n",
"- **`np.roll`:** circularly shift the array along a given axis\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set Operations\n",
"- **`np.unique(x)`:** gives array of unique elements in the array\n",
"- **`np.intersect1d(x, y)`:** gives 1D array of elements common to both arrays\n",
"- **`np.union1d(x, y)`:** gives 1D array of unique elements from both arrays\n",
"- **`np.in1d(x, y)`:** checks if each element of x is also present in y and returns a boolean array of the same length as x (newer numpy versions recommend `np.isin` instead)\n",
"- **`np.setdiff1d(x, y)`:** gives elements of x not in y\n",
"- **`np.setxor1d(x, y)`:** gives elements that are either in x or y but not in both\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Save and Load ndarray From Disk\n",
"- **`np.save(\"filename.npy\", x)`:** save a single numpy array to disk\n",
"- **`np.load(\"filename.npy\")`:** load a single numpy array from disk\n",
"- **`np.savez(\"filename.npz\", key1=arr1, key2=arr2)`:** save multiple arrays under the given keys\n",
"- **`np.savetxt(\"filename.txt\", x)`:** save a single numpy array to disk as a delimited text file\n",
"- **`np.loadtxt(\"filename.txt\")`:** load a single numpy array from a text file"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Memory Mapping\n",
"To work with a large numpy array that doesn't fit in RAM, you can use `np.memmap` to map the array to a file on disk. It transparently loads only the segments of the array needed for the current operation.\n",
"\n",
"- **`np.memmap(filename, dtype, mode, shape)`:** create an array memory-mapped to the given file\n",
"- **`mmap.flush()`:** flush all in-memory changes to disk (here `mmap` is the array returned by `np.memmap`)\n"
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [
"DChjsRw20XUn"
],
"name": "Numpy Guide Book.ipynb",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 1
}