jasminefrs/94-775_Recitation2.ipynb Secret

## 94-775_Recitation2.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 94-755 Rectiation 2: Numpy and Spacy Basics"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Numpy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Numpy is the fundamental package for scientific computing and data analysis in python. In this course, we will use Numpy as an efficient multi-dimentional data container and perform various data wrangling and data transformation tasks with it."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Numpy Installation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you have installed Anaconda, `numpy` should have been installed already. You can check your `numpy` package infomation with the command:`pip show numpy`in your Terminal. Your can also check your numpy version in a python environment:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'1.14.2'"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy\n",
    "numpy.__version__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you have not had `numpy` installed, you can install by `pip install numpy`.\n",
    "\n",
    "[Numpy Download Page](https://pypi.python.org/pypi/numpy)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Construction of Numpy Array"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1, 2, 3, 4])"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "# From a single list to a one dimentional numpy array\n",
    "l1 = [1,2,3,4]\n",
    "a1 = np.array(l1)\n",
    "a1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[2, 7, 1, 8],\n",
       "       [8, 4, 5, 9]])"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# From a list of lists to a two dimentional numpy array\n",
    "l2 = [[2,7,1,8],[8,4,5,9]]\n",
    "a2 = np.array(l2)\n",
    "a2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(4,)\n",
      "(2, 4)\n"
     ]
    }
   ],
   "source": [
    "# Check the shape of a numpy array\n",
    "print(a1.shape)\n",
    "print(a2.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[0., 0., 0., 0.],\n",
       "       [0., 0., 0., 0.],\n",
       "       [0., 0., 0., 0.]])"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Array with all zeros\n",
    "np.zeros((3,4))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[[1., 1., 1.],\n",
       "        [1., 1., 1.]],\n",
       "\n",
       "       [[1., 1., 1.],\n",
       "        [1., 1., 1.]],\n",
       "\n",
       "       [[1., 1., 1.],\n",
       "        [1., 1., 1.]],\n",
       "\n",
       "       [[1., 1., 1.],\n",
       "        [1., 1., 1.]]])"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Array with all ones.\n",
    "np.ones([4,2,3])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([10, 15, 20, 25])"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Array with a sequence of numbers\n",
    "np.arange(10, 30, 5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0, 1, 2, 3, 4])"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.arange(5) # Shorthand for np.arange(0, 5, 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.linspace(0, 2, 9)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0,  1,  2,  3],\n",
       "       [ 4,  5,  6,  7],\n",
       "       [ 8,  9, 10, 11]])"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Reshape array\n",
    "a3 = np.arange(12)\n",
    "a3 = a3.reshape((3,4))\n",
    "a3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Summation of Elements"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([12, 15, 18, 21])"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Sum along rows\n",
    "a3.sum(axis=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 6, 22, 38])"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Sum along columns\n",
    "a3.sum(axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "66"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Sum everthing\n",
    "a3.sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([4., 5., 6., 7.])"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a3.mean(axis=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Basic Math Operation of Arrays"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[0 1 2]\n",
      " [3 4 5]]\n",
      "[[ 7  8  9]\n",
      " [10 11 12]]\n"
     ]
    }
   ],
   "source": [
    "x = np.arange(6).reshape((2,3))\n",
    "y = np.arange(7,13).reshape(2,3)\n",
    "print(x)\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 7  9 11]\n",
      " [13 15 17]]\n",
      "[[ 7  9 11]\n",
      " [13 15 17]]\n"
     ]
    }
   ],
   "source": [
    "# Element-wise add\n",
    "print(x + y)\n",
    "print(np.add(x, y))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[-7 -7 -7]\n",
      " [-7 -7 -7]]\n",
      "[[-7 -7 -7]\n",
      " [-7 -7 -7]]\n"
     ]
    }
   ],
   "source": [
    "# Element-wise difference\n",
    "print(x - y)\n",
    "print(np.subtract(x, y))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 0  8 18]\n",
      " [30 44 60]]\n",
      "[[ 0  8 18]\n",
      " [30 44 60]]\n"
     ]
    }
   ],
   "source": [
    "# Element-wise product\n",
    "print(x * y)\n",
    "print(np.multiply(x, y))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[0.         0.125      0.22222222]\n",
      " [0.3        0.36363636 0.41666667]]\n",
      "[[0.         0.125      0.22222222]\n",
      " [0.3        0.36363636 0.41666667]]\n"
     ]
    }
   ],
   "source": [
    "# Element-wise division\n",
    "print(x / y)\n",
    "print(np.divide(x, y))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Broadcasting"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[1, 2, 3, 4]"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "l1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([1, 2, 3, 4])"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Multiply a list by a number\n",
    "l1 * 3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 3,  6,  9, 12])"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Multiply a numpy array by a number\n",
    "a1 * 3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "unsupported operand type(s) for /: 'list' and 'int'",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-24-5f1aa246822a>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0ml1\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m: unsupported operand type(s) for /: 'list' and 'int'"
     ]
    }
   ],
   "source": [
    "l1 / 3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([4, 5, 6, 7])"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a1 + 3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([-2, -1,  0,  1])"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a1 - 3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0.33333333, 0.66666667, 1.        , 1.33333333])"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a1 / 3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 0  1  2  3]\n",
      " [ 4  5  6  7]\n",
      " [ 8  9 10 11]]\n",
      "[0 1 2 3]\n"
     ]
    }
   ],
   "source": [
    "# More complicated broadcasting behaviour\n",
    "a4 = np.arange(4)\n",
    "print(a3)\n",
    "print(a4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0,  1,  4,  9],\n",
       "       [ 0,  5, 12, 21],\n",
       "       [ 0,  9, 20, 33]])"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a3 * a4"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "More details can be found in a [Broadcasting Tutorial](https://docs.scipy.org/doc/numpy-dev/user/basics.broadcasting.html)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Fancy Index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a5 = np.array([3,1,4,1,5,9,2,6,5,3])\n",
    "a5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([4, 1, 1, 3, 2, 3])"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "i = np.array([2,1,1,0,6,9])\n",
    "a5[i]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[1, 6],\n",
       "       [9, 2]])"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "j = np.array([[3,7], [5,6]])\n",
    "a5[j]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([3, 4, 5, 2, 5])"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Select numbers at odd position\n",
    "odd_i = [x for x in range(len(a5)) if x%2==0]\n",
    "a5[odd_i]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[0, 2, 4, 6, 8]"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "odd_i"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([False, False, False, False,  True,  True, False,  True,  True,\n",
       "       False])"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Index with boolean array (masking)\n",
    "bool_i = a5 > 4\n",
    "bool_i"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([5, 9, 6, 5])"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a5[bool_i]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([5, 9, 6, 5])"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a5[a5>4]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The last plot in Lecture 3 can be done using masking\n",
    "import csv\n",
    "from datetime import datetime\n",
    "f = open('odisha-tomato-cuttack-banki.csv', 'r')\n",
    "\n",
    "reader = csv.reader(f)\n",
    "\n",
    "line_number = 0  # keep track of which line number we are at, starting from 0\n",
    "months = []\n",
    "modal_prices = []  # we build up a list of modal prices, starting from an empty list\n",
    "\n",
    "for line in reader:  # go through each line of the csv file\n",
    "    if line_number >= 2:  # note that we ignore the first two lines because they correspond to headers\n",
    "        date = datetime.strptime(line[-1], '%d-%b-%y')\n",
    "        price = float(line[-2])\n",
    "        modal_prices.append(price)\n",
    "        months.append(date.month)\n",
    "    line_number = line_number + 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "months = np.array(months, dtype=np.int)\n",
    "modal_prices = np.array(modal_prices, dtype=np.float)\n",
    "prices_by_month = [np.mean(modal_prices[months == month]) for month in range(1, 13)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1435.4166666666667"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.mean(modal_prices[months==1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### SpaCy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### SpaCy Installation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Install SpaCy package:**\n",
    "\n",
    "Using pip: `$ pip install -U spacy`\n",
    "\n",
    "Using conda: `$ conda install -c conda-forge spacy`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Install SpaCy language model:**\n",
    "\n",
    "`$ python -m spacy download en` \n",
    "\n",
    "or\n",
    "\n",
    "`$ python -m spacy.en.download`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [],
   "source": [
    "import spacy\n",
    "nlp = spacy.load('en')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
	{
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## 94-755 Rectiation 2: Numpy and Spacy Basics"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"### Numpy"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Numpy is the fundamental package for scientific computing and data analysis in python. In this course, we will use Numpy as an efficient multi-dimentional data container and perform various data wrangling and data transformation tasks with it."
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"#### Numpy Installation"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"If you have installed Anaconda, `numpy` should have been installed already. You can check your `numpy` package infomation with the command:`pip show numpy`in your Terminal. Your can also check your numpy version in a python environment:"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 1,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"'1.14.2'"
	]
	},
	"execution_count": 1,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"import numpy\n",
	"numpy.__version__"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"If you have not had `numpy` installed, you can install by `pip install numpy`.\n",
	"\n",
	"[Numpy Download Page](https://pypi.python.org/pypi/numpy)"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"#### Construction of Numpy Array"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 2,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([1, 2, 3, 4])"
	]
	},
	"execution_count": 2,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"import numpy as np\n",
	"# From a single list to a one dimentional numpy array\n",
	"l1 = [1,2,3,4]\n",
	"a1 = np.array(l1)\n",
	"a1"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 3,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([[2, 7, 1, 8],\n",
	" [8, 4, 5, 9]])"
	]
	},
	"execution_count": 3,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"# From a list of lists to a two dimentional numpy array\n",
	"l2 = [[2,7,1,8],[8,4,5,9]]\n",
	"a2 = np.array(l2)\n",
	"a2"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 4,
	"metadata": {},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"(4,)\n",
	"(2, 4)\n"
	]
	}
	],
	"source": [
	"# Check the shape of a numpy array\n",
	"print(a1.shape)\n",
	"print(a2.shape)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 5,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([[0., 0., 0., 0.],\n",
	" [0., 0., 0., 0.],\n",
	" [0., 0., 0., 0.]])"
	]
	},
	"execution_count": 5,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"# Array with all zeros\n",
	"np.zeros((3,4))"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 6,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([[[1., 1., 1.],\n",
	" [1., 1., 1.]],\n",
	"\n",
	" [[1., 1., 1.],\n",
	" [1., 1., 1.]],\n",
	"\n",
	" [[1., 1., 1.],\n",
	" [1., 1., 1.]],\n",
	"\n",
	" [[1., 1., 1.],\n",
	" [1., 1., 1.]]])"
	]
	},
	"execution_count": 6,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"# Array with all ones.\n",
	"np.ones([4,2,3])"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 7,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([10, 15, 20, 25])"
	]
	},
	"execution_count": 7,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"# Array with a sequence of numbers\n",
	"np.arange(10, 30, 5)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 8,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([0, 1, 2, 3, 4])"
	]
	},
	"execution_count": 8,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"np.arange(5) # Shorthand for np.arange(0, 5, 1)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 9,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([0. , 0.25, 0.5 , 0.75, 1. , 1.25, 1.5 , 1.75, 2. ])"
	]
	},
	"execution_count": 9,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"np.linspace(0, 2, 9)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 10,
	"metadata": {
	"scrolled": true
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([[ 0, 1, 2, 3],\n",
	" [ 4, 5, 6, 7],\n",
	" [ 8, 9, 10, 11]])"
	]
	},
	"execution_count": 10,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"# Reshape array\n",
	"a3 = np.arange(12)\n",
	"a3 = a3.reshape((3,4))\n",
	"a3"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"#### Summation of Elements"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 11,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([12, 15, 18, 21])"
	]
	},
	"execution_count": 11,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"# Sum along rows\n",
	"a3.sum(axis=0)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 12,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([ 6, 22, 38])"
	]
	},
	"execution_count": 12,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"# Sum along columns\n",
	"a3.sum(axis=1)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 13,
	"metadata": {
	"scrolled": true
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"66"
	]
	},
	"execution_count": 13,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"# Sum everthing\n",
	"a3.sum()"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 14,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([4., 5., 6., 7.])"
	]
	},
	"execution_count": 14,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"a3.mean(axis=0)"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"#### Basic Math Operation of Arrays"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 15,
	"metadata": {},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"[[0 1 2]\n",
	" [3 4 5]]\n",
	"[[ 7 8 9]\n",
	" [10 11 12]]\n"
	]
	}
	],
	"source": [
	"x = np.arange(6).reshape((2,3))\n",
	"y = np.arange(7,13).reshape(2,3)\n",
	"print(x)\n",
	"print(y)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 16,
	"metadata": {
	"scrolled": true
	},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"[[ 7 9 11]\n",
	" [13 15 17]]\n",
	"[[ 7 9 11]\n",
	" [13 15 17]]\n"
	]
	}
	],
	"source": [
	"# Element-wise add\n",
	"print(x + y)\n",
	"print(np.add(x, y))"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 17,
	"metadata": {},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"[[-7 -7 -7]\n",
	" [-7 -7 -7]]\n",
	"[[-7 -7 -7]\n",
	" [-7 -7 -7]]\n"
	]
	}
	],
	"source": [
	"# Element-wise difference\n",
	"print(x - y)\n",
	"print(np.subtract(x, y))"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 18,
	"metadata": {},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"[[ 0 8 18]\n",
	" [30 44 60]]\n",
	"[[ 0 8 18]\n",
	" [30 44 60]]\n"
	]
	}
	],
	"source": [
	"# Element-wise product\n",
	"print(x * y)\n",
	"print(np.multiply(x, y))"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 19,
	"metadata": {},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"[[0. 0.125 0.22222222]\n",
	" [0.3 0.36363636 0.41666667]]\n",
	"[[0. 0.125 0.22222222]\n",
	" [0.3 0.36363636 0.41666667]]\n"
	]
	}
	],
	"source": [
	"# Element-wise division\n",
	"print(x / y)\n",
	"print(np.divide(x, y))"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"#### Broadcasting"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 20,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"[1, 2, 3, 4]"
	]
	},
	"execution_count": 20,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"l1"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 21,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([1, 2, 3, 4])"
	]
	},
	"execution_count": 21,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"a1"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 22,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"[1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]"
	]
	},
	"execution_count": 22,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"# Multiply a list by a number\n",
	"l1 * 3"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 23,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([ 3, 6, 9, 12])"
	]
	},
	"execution_count": 23,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"# Multiply a numpy array by a number\n",
	"a1 * 3"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 24,
	"metadata": {},
	"outputs": [
	{
	"ename": "TypeError",
	"evalue": "unsupported operand type(s) for /: 'list' and 'int'",
	"output_type": "error",
	"traceback": [
	"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
	"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
	"\u001b[0;32m<ipython-input-24-5f1aa246822a>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0ml1\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
	"\u001b[0;31mTypeError\u001b[0m: unsupported operand type(s) for /: 'list' and 'int'"
	]
	}
	],
	"source": [
	"l1 / 3"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 25,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([4, 5, 6, 7])"
	]
	},
	"execution_count": 25,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"a1 + 3"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 26,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([-2, -1, 0, 1])"
	]
	},
	"execution_count": 26,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"a1 - 3"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 27,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([0.33333333, 0.66666667, 1. , 1.33333333])"
	]
	},
	"execution_count": 27,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"a1 / 3"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 28,
	"metadata": {},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"[[ 0 1 2 3]\n",
	" [ 4 5 6 7]\n",
	" [ 8 9 10 11]]\n",
	"[0 1 2 3]\n"
	]
	}
	],
	"source": [
	"# More complicated broadcasting behaviour\n",
	"a4 = np.arange(4)\n",
	"print(a3)\n",
	"print(a4)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 29,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([[ 0, 1, 4, 9],\n",
	" [ 0, 5, 12, 21],\n",
	" [ 0, 9, 20, 33]])"
	]
	},
	"execution_count": 29,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"a3 * a4"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"More details can be found in a [Broadcasting Tutorial](https://docs.scipy.org/doc/numpy-dev/user/basics.broadcasting.html)"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"#### Fancy Index"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 30,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])"
	]
	},
	"execution_count": 30,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"a5 = np.array([3,1,4,1,5,9,2,6,5,3])\n",
	"a5"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 31,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([4, 1, 1, 3, 2, 3])"
	]
	},
	"execution_count": 31,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"i = np.array([2,1,1,0,6,9])\n",
	"a5[i]"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 32,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([[1, 6],\n",
	" [9, 2]])"
	]
	},
	"execution_count": 32,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"j = np.array([[3,7], [5,6]])\n",
	"a5[j]"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 33,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([3, 4, 5, 2, 5])"
	]
	},
	"execution_count": 33,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"# Select numbers at odd position\n",
	"odd_i = [x for x in range(len(a5)) if x%2==0]\n",
	"a5[odd_i]"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 34,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"[0, 2, 4, 6, 8]"
	]
	},
	"execution_count": 34,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"odd_i"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 35,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([False, False, False, False, True, True, False, True, True,\n",
	" False])"
	]
	},
	"execution_count": 35,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"# Index with boolean array (masking)\n",
	"bool_i = a5 > 4\n",
	"bool_i"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 36,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])"
	]
	},
	"execution_count": 36,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"a5"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 37,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([5, 9, 6, 5])"
	]
	},
	"execution_count": 37,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"a5[bool_i]"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 38,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([5, 9, 6, 5])"
	]
	},
	"execution_count": 38,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"a5[a5>4]"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 39,
	"metadata": {},
	"outputs": [],
	"source": [
	"# The last plot in Lecture 3 can be done using masking\n",
	"import csv\n",
	"from datetime import datetime\n",
	"f = open('odisha-tomato-cuttack-banki.csv', 'r')\n",
	"\n",
	"reader = csv.reader(f)\n",
	"\n",
	"line_number = 0 # keep track of which line number we are at, starting from 0\n",
	"months = []\n",
	"modal_prices = [] # we build up a list of modal prices, starting from an empty list\n",
	"\n",
	"for line in reader: # go through each line of the csv file\n",
	" if line_number >= 2: # note that we ignore the first two lines because they correspond to headers\n",
	" date = datetime.strptime(line[-1], '%d-%b-%y')\n",
	" price = float(line[-2])\n",
	" modal_prices.append(price)\n",
	" months.append(date.month)\n",
	" line_number = line_number + 1"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 40,
	"metadata": {},
	"outputs": [],
	"source": [
	"months = np.array(months, dtype=np.int)\n",
	"modal_prices = np.array(modal_prices, dtype=np.float)\n",
	"prices_by_month = [np.mean(modal_prices[months == month]) for month in range(1, 13)]"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 41,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"1435.4166666666667"
	]
	},
	"execution_count": 41,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"np.mean(modal_prices[months==1])"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"### SpaCy"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"#### SpaCy Installation"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Install SpaCy package:\n",
	"\n",
	"Using pip: `$ pip install -U spacy`\n",
	"\n",
	"Using conda: `$ conda install -c conda-forge spacy`"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Install SpaCy language model:\n",
	"\n",
	"`$ python -m spacy download en` \n",
	"\n",
	"or\n",
	"\n",
	"`$ python -m spacy.en.download`"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 42,
	"metadata": {},
	"outputs": [],
	"source": [
	"import spacy\n",
	"nlp = spacy.load('en')"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": []
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Python 3",
	"language": "python",
	"name": "python3"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 3
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython3",
	"version": "3.6.4"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 2
	}