@jasminefrs
Created March 30, 2018 22:11
94-775 Recitation 2
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 94-775 Recitation 2: Numpy and Spacy Basics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Numpy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Numpy is the fundamental package for scientific computing and data analysis in Python. In this course, we will use Numpy as an efficient multidimensional data container and use it for various data wrangling and data transformation tasks."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Numpy Installation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you have installed Anaconda, `numpy` should already be installed. You can check your `numpy` package information by running `pip show numpy` in your terminal. You can also check your numpy version from within Python:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'1.14.2'"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy\n",
"numpy.__version__"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If `numpy` is not installed yet, you can install it with `pip install numpy`.\n",
"\n",
"[Numpy Download Page](https://pypi.python.org/pypi/numpy)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Construction of Numpy Array"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 2, 3, 4])"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"# From a single list to a one-dimensional numpy array\n",
"l1 = [1,2,3,4]\n",
"a1 = np.array(l1)\n",
"a1"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[2, 7, 1, 8],\n",
" [8, 4, 5, 9]])"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# From a list of lists to a two-dimensional numpy array\n",
"l2 = [[2,7,1,8],[8,4,5,9]]\n",
"a2 = np.array(l2)\n",
"a2"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(4,)\n",
"(2, 4)\n"
]
}
],
"source": [
"# Check the shape of a numpy array\n",
"print(a1.shape)\n",
"print(a2.shape)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[0., 0., 0., 0.],\n",
" [0., 0., 0., 0.],\n",
" [0., 0., 0., 0.]])"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Array with all zeros\n",
"np.zeros((3,4))"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[[1., 1., 1.],\n",
" [1., 1., 1.]],\n",
"\n",
" [[1., 1., 1.],\n",
" [1., 1., 1.]],\n",
"\n",
" [[1., 1., 1.],\n",
" [1., 1., 1.]],\n",
"\n",
" [[1., 1., 1.],\n",
" [1., 1., 1.]]])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Array with all ones.\n",
"np.ones([4,2,3])"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([10, 15, 20, 25])"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Array with a sequence of numbers\n",
"np.arange(10, 30, 5)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0, 1, 2, 3, 4])"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.arange(5) # Shorthand for np.arange(0, 5, 1)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0. , 0.25, 0.5 , 0.75, 1. , 1.25, 1.5 , 1.75, 2. ])"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.linspace(0, 2, 9)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0, 1, 2, 3],\n",
" [ 4, 5, 6, 7],\n",
" [ 8, 9, 10, 11]])"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Reshape array\n",
"a3 = np.arange(12)\n",
"a3 = a3.reshape((3,4))\n",
"a3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Summation of Elements"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([12, 15, 18, 21])"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Sum over the rows (axis=0), giving one total per column\n",
"a3.sum(axis=0)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 6, 22, 38])"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Sum over the columns (axis=1), giving one total per row\n",
"a3.sum(axis=1)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"66"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Sum everything\n",
"a3.sum()"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([4., 5., 6., 7.])"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Mean over the rows (axis=0), giving one average per column\n",
"a3.mean(axis=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Basic Math Operation of Arrays"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[0 1 2]\n",
" [3 4 5]]\n",
"[[ 7 8 9]\n",
" [10 11 12]]\n"
]
}
],
"source": [
"x = np.arange(6).reshape((2,3))\n",
"y = np.arange(7,13).reshape(2,3)\n",
"print(x)\n",
"print(y)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 7 9 11]\n",
" [13 15 17]]\n",
"[[ 7 9 11]\n",
" [13 15 17]]\n"
]
}
],
"source": [
"# Element-wise add\n",
"print(x + y)\n",
"print(np.add(x, y))"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[-7 -7 -7]\n",
" [-7 -7 -7]]\n",
"[[-7 -7 -7]\n",
" [-7 -7 -7]]\n"
]
}
],
"source": [
"# Element-wise difference\n",
"print(x - y)\n",
"print(np.subtract(x, y))"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 0 8 18]\n",
" [30 44 60]]\n",
"[[ 0 8 18]\n",
" [30 44 60]]\n"
]
}
],
"source": [
"# Element-wise product\n",
"print(x * y)\n",
"print(np.multiply(x, y))"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[0. 0.125 0.22222222]\n",
" [0.3 0.36363636 0.41666667]]\n",
"[[0. 0.125 0.22222222]\n",
" [0.3 0.36363636 0.41666667]]\n"
]
}
],
"source": [
"# Element-wise division\n",
"print(x / y)\n",
"print(np.divide(x, y))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Broadcasting"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1, 2, 3, 4]"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"l1"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 2, 3, 4])"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a1"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Multiply a list by a number\n",
"l1 * 3"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 3, 6, 9, 12])"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Multiply a numpy array by a number\n",
"a1 * 3"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"ename": "TypeError",
"evalue": "unsupported operand type(s) for /: 'list' and 'int'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-24-5f1aa246822a>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0ml1\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: unsupported operand type(s) for /: 'list' and 'int'"
]
}
],
"source": [
"l1 / 3"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([4, 5, 6, 7])"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a1 + 3"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([-2, -1, 0, 1])"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a1 - 3"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0.33333333, 0.66666667, 1. , 1.33333333])"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a1 / 3"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 0 1 2 3]\n",
" [ 4 5 6 7]\n",
" [ 8 9 10 11]]\n",
"[0 1 2 3]\n"
]
}
],
"source": [
"# More complicated broadcasting behaviour\n",
"a4 = np.arange(4)\n",
"print(a3)\n",
"print(a4)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0, 1, 4, 9],\n",
" [ 0, 5, 12, 21],\n",
" [ 0, 9, 20, 33]])"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a3 * a4"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"More details can be found in a [Broadcasting Tutorial](https://docs.scipy.org/doc/numpy-dev/user/basics.broadcasting.html)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Fancy Indexing"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a5 = np.array([3,1,4,1,5,9,2,6,5,3])\n",
"a5"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([4, 1, 1, 3, 2, 3])"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"i = np.array([2,1,1,0,6,9])\n",
"a5[i]"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[1, 6],\n",
" [9, 2]])"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"j = np.array([[3,7], [5,6]])\n",
"a5[j]"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([3, 4, 5, 2, 5])"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Select the elements at even indices 0, 2, 4, ... (the 1st, 3rd, 5th, ... positions)\n",
"odd_i = [x for x in range(len(a5)) if x%2==0]\n",
"a5[odd_i]"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[0, 2, 4, 6, 8]"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"odd_i"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([False, False, False, False, True, True, False, True, True,\n",
" False])"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Index with boolean array (masking)\n",
"bool_i = a5 > 4\n",
"bool_i"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a5"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([5, 9, 6, 5])"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a5[bool_i]"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([5, 9, 6, 5])"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a5[a5>4]"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"# The last plot in Lecture 3 can be done using masking\n",
"import csv\n",
"from datetime import datetime\n",
"f = open('odisha-tomato-cuttack-banki.csv', 'r')\n",
"\n",
"reader = csv.reader(f)\n",
"\n",
"line_number = 0 # keep track of which line number we are at, starting from 0\n",
"months = []\n",
"modal_prices = [] # we build up a list of modal prices, starting from an empty list\n",
"\n",
"for line in reader: # go through each line of the csv file\n",
" if line_number >= 2: # note that we ignore the first two lines because they correspond to headers\n",
" date = datetime.strptime(line[-1], '%d-%b-%y')\n",
" price = float(line[-2])\n",
" modal_prices.append(price)\n",
" months.append(date.month)\n",
" line_number = line_number + 1"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"months = np.array(months, dtype=np.int)\n",
"modal_prices = np.array(modal_prices, dtype=np.float)\n",
"prices_by_month = [np.mean(modal_prices[months == month]) for month in range(1, 13)]"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1435.4166666666667"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.mean(modal_prices[months==1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### SpaCy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### SpaCy Installation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Install SpaCy package:**\n",
"\n",
"Using pip: `$ pip install -U spacy`\n",
"\n",
"Using conda: `$ conda install -c conda-forge spacy`"
]
},
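{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Install SpaCy language model:**\n",
"\n",
"`$ python -m spacy download en` \n",
"\n",
"or, on older SpaCy 1.x releases,\n",
"\n",
"`$ python -m spacy.en.download`\n",
"\n",
"Note: the `en` shortcut applies to SpaCy 2.x and earlier. SpaCy 3 and later require the full model name, for example `$ python -m spacy download en_core_web_sm`."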
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"import spacy\n",
"nlp = spacy.load('en')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
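
A note on the Broadcasting section of the notebook above: the cell `a3 * a4` multiplies a (3, 4) array by a (4,) array, and `a1 * 3` multiplies a (4,) array by a scalar. Both work because of the shape-compatibility rule that Numpy applies from the trailing dimension backwards. The following is a minimal pure-Python sketch of that rule, not Numpy's actual implementation; `broadcast_shape` is a hypothetical helper name introduced here for illustration only.

```python
from itertools import zip_longest


def broadcast_shape(shape_a, shape_b):
    """Return the shape that two array shapes broadcast to.

    Hypothetical helper for illustration; it is not part of Numpy.
    Raises ValueError when the shapes are incompatible.
    """
    result = []
    # Walk both shapes from the trailing (rightmost) dimension.
    # A missing leading dimension is treated as size 1.
    pairs = zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1)
    for dim_a, dim_b in pairs:
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)   # equal sizes, or b is stretched to match a
        elif dim_a == 1:
            result.append(dim_b)   # a is stretched to match b
        else:
            raise ValueError(
                f"shapes {shape_a} and {shape_b} cannot be broadcast together"
            )
    return tuple(reversed(result))


print(broadcast_shape((3, 4), (4,)))  # the a3 * a4 case
print(broadcast_shape((4,), ()))      # the a1 * 3 case: a scalar has shape ()
```

Under this rule the (4,) array `a4` is treated as if it were repeated across each of the three rows of `a3`, which matches the output shown in the notebook. A pair such as (2, 3) and (3, 4) raises an error, because the trailing sizes 3 and 4 differ and neither is 1.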