Created
March 18, 2014 15:30
-
-
Save rdhyee/9622401 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"metadata": { | |
"name": "", | |
"signature": "sha256:fc2480da5342774bda4baf2e5bfd715a668b6cd427a7c8d703dcff52173012c7" | |
}, | |
"nbformat": 3, | |
"nbformat_minor": 0, | |
"worksheets": [ | |
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"[Clarification questions about NumPy](https://bcourses.berkeley.edu/courses/1189091/discussion_topics/3039266):\n", | |
"\n", | |
"> After revisiting the notebooks, I ran into some things regarding NumPy that were a little unclear to me:\n", | |
"\n", | |
"> One notebook stated: \"NumPy array operations, such as filtering with a boolean array, scalar multiplication, or applying math functions, will preserve the index-value link.\" What does it mean by \"will preserve the index-value link\"? Do they mean that whatever transformation you apply to an array, you can still access its elements by indexing?\n", | |
"\n", | |
"> In Day_05_B_Geographical_Hierarchies, an alternative way of filtering Puerto Rico using np.1d was presented. I get that it functions like the \"in\" operator to test membership, but I still don't understand what this snippet of code is doing exactly:\n", | |
"\n", | |
"> states_fips = np.array([state.fips for state in us.states.STATES])\n", | |
"> states_df = df[np.in1d(df.state,states_fips)]\n", | |
" \n", | |
"\n", | |
">On an unrelated note, how in-depth do we have to know how to set up conda in the midterm? The steps in http://rdhyee.github.io/wwod14/day03.html#(17) and http://rdhyee.github.io/wwod14/day03.html#(17) were quite detailed, and I didn't set up my environment in that fashion..." | |
] | |
}, | |
{ | |
"cell_type": "heading", | |
"level": 1, | |
"metadata": {}, | |
"source": [ | |
"About how \"preserv[ing] the index-value link\" works" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"from pandas import DataFrame, Series\n", | |
"import numpy as np\n", | |
"import string\n" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 1 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's make a series of lowercae letters with the default index of 0 to 25" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"lc = Series(list(string.lowercase))\n", | |
"lc.head()" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 2, | |
"text": [ | |
"0 a\n", | |
"1 b\n", | |
"2 c\n", | |
"3 d\n", | |
"4 e\n", | |
"dtype: object" | |
] | |
} | |
], | |
"prompt_number": 2 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"You can make a new series `uc` based on applying a function to `lc`." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"uc = lc.apply(string.upper)\n", | |
"uc.head()" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 3, | |
"text": [ | |
"0 A\n", | |
"1 B\n", | |
"2 C\n", | |
"3 D\n", | |
"4 E\n", | |
"dtype: object" | |
] | |
} | |
], | |
"prompt_number": 3 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Note that index of `uc` is the has been preserved from `lc`. Look at the index for `lc`" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"lc.index" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 4, | |
"text": [ | |
"Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25], dtype='int64')" | |
] | |
} | |
], | |
"prompt_number": 4 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"and look at the index for `uc`" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"uc.index" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 5, | |
"text": [ | |
"Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25], dtype='int64')" | |
] | |
} | |
], | |
"prompt_number": 5 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"# you can check lc.index is same as uc.index\n", | |
"np.all(lc.index==uc.index)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 6, | |
"text": [ | |
"True" | |
] | |
} | |
], | |
"prompt_number": 6 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"lc.ix[0]" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 7, | |
"text": [ | |
"'a'" | |
] | |
} | |
], | |
"prompt_number": 7 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's check that if you apply `string.upper` to the 11th element in `lc`, you get the 11th element of `uc`" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"string.upper(lc.ix[10]) == uc.ix[10]" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 8, | |
"text": [ | |
"True" | |
] | |
} | |
], | |
"prompt_number": 8 | |
}, | |
{ | |
"cell_type": "heading", | |
"level": 1, | |
"metadata": {}, | |
"source": [ | |
"np.in1d" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"See whether you can make sense of this example. `df` is a list of names with corresponding colors. `my_colors` is a list I want to compare colors to." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"my_colors = ['red', 'green', 'blue']\n", | |
"df = DataFrame({'name': ['Peter', 'Paul', 'Mary', 'Ringo'],\n", | |
" 'color':['cerise', 'green', 'red', 'violet']},\n", | |
" columns=['name','color'])" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 9 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>name</th>\n", | |
" <th>color</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td> Peter</td>\n", | |
" <td> cerise</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td> Paul</td>\n", | |
" <td> green</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td> Mary</td>\n", | |
" <td> red</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td> Ringo</td>\n", | |
" <td> violet</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>4 rows \u00d7 2 columns</p>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 10, | |
"text": [ | |
" name color\n", | |
"0 Peter cerise\n", | |
"1 Paul green\n", | |
"2 Mary red\n", | |
"3 Ringo violet\n", | |
"\n", | |
"[4 rows x 2 columns]" | |
] | |
} | |
], | |
"prompt_number": 10 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now let's see which of the `df.color` is in `my_colors` by using `np.in1d` (http://docs.scipy.org/doc/numpy/reference/generated/numpy.in1d.html). Make sure you understand the sequence of `True` and `False`. (For each color in `df.color`, we are told whether that color is to be found in `my_colors`.)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"np.in1d(df.color,my_colors)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 11, | |
"text": [ | |
"array([False, True, True, False], dtype=bool)" | |
] | |
} | |
], | |
"prompt_number": 11 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"You can now use `np.in1d(df.color,my_colors)` to filter `df` for only those rows with a color in `my_colors`." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"df[np.in1d(df.color,my_colors)]" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>name</th>\n", | |
" <th>color</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td> Paul</td>\n", | |
" <td> green</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td> Mary</td>\n", | |
" <td> red</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"<p>2 rows \u00d7 2 columns</p>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 12, | |
"text": [ | |
" name color\n", | |
"1 Paul green\n", | |
"2 Mary red\n", | |
"\n", | |
"[2 rows x 2 columns]" | |
] | |
} | |
], | |
"prompt_number": 12 | |
}, | |
{ | |
"cell_type": "heading", | |
"level": 1, | |
"metadata": {}, | |
"source": [ | |
"conda and pip" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"I'll be asking only basic, fundamental questions about pip and/or conda." | |
] | |
} | |
], | |
"metadata": {} | |
} | |
] | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment