@rdhyee
Created March 18, 2014 15:30
{
"metadata": {
"name": "",
"signature": "sha256:fc2480da5342774bda4baf2e5bfd715a668b6cd427a7c8d703dcff52173012c7"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[Clarification questions about NumPy](https://bcourses.berkeley.edu/courses/1189091/discussion_topics/3039266):\n",
"\n",
"> After revisiting the notebooks, I ran into some things regarding NumPy that were a little unclear to me:\n",
"\n",
"> One notebook stated: \"NumPy array operations, such as filtering with a boolean array, scalar multiplication, or applying math functions, will preserve the index-value link.\" What does it mean by \"will preserve the index-value link\"? Do they mean that whatever transformation you apply to an array, you can still access its elements by indexing?\n",
"\n",
"> In Day_05_B_Geographical_Hierarchies, an alternative way of filtering Puerto Rico using np.1d was presented. I get that it functions like the \"in\" operator to test membership, but I still don't understand what this snippet of code is doing exactly:\n",
"\n",
"> states_fips = np.array([state.fips for state in us.states.STATES])\n",
"> states_df = df[np.in1d(df.state,states_fips)]\n",
" \n",
"\n",
">On an unrelated note, how in-depth do we have to know how to set up conda in the midterm? The steps in http://rdhyee.github.io/wwod14/day03.html#(17) and http://rdhyee.github.io/wwod14/day03.html#(17) were quite detailed, and I didn't set up my environment in that fashion..."
]
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"About how \"preserv[ing] the index-value link\" works"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from pandas import DataFrame, Series\n",
"import numpy as np\n",
"import string\n"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's make a series of lowercase letters with the default index of 0 to 25."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"lc = Series(list(string.ascii_lowercase))\n",
"lc.head()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 2,
"text": [
"0 a\n",
"1 b\n",
"2 c\n",
"3 d\n",
"4 e\n",
"dtype: object"
]
}
],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can make a new series `uc` based on applying a function to `lc`."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"uc = lc.apply(str.upper)\n",
"uc.head()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 3,
"text": [
"0 A\n",
"1 B\n",
"2 C\n",
"3 D\n",
"4 E\n",
"dtype: object"
]
}
],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the index of `uc` has been preserved from `lc`. Look at the index for `lc`"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"lc.index"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 4,
"text": [
"Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25], dtype='int64')"
]
}
],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"and look at the index for `uc`"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"uc.index"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 5,
"text": [
"Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25], dtype='int64')"
]
}
],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# check that lc.index is the same as uc.index\n",
"np.all(lc.index==uc.index)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 6,
"text": [
"True"
]
}
],
"prompt_number": 6
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"lc.loc[0]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 7,
"text": [
"'a'"
]
}
],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's check that if you apply `str.upper` to the 11th element of `lc` (index label 10), you get the 11th element of `uc`"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"str.upper(lc.loc[10]) == uc.loc[10]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 8,
"text": [
"True"
]
}
],
"prompt_number": 8
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"np.in1d"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"See whether you can make sense of this example. `df` is a DataFrame of names with corresponding colors. `my_colors` is a list of colors I want to compare against."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"my_colors = ['red', 'green', 'blue']\n",
"df = DataFrame({'name': ['Peter', 'Paul', 'Mary', 'Ringo'],\n",
"                'color': ['cerise', 'green', 'red', 'violet']},\n",
"               columns=['name', 'color'])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 9
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>color</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> Peter</td>\n",
" <td> cerise</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> Paul</td>\n",
" <td> green</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> Mary</td>\n",
" <td> red</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> Ringo</td>\n",
" <td> violet</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>4 rows \u00d7 2 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 10,
"text": [
" name color\n",
"0 Peter cerise\n",
"1 Paul green\n",
"2 Mary red\n",
"3 Ringo violet\n",
"\n",
"[4 rows x 2 columns]"
]
}
],
"prompt_number": 10
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's see which values of `df.color` are in `my_colors` by using `np.in1d` (http://docs.scipy.org/doc/numpy/reference/generated/numpy.in1d.html). Make sure you understand the sequence of `True` and `False`: for each color in `df.color`, in order, we are told whether that color is found in `my_colors`."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"np.in1d(df.color,my_colors)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 11,
"text": [
"array([False, True, True, False], dtype=bool)"
]
}
],
"prompt_number": 11
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can now use `np.in1d(df.color,my_colors)` to filter `df` for only those rows with a color in `my_colors`."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df[np.in1d(df.color,my_colors)]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>color</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> Paul</td>\n",
" <td> green</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> Mary</td>\n",
" <td> red</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>2 rows \u00d7 2 columns</p>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 12,
"text": [
" name color\n",
"1 Paul green\n",
"2 Mary red\n",
"\n",
"[2 rows x 2 columns]"
]
}
],
"prompt_number": 12
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"conda and pip"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I'll be asking only basic, fundamental questions about pip and/or conda."
]
}
],
"metadata": {}
}
]
}