rdhyee/Q_about_np.in1d.ipynb

## Q_about_np.in1d.ipynb
{
 "metadata": {
  "name": "",
  "signature": "sha256:fc2480da5342774bda4baf2e5bfd715a668b6cd427a7c8d703dcff52173012c7"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "[Clarification questions about NumPy](https://bcourses.berkeley.edu/courses/1189091/discussion_topics/3039266):\n",
      "\n",
      "> After revisiting the notebooks, I ran into some things regarding NumPy that were a little unclear to me:\n",
      "\n",
      "> One notebook stated: \"NumPy array operations, such as filtering with a boolean array, scalar multiplication, or applying math functions, will preserve the index-value link.\" What does it mean by \"will preserve the index-value link\"? Do they mean that whatever transformation you apply to an array, you can still access its elements by indexing?\n",
      "\n",
      "> In Day_05_B_Geographical_Hierarchies, an alternative way of filtering Puerto Rico using np.1d was presented. I get that it functions like the \"in\" operator to test membership, but I still don't understand what this snippet of code is doing exactly:\n",
      "\n",
      ">    states_fips = np.array([state.fips for state in us.states.STATES])\n",
      ">    states_df = df[np.in1d(df.state,states_fips)]\n",
      " \n",
      "\n",
      ">On an unrelated note, how in-depth do we have to know how to set up conda in the midterm? The steps in http://rdhyee.github.io/wwod14/day03.html#(17) and http://rdhyee.github.io/wwod14/day03.html#(17) were quite detailed, and I didn't set up my environment in that fashion..."
     ]
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "About how \"preserv[ing] the index-value link\" works"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from pandas import DataFrame, Series\n",
      "import numpy as np\n",
      "import string\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 1
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's make a series of lowercase letters with the default index of 0 to 25."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "lc = Series(list(string.ascii_lowercase))\n",
      "lc.head()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 2,
       "text": [
        "0    a\n",
        "1    b\n",
        "2    c\n",
        "3    d\n",
        "4    e\n",
        "dtype: object"
       ]
      }
     ],
     "prompt_number": 2
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "You can make a new series `uc` by applying a string method to every element of `lc`."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "uc = lc.str.upper()\n",
      "uc.head()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 3,
       "text": [
        "0    A\n",
        "1    B\n",
        "2    C\n",
        "3    D\n",
        "4    E\n",
        "dtype: object"
       ]
      }
     ],
     "prompt_number": 3
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Note that the index of `uc` has been preserved from `lc`.  Look at the index for `lc`:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "lc.index"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 4,
       "text": [
        "Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25], dtype='int64')"
       ]
      }
     ],
     "prompt_number": 4
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "and look at the index for `uc`"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "uc.index"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 5,
       "text": [
        "Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25], dtype='int64')"
       ]
      }
     ],
     "prompt_number": 5
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# you can check lc.index is same as uc.index\n",
      "np.all(lc.index==uc.index)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 6,
       "text": [
        "True"
       ]
      }
     ],
     "prompt_number": 6
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "lc.loc[0]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 7,
       "text": [
        "'a'"
       ]
      }
     ],
     "prompt_number": 7
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's check that upper-casing the 11th element of `lc` (index label 10) gives the 11th element of `uc`."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "lc.loc[10].upper() == uc.loc[10]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 8,
       "text": [
        "True"
       ]
      }
     ],
     "prompt_number": 8
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "np.in1d"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "See whether you can make sense of this example.  `df` is a DataFrame of names with corresponding colors.  `my_colors` is the list of colors I want to test `df.color` against."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "my_colors = ['red', 'green', 'blue']\n",
      "df = DataFrame({'name': ['Peter', 'Paul', 'Mary', 'Ringo'],\n",
      "                'color':['cerise', 'green', 'red', 'violet']},\n",
      "               columns=['name','color'])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 9
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>name</th>\n",
        "      <th>color</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td> Peter</td>\n",
        "      <td> cerise</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td>  Paul</td>\n",
        "      <td>  green</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td>  Mary</td>\n",
        "      <td>    red</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>3</th>\n",
        "      <td> Ringo</td>\n",
        "      <td> violet</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "<p>4 rows \u00d7 2 columns</p>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 10,
       "text": [
        "    name   color\n",
        "0  Peter  cerise\n",
        "1   Paul   green\n",
        "2   Mary     red\n",
        "3  Ringo  violet\n",
        "\n",
        "[4 rows x 2 columns]"
       ]
      }
     ],
     "prompt_number": 10
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now let's see which values of `df.color` are in `my_colors` by using `np.in1d` (http://docs.scipy.org/doc/numpy/reference/generated/numpy.in1d.html).  Make sure you understand the sequence of `True` and `False`: for each color in `df.color`, in row order, we are told whether that color is to be found in `my_colors`."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "np.in1d(df.color,my_colors)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 11,
       "text": [
        "array([False,  True,  True, False], dtype=bool)"
       ]
      }
     ],
     "prompt_number": 11
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "You can now use `np.in1d(df.color,my_colors)` to filter `df` for only those rows with a color in `my_colors`."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df[np.in1d(df.color,my_colors)]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>name</th>\n",
        "      <th>color</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td> Paul</td>\n",
        "      <td> green</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td> Mary</td>\n",
        "      <td>   red</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "<p>2 rows \u00d7 2 columns</p>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 12,
       "text": [
        "   name  color\n",
        "1  Paul  green\n",
        "2  Mary    red\n",
        "\n",
        "[2 rows x 2 columns]"
       ]
      }
     ],
     "prompt_number": 12
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "conda and pip"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "I'll be asking only basic, fundamental questions about pip and/or conda."
     ]
    }
   ],
   "metadata": {}
  }
 ]
}
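
The two ideas in the notebook above (a transformed Series keeps its index, and a boolean membership mask filters rows while keeping their labels) can be sketched together as follows. This is a minimal sketch, assuming Python 3 with a reasonably recent pandas and NumPy; it uses `np.isin`, the newer spelling of `np.in1d`, and the names `same_index`, `mask`, `filtered`, and `filtered_pd` are illustrative choices that do not appear in the notebook.

```python
import string

import numpy as np
import pandas as pd

# Index preservation: the upper-cased Series carries the same labels as lc.
lc = pd.Series(list(string.ascii_lowercase))
uc = lc.str.upper()
same_index = lc.index.equals(uc.index)

# Same toy data as the notebook.
my_colors = ['red', 'green', 'blue']
df = pd.DataFrame({'name': ['Peter', 'Paul', 'Mary', 'Ringo'],
                   'color': ['cerise', 'green', 'red', 'violet']},
                  columns=['name', 'color'])

# Membership test: one boolean per row of df, in row order.
mask = np.isin(df['color'].to_numpy(), my_colors)

# Boolean filtering keeps the surviving rows' original index labels.
filtered = df[mask]

# pandas-native equivalent of the same filter (an alternative, not the
# notebook's approach).
filtered_pd = df[df['color'].isin(my_colors)]

print(same_index)
print(mask)
print(filtered)
```

The filtered frame still carries the labels of the rows that survived rather than being renumbered from 0, which is one way to read "preserving the index-value link".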