Skip to content

Instantly share code, notes, and snippets.

@omad
Created February 27, 2017 20:31
Show Gist options
  • Save omad/cf9475cf8d06c2db429a4a678809dc42 to your computer and use it in GitHub Desktop.
Save omad/cf9475cf8d06c2db429a4a678809dc42 to your computer and use it in GitHub Desktop.
Find matching date in Pandas Dataframe.ipynb
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"metadata": {
"editable": true,
"deletable": true
},
"cell_type": "markdown",
"source": "# Find row in Pandas DataFrame\nThis notebook shows how to find a particular row in a column of floating point values in a Pandas DataFrame\n\nFirst we import pandas and numpy using their common abbreviations."
},
{
"metadata": {
"collapsed": false,
"ExecuteTime": {
"start_time": "2017-02-28T07:30:22.909982",
"end_time": "2017-02-28T07:30:23.779766"
},
"trusted": true,
"editable": true,
"deletable": true
},
"cell_type": "code",
"source": "import pandas as pd\nimport numpy as np",
"execution_count": 1,
"outputs": []
},
{
"metadata": {
"editable": true,
"deletable": true
},
"cell_type": "markdown",
"source": "Then set up a dataframe to test against, it will contain 2 columns. The first, __flow__, contains 100 random floating point values. The next column __date__ contains a sequential list of dates that we can attempt to search through.\n\nnb. We seed the random number generator so that it always produces the same values, so that we can hard code our search value down below."
},
{
"metadata": {
"collapsed": false,
"ExecuteTime": {
"start_time": "2017-02-28T07:30:23.781162",
"end_time": "2017-02-28T07:30:23.816882"
},
"trusted": true,
"editable": true,
"deletable": true
},
"cell_type": "code",
"source": "np.random.seed(seed=42)\nall_data = pd.DataFrame(np.random.normal(50,100,size=(100)), columns=['flow'])\nall_data['date'] = pd.date_range('1/1/2011', periods=100, freq='D')\nall_data",
"execution_count": 2,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": " flow date\n0 99.671415 2011-01-01\n1 36.173570 2011-01-02\n2 114.768854 2011-01-03\n3 202.302986 2011-01-04\n4 26.584663 2011-01-05\n5 26.586304 2011-01-06\n6 207.921282 2011-01-07\n7 126.743473 2011-01-08\n8 3.052561 2011-01-09\n9 104.256004 2011-01-10\n10 3.658231 2011-01-11\n11 3.427025 2011-01-12\n12 74.196227 2011-01-13\n13 -141.328024 2011-01-14\n14 -122.491783 2011-01-15\n15 -6.228753 2011-01-16\n16 -51.283112 2011-01-17\n17 81.424733 2011-01-18\n18 -40.802408 2011-01-19\n19 -91.230370 2011-01-20\n20 196.564877 2011-01-21\n21 27.422370 2011-01-22\n22 56.752820 2011-01-23\n23 -92.474819 2011-01-24\n24 -4.438272 2011-01-25\n25 61.092259 2011-01-26\n26 -65.099358 2011-01-27\n27 87.569802 2011-01-28\n28 -10.063869 2011-01-29\n29 20.830625 2011-01-30\n.. ... ...\n70 86.139561 2011-03-12\n71 203.803657 2011-03-13\n72 46.417396 2011-03-14\n73 206.464366 2011-03-15\n74 -211.974510 2011-03-16\n75 132.190250 2011-03-17\n76 58.704707 2011-03-18\n77 20.099265 2011-03-19\n78 59.176078 2011-03-20\n79 -148.756891 2011-03-21\n80 28.032811 2011-03-22\n81 85.711257 2011-03-23\n82 197.789404 2011-03-24\n83 -1.827022 2011-03-25\n84 -30.849360 2011-03-26\n85 -0.175704 2011-03-27\n86 141.540212 2011-03-28\n87 82.875111 2011-03-29\n88 -2.976020 2011-03-30\n89 101.326743 2011-03-31\n90 59.707755 2011-04-01\n91 146.864499 2011-04-02\n92 -20.205309 2011-04-03\n93 17.233785 2011-04-04\n94 10.789185 2011-04-05\n95 -96.351495 2011-04-06\n96 79.612028 2011-04-07\n97 76.105527 2011-04-08\n98 50.511346 2011-04-09\n99 26.541287 2011-04-10\n\n[100 rows x 2 columns]",
"text/html": "<div>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>flow</th>\n <th>date</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>99.671415</td>\n <td>2011-01-01</td>\n </tr>\n <tr>\n <th>1</th>\n <td>36.173570</td>\n <td>2011-01-02</td>\n </tr>\n <tr>\n <th>2</th>\n <td>114.768854</td>\n <td>2011-01-03</td>\n </tr>\n <tr>\n <th>3</th>\n <td>202.302986</td>\n <td>2011-01-04</td>\n </tr>\n <tr>\n <th>4</th>\n <td>26.584663</td>\n <td>2011-01-05</td>\n </tr>\n <tr>\n <th>5</th>\n <td>26.586304</td>\n <td>2011-01-06</td>\n </tr>\n <tr>\n <th>6</th>\n <td>207.921282</td>\n <td>2011-01-07</td>\n </tr>\n <tr>\n <th>7</th>\n <td>126.743473</td>\n <td>2011-01-08</td>\n </tr>\n <tr>\n <th>8</th>\n <td>3.052561</td>\n <td>2011-01-09</td>\n </tr>\n <tr>\n <th>9</th>\n <td>104.256004</td>\n <td>2011-01-10</td>\n </tr>\n <tr>\n <th>10</th>\n <td>3.658231</td>\n <td>2011-01-11</td>\n </tr>\n <tr>\n <th>11</th>\n <td>3.427025</td>\n <td>2011-01-12</td>\n </tr>\n <tr>\n <th>12</th>\n <td>74.196227</td>\n <td>2011-01-13</td>\n </tr>\n <tr>\n <th>13</th>\n <td>-141.328024</td>\n <td>2011-01-14</td>\n </tr>\n <tr>\n <th>14</th>\n <td>-122.491783</td>\n <td>2011-01-15</td>\n </tr>\n <tr>\n <th>15</th>\n <td>-6.228753</td>\n <td>2011-01-16</td>\n </tr>\n <tr>\n <th>16</th>\n <td>-51.283112</td>\n <td>2011-01-17</td>\n </tr>\n <tr>\n <th>17</th>\n <td>81.424733</td>\n <td>2011-01-18</td>\n </tr>\n <tr>\n <th>18</th>\n <td>-40.802408</td>\n <td>2011-01-19</td>\n </tr>\n <tr>\n <th>19</th>\n <td>-91.230370</td>\n <td>2011-01-20</td>\n </tr>\n <tr>\n <th>20</th>\n <td>196.564877</td>\n <td>2011-01-21</td>\n </tr>\n <tr>\n <th>21</th>\n <td>27.422370</td>\n <td>2011-01-22</td>\n </tr>\n <tr>\n <th>22</th>\n <td>56.752820</td>\n <td>2011-01-23</td>\n </tr>\n <tr>\n <th>23</th>\n <td>-92.474819</td>\n <td>2011-01-24</td>\n </tr>\n <tr>\n <th>24</th>\n <td>-4.438272</td>\n <td>2011-01-25</td>\n </tr>\n <tr>\n <th>25</th>\n <td>61.092259</td>\n <td>2011-01-26</td>\n </tr>\n <tr>\n <th>26</th>\n <td>-65.099358</td>\n <td>2011-01-27</td>\n </tr>\n <tr>\n <th>27</th>\n <td>87.569802</td>\n <td>2011-01-28</td>\n </tr>\n <tr>\n <th>28</th>\n <td>-10.063869</td>\n <td>2011-01-29</td>\n </tr>\n <tr>\n <th>29</th>\n <td>20.830625</td>\n <td>2011-01-30</td>\n </tr>\n <tr>\n <th>...</th>\n <td>...</td>\n <td>...</td>\n </tr>\n <tr>\n <th>70</th>\n <td>86.139561</td>\n <td>2011-03-12</td>\n </tr>\n <tr>\n <th>71</th>\n <td>203.803657</td>\n <td>2011-03-13</td>\n </tr>\n <tr>\n <th>72</th>\n <td>46.417396</td>\n <td>2011-03-14</td>\n </tr>\n <tr>\n <th>73</th>\n <td>206.464366</td>\n <td>2011-03-15</td>\n </tr>\n <tr>\n <th>74</th>\n <td>-211.974510</td>\n <td>2011-03-16</td>\n </tr>\n <tr>\n <th>75</th>\n <td>132.190250</td>\n <td>2011-03-17</td>\n </tr>\n <tr>\n <th>76</th>\n <td>58.704707</td>\n <td>2011-03-18</td>\n </tr>\n <tr>\n <th>77</th>\n <td>20.099265</td>\n <td>2011-03-19</td>\n </tr>\n <tr>\n <th>78</th>\n <td>59.176078</td>\n <td>2011-03-20</td>\n </tr>\n <tr>\n <th>79</th>\n <td>-148.756891</td>\n <td>2011-03-21</td>\n </tr>\n <tr>\n <th>80</th>\n <td>28.032811</td>\n <td>2011-03-22</td>\n </tr>\n <tr>\n <th>81</th>\n <td>85.711257</td>\n <td>2011-03-23</td>\n </tr>\n <tr>\n <th>82</th>\n <td>197.789404</td>\n <td>2011-03-24</td>\n </tr>\n <tr>\n <th>83</th>\n <td>-1.827022</td>\n <td>2011-03-25</td>\n </tr>\n <tr>\n <th>84</th>\n <td>-30.849360</td>\n <td>2011-03-26</td>\n </tr>\n <tr>\n <th>85</th>\n <td>-0.175704</td>\n <td>2011-03-27</td>\n </tr>\n <tr>\n <th>86</th>\n <td>141.540212</td>\n <td>2011-03-28</td>\n </tr>\n <tr>\n <th>87</th>\n <td>82.875111</td>\n <td>2011-03-29</td>\n </tr>\n <tr>\n <th>88</th>\n <td>-2.976020</td>\n <td>2011-03-30</td>\n </tr>\n <tr>\n <th>89</th>\n <td>101.326743</td>\n <td>2011-03-31</td>\n </tr>\n <tr>\n <th>90</th>\n <td>59.707755</td>\n <td>2011-04-01</td>\n </tr>\n <tr>\n <th>91</th>\n <td>146.864499</td>\n <td>2011-04-02</td>\n </tr>\n <tr>\n <th>92</th>\n <td>-20.205309</td>\n <td>2011-04-03</td>\n </tr>\n <tr>\n <th>93</th>\n <td>17.233785</td>\n <td>2011-04-04</td>\n </tr>\n <tr>\n <th>94</th>\n <td>10.789185</td>\n <td>2011-04-05</td>\n </tr>\n <tr>\n <th>95</th>\n <td>-96.351495</td>\n <td>2011-04-06</td>\n </tr>\n <tr>\n <th>96</th>\n <td>79.612028</td>\n <td>2011-04-07</td>\n </tr>\n <tr>\n <th>97</th>\n <td>76.105527</td>\n <td>2011-04-08</td>\n </tr>\n <tr>\n <th>98</th>\n <td>50.511346</td>\n <td>2011-04-09</td>\n </tr>\n <tr>\n <th>99</th>\n <td>26.541287</td>\n <td>2011-04-10</td>\n </tr>\n </tbody>\n</table>\n<p>100 rows × 2 columns</p>\n</div>"
},
"metadata": {},
"execution_count": 2
}
]
},
{
"metadata": {
"editable": true,
"deletable": true
},
"cell_type": "markdown",
"source": "Now we can perform our search"
},
{
"metadata": {
"collapsed": false,
"ExecuteTime": {
"start_time": "2017-02-28T07:30:23.818484",
"end_time": "2017-02-28T07:30:23.829651"
},
"trusted": true,
"editable": true,
"deletable": true
},
"cell_type": "code",
"source": "discharge = 202.302\nall_data[np.isclose(all_data.flow, discharge, 1e-3)]",
"execution_count": 3,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": " flow date\n3 202.302986 2011-01-04",
"text/html": "<div>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>flow</th>\n <th>date</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>3</th>\n <td>202.302986</td>\n <td>2011-01-04</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {},
"execution_count": 3
}
]
},
{
"metadata": {
"editable": true,
"deletable": true
},
"cell_type": "markdown",
"source": "To grab only the date we are interested in:"
},
{
"metadata": {
"collapsed": false,
"ExecuteTime": {
"start_time": "2017-02-28T07:30:23.831375",
"end_time": "2017-02-28T07:30:23.839347"
},
"trusted": true,
"editable": true,
"deletable": true
},
"cell_type": "code",
"source": "all_data[np.isclose(all_data.flow, discharge, 1e-3)].date",
"execution_count": 4,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "3 2011-01-04\nName: date, dtype: datetime64[ns]"
},
"metadata": {},
"execution_count": 4
}
]
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2017-02-28T07:14:38.393336",
"start_time": "2017-02-28T07:14:38.389403"
},
"editable": true,
"deletable": true
},
"cell_type": "markdown",
"source": "We need to use the `np.isclose()` function since we are searching floating point values, and it allows us to specify a precision.\n\nFor the simpler case of searching through a list of ints, we could use something like the following. But if we try to run this, we see that no results are returned."
},
{
"metadata": {
"collapsed": false,
"ExecuteTime": {
"start_time": "2017-02-28T07:30:23.840862",
"end_time": "2017-02-28T07:30:23.850317"
},
"trusted": true,
"editable": true,
"deletable": true
},
"cell_type": "code",
"source": "all_data[all_data.flow == discharge]",
"execution_count": 5,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "Empty DataFrame\nColumns: [flow, date]\nIndex: []",
"text/html": "<div>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>flow</th>\n <th>date</th>\n </tr>\n </thead>\n <tbody>\n </tbody>\n</table>\n</div>"
},
"metadata": {},
"execution_count": 5
}
]
},
{
"metadata": {
"editable": true,
"deletable": true
},
"cell_type": "markdown",
"source": "To show how this expression works, we can run only the inner portion of it."
},
{
"metadata": {
"collapsed": false,
"ExecuteTime": {
"start_time": "2017-02-28T07:30:23.852027",
"end_time": "2017-02-28T07:30:23.858410"
},
"trusted": true,
"editable": true,
"deletable": true
},
"cell_type": "code",
"source": "np.isclose(all_data.flow, discharge, 1e-3)",
"execution_count": 6,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "array([False, False, False, True, False, False, False, False, False,\n False, False, False, False, False, False, False, False, False,\n False, False, False, False, False, False, False, False, False,\n False, False, False, False, False, False, False, False, False,\n False, False, False, False, False, False, False, False, False,\n False, False, False, False, False, False, False, False, False,\n False, False, False, False, False, False, False, False, False,\n False, False, False, False, False, False, False, False, False,\n False, False, False, False, False, False, False, False, False,\n False, False, False, False, False, False, False, False, False,\n False, False, False, False, False, False, False, False, False, False], dtype=bool)"
},
"metadata": {},
"execution_count": 6
}
]
},
{
"metadata": {
"editable": true,
"deletable": true
},
"cell_type": "markdown",
"source": "This gives us a boolean array of the same length as our DataFrame column. __False__ indicates a row that didn't match, and __True__ for any row that matches. Pandas (and numpy) allows [Boolean Indexing](http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing), which returns us only rows with a True value in our indexing array. See [Boolean Indexing](http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing) in the Pandas documentation for more information."
},
{
"metadata": {
"collapsed": true,
"trusted": true,
"editable": true,
"deletable": true
},
"cell_type": "code",
"source": "",
"execution_count": null,
"outputs": []
}
],
"metadata": {
"kernelspec": {
"name": "conda-env-damienagdc-py",
"display_name": "Python [conda env:damienagdc]",
"language": "python"
},
"hide_input": false,
"language_info": {
"name": "python",
"mimetype": "text/x-python",
"pygments_lexer": "ipython3",
"file_extension": ".py",
"nbconvert_exporter": "python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"version": "3.5.2"
},
"toc": {
"threshold": 4,
"number_sections": true,
"toc_cell": false,
"toc_window_display": false,
"toc_section_display": "block",
"sideBar": true,
"navigate_menu": true,
"nav_menu": {
"width": "252px",
"height": "30px"
}
},
"gist": {
"id": "",
"data": {
"description": "Find matching date in Pandas Dataframe.ipynb",
"public": true
}
}
},
"nbformat": 4,
"nbformat_minor": 1
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment