Skip to content

Instantly share code, notes, and snippets.

@dominiquesydow
Created April 20, 2022 06:52
Show Gist options
  • Save dominiquesydow/5300d3caf2605932464a30c5ef124130 to your computer and use it in GitHub Desktop.
Save dominiquesydow/5300d3caf2605932464a30c5ef124130 to your computer and use it in GitHub Desktop.
Filter rows in pandas with query()
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"id": "e436a7b9-e07e-4ee6-81e8-64a7c91b3531",
"metadata": {},
"source": [
"# Filter rows in `pandas` with `query()`\n",
"\n",
"Cheat sheet: https://twitter.com/prthgo/status/1516036455730229255?s=20&t=pxT30xMIr_L9JNON3EYJDA"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "77051b8b-3d4a-48df-9579-4f3371c99c85",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "c6e57921-7f4c-4e6d-a8b6-fa7fc90cd9eb",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>a</th>\n",
" <th>b</th>\n",
" <th>c</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>4</td>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" a b c\n",
"0 1 2 4\n",
"1 1 2 1\n",
"2 4 4 1"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.DataFrame(\n",
" [\n",
" [1, 2, 4], \n",
" [1, 2, 1], \n",
" [4, 4, 1]\n",
" ],\n",
" columns=[\"a\", \"b\", \"c\"]\n",
")\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "d4dbb415-a952-477c-814a-bb6b74d4965b",
"metadata": {},
"source": [
"## Conditions inside the selection brackets"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "1d689624-c1dd-44fe-a9ca-5030457b69e8",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>a</th>\n",
" <th>b</th>\n",
" <th>c</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" a b c\n",
"1 1 2 1"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[(df[\"a\"] < 4) & (df[\"c\"] < 4)]"
]
},
{
"cell_type": "markdown",
"id": "8008d041-9775-416b-949f-2216f87c09e9",
"metadata": {},
"source": [
"## `query()` for Boolean expressions"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "0df2cefc-6b04-4db1-a3e4-c6599c327a86",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>a</th>\n",
" <th>b</th>\n",
" <th>c</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" a b c\n",
"1 1 2 1"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.query(\"a < 4 and c < 4\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
@baoilleach
Copy link

Good to know. Though I do prefer if there's just one way to do something, even if clunky. Does it still work if there's a space in the name?

@dominiquesydow
Copy link
Author

If the column name "a" was called "a a", the query would look like this:

df.query("`a a` < 4 and c < 4")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment