Skip to content

Instantly share code, notes, and snippets.

@onyxfish
Last active October 24, 2015 01:51
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save onyxfish/36f459dab02545cbdce3 to your computer and use it in GitHub Desktop.
Save onyxfish/36f459dab02545cbdce3 to your computer and use it in GitHub Desktop.
Using agate in a Jupyter notebook
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Using agate in a Jupyter notebook\n",
"\n",
"First we import agate. Then we create an agate Table by loading data from a CSV file."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<agate.table.Table object at 0x106a929b0>\n"
]
}
],
"source": [
"import agate\n",
"\n",
"table = agate.Table.from_csv('examples/realdata/ks_1033_data.csv')\n",
"\n",
"print(table)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Question 1: What was the total cost to Kansas City area counties?\n",
"\n",
"To answer this question, we first must filter the table to only those rows which refer to a Kansas City area `county`."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1575\n",
"15\n"
]
}
],
"source": [
"kansas_city = table.where(lambda r: r['county'] in ('JACKSON', 'CLAY', 'CASS', 'PLATTE'))\n",
"\n",
"print(len(table.rows))\n",
"print(len(kansas_city.rows))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can then print the Sum of the costs of all those rows. (The cost column is named `total_cost`.)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"$3716\n"
]
}
],
"source": [
"print('$%d' % kansas_city.columns['total_cost'].aggregate(agate.Sum()))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Question 2: Which counties spent the most?\n",
"\n",
"This question is more complicated. First we group the data by `county`, which gives us a TableSet named `counties`. A TableSet is a group of tables with the same columns."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"('ALLEN', 'ANDERSON', 'BARTON', 'BOURBON', 'BROWN', 'BUTLER', 'CHAUTAUQUA', 'CHEROKEE', 'CHEYENNE', 'CLAY', 'COFFEY', 'COMANCHE', 'COWLEY', 'CRAWFORD', 'DECATUR', 'DICKINSON', 'DOUGLAS', 'ELLIS', 'ELLSWORTH', 'FINNEY', 'FORD', 'FRANKLIN', 'GEARY', 'PRATT', 'GRAHAM', 'GRANT', 'GRAY', 'GREELEY', 'GREENWOOD', 'HAMILTON', 'HARVEY', 'HASKELL', 'JACKSON', 'JEFFERSON', 'JOHNSON', 'KIOWA', 'LABETTE', 'LEAVENWORTH', 'LINN', 'LOGAN', 'LYON', 'MARION', 'MARSHALL', 'MCPHERSON', 'MEADE', 'MIAMI', 'MONTGOMERY', 'NEMAHA', 'NEOSHO', 'NORTON', 'OSAGE', 'OTTAWA', 'PAWNEE', 'POTTAWATOMIE', 'RAWLINS', 'RENO', 'RICE', 'RILEY', 'ROOKS', 'SALINE', 'SCOTT', 'SEDGWICK', 'SHAWNEE', 'SHERMAN', 'SMITH', 'STAFFORD', 'SUMNER', 'THOMAS', 'WABAUNSEE', 'WICHITA', 'WILSON', 'WOODSON', 'WYANDOTTE')\n"
]
}
],
"source": [
"# Group by county\n",
"counties = table.group_by('county')\n",
"\n",
"print(counties.keys())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We then use the `aggregate` function to sum the `total_cost` column for each table in the group. The resulting values are collapsed into a new table, `totals`, which has a row for each county and a column named `total_cost_sum` containing the new total."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"('county', 'total_cost_sum')\n"
]
}
],
"source": [
"# Aggregate totals for all counties\n",
"totals = counties.aggregate([\n",
" ('total_cost', agate.Sum(), 'total_cost_sum')\n",
"])\n",
"\n",
"print(totals.column_names)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we sort the counties by their total cost, limit the results to the top 10 and then print the results as a text bar chart."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"county total_cost_sum\n",
"SEDGWICK 977,174.45 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ \n",
"COFFEY 691,749.03 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ \n",
"MONTGOMERY 447,581.20 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ \n",
"JOHNSON 420,628.00 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ \n",
"SALINE 245,450.24 ▓░░░░░░░░░░░░░░░░░░ \n",
"FINNEY 171,862.20 ▓░░░░░░░░░░░░░ \n",
"BROWN 145,254.96 ▓░░░░░░░░░░░ \n",
"KIOWA 97,974.00 ▓░░░░░░░ \n",
"WILSON 74,747.10 ▓░░░░░ \n",
"FORD 70,780.00 ▓░░░░░ \n",
"GREENWOOD 69,722.00 ▓░░░░░ \n",
"DOUGLAS 68,069.42 ▓░░░░░ \n",
"MIAMI 64,691.09 ▓░░░░░ \n",
"LYON 51,236.00 ▓░░░░ \n",
"HAMILTON 47,989.00 ▓░░░░ \n",
"WYANDOTTE 45,259.50 ▓░░░ \n",
"CHEYENNE 35,970.92 ▓░░░ \n",
"PRATT 35,745.60 ▓░░░ \n",
"FRANKLIN 34,309.00 ▓░░░ \n",
"HARVEY 27,387.42 ▓░░ \n",
" +-----------------+-----------------+------------------+-----------------+\n",
" 0 250,000 500,000 750,000 1,000,000\n"
]
}
],
"source": [
"totals.order_by('total_cost_sum', reverse=True).limit(20).print_bars('county', 'total_cost_sum', width=100)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.0"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment