@rosiecakes
Last active October 29, 2016 00:16
dataquest numpy intro, list comp, slicing, map
# Year -- the year the data in the row is for.
# WHO Region -- the region in which the country is located.
# Country -- the country the data is for.
# Beverage Types -- the type of beverage the data is for.
# Display Value -- the number of liters, on average, of the beverage type a citizen of the country drank in the given year.
# Use the csv module to read world_alcohol.csv into the variable world_alcohol.
# You can use the csv.reader method to accomplish this.
# world_alcohol should be a list of lists.
# Extract the first column of world_alcohol, and assign it to the variable years.
# Use list slicing to remove the first item in years (this is a header).
# Find the sum of all the items in years. Assign the result to total.
# Remember to convert each item to a float before adding them together.
# Divide total by the length of years to get the average. Assign the result to avg_year.
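import csv

# Read every row into a list; the with block closes the file afterwards.
with open('world_alcohol.csv') as f:
    world_alcohol = list(csv.reader(f))
# First column of every row, minus the header.
years = [row[0] for row in world_alcohol][1:]
# The values are strings, so convert each one to float before summing.
total = sum(map(float, years))
avg_year = total / len(years)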
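
A minimal sketch of the same steps that runs without world_alcohol.csv on disk. The rows in SAMPLE_CSV are made up for illustration and only follow the column order described above; average_year is a hypothetical helper name, not part of the exercise.

```python
import csv
import io

# Made-up sample rows (an assumption), in the same column order as world_alcohol.csv.
SAMPLE_CSV = """Year,WHO region,Country,Beverage Types,Display Value
1986,Western Pacific,Viet Nam,Wine,0
1986,Americas,Uruguay,Other,0.5
1985,Africa,Ghana,Wine,1.62
1989,Americas,Colombia,Beer,4.27
"""


def average_year(rows):
    """Return the mean of the first column, skipping the header row."""
    years = [float(row[0]) for row in rows[1:]]
    return sum(years) / len(years)


# csv.reader accepts any iterable of lines, so a StringIO stands in for the file.
with io.StringIO(SAMPLE_CSV) as handle:
    world_alcohol = list(csv.reader(handle))

avg_year = average_year(world_alcohol)
print(avg_year)
```

Converting inside the list comprehension keeps the header skip and the float conversion in one place, so the length used for the average always matches the values that were summed.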

# Create an empty dictionary called totals.
# Select only the rows in world_alcohol that match a given year. Assign the result to year.
# Loop through a list of countries. For each country:
#   Select only the rows from year that match the given country. Assign the result to country_consumption.
#   Extract the fifth column from country_consumption.
#   Replace any empty string values in the column with the string '0'.
#   Convert the column to the float data type.
#   Find the sum of the column.
#   Add the sum to the totals dictionary with the country name as the key.
# At the end, you'll have a dictionary containing the name of each country as keys, with the associated total alcohol consumption as the values.

# Assumes world_alcohol is a NumPy string array (not the list of lists from the csv example)
# and countries is a list of country names.
totals = {}
for country in countries:
    # boolean vector: rows for 1989 and this country
    is_country_consumption = (world_alcohol[:,0] == '1989') & (world_alcohol[:,2] == country)
    # rows for that year and country
    country_consumption = world_alcohol[is_country_consumption,:]
    # boolean vector: rows whose last column is blank
    is_empty = country_consumption[:,4] == ''
    # rows where the last column is blank (is_empty selects rows, so it goes in the first index)
    empties = country_consumption[is_empty,:]
    # replace blanks in the last column with '0'
    country_consumption[is_empty,4] = '0'
    # convert the last column to float and sum it
    totals[country] = country_consumption[:,4].astype(float).sum()
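
A self-contained sketch of the per-country totals, assuming NumPy is installed. The sample array is made up, and total_consumption is a hypothetical helper that wraps the loop so it can run without the CSV file.

```python
import numpy as np

# Made-up sample data (an assumption). Every value is a string,
# matching what a string dtype such as 'U75' would hold.
world_alcohol = np.array([
    ['1989', 'Americas', 'Canada', 'Beer', '3.5'],
    ['1989', 'Americas', 'Canada', 'Wine', ''],
    ['1989', 'Europe', 'France', 'Wine', '7.25'],
    ['1986', 'Europe', 'France', 'Beer', '2.0'],
], dtype='U75')


def total_consumption(data, year, countries):
    """Sum the fifth column per country for one year, treating blanks as 0."""
    totals = {}
    # Filter by year once instead of on every loop iteration.
    in_year = data[data[:, 0] == year]
    for country in countries:
        rows = in_year[in_year[:, 2] == country]
        values = rows[:, 4].copy()
        # Blank strings cannot be converted to float, so replace them first.
        values[values == ''] = '0'
        totals[country] = float(values.astype(float).sum())
    return totals


totals = total_consumption(world_alcohol, '1989', ['Canada', 'France'])
print(totals)
```

Boolean indexing returns a copy, so replacing the blanks here does not modify world_alcohol itself.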
@rosiecakes

genfromtxt example
import numpy
world_alcohol = numpy.genfromtxt('world_alcohol.csv', dtype='U75', skip_header=1, delimiter=',')
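
A small sketch of the same call reading from an in-memory string instead of a file, assuming a reasonably recent NumPy. The two data rows in SAMPLE_CSV are made up for illustration.

```python
import io

import numpy as np

# Made-up sample rows (an assumption), in the same column order as world_alcohol.csv.
SAMPLE_CSV = (
    "Year,WHO region,Country,Beverage Types,Display Value\n"
    "1986,Western Pacific,Viet Nam,Wine,0\n"
    "1989,Americas,Colombia,Beer,4.27\n"
)

# genfromtxt accepts a file-like object as well as a filename.
# dtype='U75' reads every field as a string of up to 75 characters,
# and skip_header=1 drops the header row.
world_alcohol = np.genfromtxt(
    io.StringIO(SAMPLE_CSV), dtype='U75', skip_header=1, delimiter=','
)

print(world_alcohol.shape)
print(world_alcohol[:, 2])
```

Because every column is read as a string, numeric columns still need .astype(float) before any arithmetic.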
