rosiecakes/dataquest numpy intro.py

## dataquest numpy intro.py
# Year -- the year the data in the row is for.
# WHO Region -- the region in which the country is located.
# Country -- the country the data is for.
# Beverage Types -- the type of beverage the data is for.
# Display Value -- the number of liters, on average, of the beverage type a citizen of the country drank in the given year.

# Use the csv module to read world_alcohol.csv into the variable world_alcohol.
# You can use the csv.reader method to accomplish this.
# world_alcohol should be a list of lists.
# Extract the first column of world_alcohol, and assign it to the variable years.
# Use list slicing to remove the first item in years (this is a header).
# Find the sum of all the items in years. Assign the result to total.
# Remember to convert each item to a float before adding them together.
# Divide total by the length of years to get the average. Assign the result to avg_year.

import csv

# Open the file in a context manager so the handle is closed after reading.
with open('world_alcohol.csv') as f:
    world_alcohol = list(csv.reader(f))
years = [row[0] for row in world_alcohol][1:]
total = sum(float(year) for year in years)
avg_year = total / len(years)
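
The same average can be computed with NumPy once the rows are in an array. This is a minimal sketch, not part of the original exercise: `SAMPLE_CSV`, `average_year`, and the sample rows are hypothetical stand-ins for `world_alcohol.csv`, which is assumed to have the five columns described above.

```python
import csv
import io

import numpy as np

# Hypothetical sample standing in for world_alcohol.csv (same five columns).
SAMPLE_CSV = """Year,WHO Region,Country,Beverage Types,Display Value
1986,Western Pacific,Viet Nam,Wine,0
1986,Americas,Uruguay,Other,0.5
1985,Africa,Kenya,Wine,1.62
1989,Americas,Colombia,Beer,4.27
"""


def average_year(csv_file):
    """Return the mean of the first column, skipping the header row."""
    rows = list(csv.reader(csv_file))
    data = np.array(rows[1:])          # 2-D array of strings, header dropped
    years = data[:, 0].astype(float)   # first column converted to floats
    return years.mean()


avg_year_np = average_year(io.StringIO(SAMPLE_CSV))
print(avg_year_np)
```

To run it against the real file, pass an open file object instead of the `io.StringIO` wrapper.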

## numpy exercise 10.md

      
1. Create an empty dictionary called totals.
2. Select only the rows in world_alcohol that match a given year. Assign the result to year.
3. Loop through a list of countries. For each country:
   - Select only the rows from year that match the given country. Assign the result to country_consumption.
   - Extract the fifth column from country_consumption.
   - Replace any empty string values in the column with the string 0.
   - Convert the column to the float data type.
   - Find the sum of the column.
   - Add the sum to the totals dictionary with the country name as the key.

At the end, you'll have a dictionary containing the name of each country as keys, with the associated total alcohol consumption as the values.

  
## solution 10.py
totals = {}

for country in countries:
    # get bool vector for 1989 and the country
    is_country_consumption = (world_alcohol[:,0] == '1989') & (world_alcohol[:,2] == country)

    # get rows for year and country
    country_consumption = world_alcohol[is_country_consumption,:]

    # bool vector for rows whose last col is blank
    is_empty = country_consumption[:,4] == ''

    # get rows where last col is blank (the mask indexes rows, not columns)
    empties = country_consumption[is_empty,:]

    # set last col of blanks to 0
    country_consumption[is_empty,4] = '0'

    # convert last col to float
    # country_consumption[:,4].astype(float)
    # print(country_consumption[:,4].astype(float))

    # sum last col
    totals[country] = country_consumption[:,4].astype(float).sum()
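
The loop above assumes that `world_alcohol` is already a NumPy array of strings and that `countries` is defined elsewhere. The following is a minimal self-contained sketch of the same steps under those assumptions. The sample rows and the helper name `total_consumption` are hypothetical and not part of the original exercise.

```python
import numpy as np

# Hypothetical rows: year, WHO region, country, beverage type, display value.
world_alcohol = np.array([
    ['1989', 'Americas', 'Canada', 'Beer', '3.5'],
    ['1989', 'Americas', 'Canada', 'Wine', ''],
    ['1989', 'Americas', 'Canada', 'Spirits', '1.5'],
    ['1988', 'Americas', 'Canada', 'Beer', '9.0'],
    ['1989', 'Europe', 'France', 'Wine', '8.0'],
    ['1989', 'Europe', 'France', 'Beer', '2.25'],
])


def total_consumption(data, year, countries):
    """Sum the fifth column per country for one year, treating '' as 0."""
    totals = {}
    for country in countries:
        # Boolean mask selecting rows for this year and this country.
        mask = (data[:, 0] == year) & (data[:, 2] == country)
        values = data[mask, 4]        # boolean indexing returns a copy
        values[values == ''] = '0'    # blanks cannot be converted to float
        totals[country] = float(values.astype(float).sum())
    return totals


# Plain Python strings as keys, in sorted order.
countries = sorted(set(world_alcohol[:, 2].tolist()))
totals = total_consumption(world_alcohol, '1989', countries)
print(totals)
```

Because boolean indexing returns a copy, replacing the blanks here leaves the original `world_alcohol` array unchanged.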