@polera
Created March 14, 2013 03:01
pandas_questions.py
import pandas as pd
"""
testdata.csv looks like this:
price1,price2,price3,type
20.00,40.00,60.00,20
40.00,10.00,30.00,10
3.00,15.00,47.42,20
"""
# 1. How to add a column that shows the result of the equivalent of Excel's:
#    =if(and(sum(b2:f2)<100,h2=20),"Y","N")
# Note: you'll want to create a function that performs this calculation for one row;
# that's what Excel is doing in each row of the sheet.
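def determine_yes_no_col(row):
    """Return 'Y' if type is 20 and the prices sum to less than 100, else 'N'."""
    threshold = 100
    if row['type'] == 20:
        # Calculate the sum of a range, like sum(b2:f2);
        # a label slice includes both endpoints (price1 through price3).
        # Verify that it's less than the threshold of 100.
        if row.loc['price1':'price3'].sum() < threshold:
            return 'Y'
    return 'N'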
# Read in some data
testdata = pd.read_csv("testdata.csv")
# Add your calculated column
# axis=1 calls determine_yes_no_col once per row; each row is passed in as a pandas Series
testdata['YesOrNo'] = testdata.apply(determine_yes_no_col, axis=1)
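# A vectorized alternative to the row-wise apply above, offered as a sketch
# rather than the author's method. It assumes the same column names and the
# sample rows from the docstring; add_yes_no_vectorized and SAMPLE_CSV are
# hypothetical names introduced only for this illustration.

```python
import io

import numpy as np
import pandas as pd

# Sample rows copied from the docstring at the top of the file,
# so this sketch runs without a testdata.csv on disk.
SAMPLE_CSV = """price1,price2,price3,type
20.00,40.00,60.00,20
40.00,10.00,30.00,10
3.00,15.00,47.42,20
"""


def add_yes_no_vectorized(frame, threshold=100):
    """Return a copy of frame with a 'YesOrNo' column computed column-wise."""
    result = frame.copy()
    # Row totals over the label range price1..price3 (both endpoints included)
    row_totals = result.loc[:, 'price1':'price3'].sum(axis=1)
    # Same condition as the Excel formula: total below threshold AND type is 20
    is_yes = (row_totals < threshold) & (result['type'] == 20)
    result['YesOrNo'] = np.where(is_yes, 'Y', 'N')
    return result


sample = pd.read_csv(io.StringIO(SAMPLE_CSV))
vectorized = add_yes_no_vectorized(sample)
print(vectorized)
```

# On the three sample rows, only the last one (sum 65.42, type 20) gets 'Y'.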
# 2. Subset the dataset, e.g. the rows where column 'a' == "Y" and column 'b' == "N"
# Subset of the dataset where 'type' is 20 and 'YesOrNo' is 'N'
subset = testdata[(testdata['type'] == 20) & (testdata['YesOrNo'] == 'N')]
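# The same kind of filter can be written with named boolean masks, which some
# readers find easier to follow than one long expression. This is a sketch on
# an in-memory copy of the sample rows; the 'YesOrNo' values are the ones the
# rule above yields for that data, and the variable names are illustrative.

```python
import pandas as pd

# In-memory copy of the sample rows, with YesOrNo already filled in
sample = pd.DataFrame({
    'price1': [20.00, 40.00, 3.00],
    'price2': [40.00, 10.00, 15.00],
    'price3': [60.00, 30.00, 47.42],
    'type': [20, 10, 20],
    'YesOrNo': ['N', 'N', 'Y'],
})

# Build each condition separately; each is a boolean Series aligned to the rows
is_type_20 = sample['type'] == 20
is_no = sample['YesOrNo'] == 'N'

# Combine with & (element-wise AND) and select the matching rows
subset_named = sample.loc[is_type_20 & is_no]
print(subset_named)
```

# Only the first sample row (type 20, prices summing to 120) matches both conditions.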
# 3. is covered in the determine_yes_no_col function