Notes

Columns 0, 1, and 2 hold the year, month, and day; `parse_dates=[[0, 1, 2]]` combines them into a single datetime column.
import pandas as pd
data = pd.read_csv('./wind.data', sep=r'\s+', parse_dates=[[0, 1, 2]])
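Recent pandas releases warn that combining columns through a nested `parse_dates` list is deprecated, so a more explicit route may be useful. The sketch below reads the columns as plain integers and assembles the date with `pd.to_datetime`. The sample rows, the column names `Yr`, `Mo`, `Dy`, and the assumption that two-digit years belong to the 1900s are illustrative, not taken from the real `wind.data` file.

```python
import io
import pandas as pd

# Hypothetical sample in a year / month / day / readings layout.
sample = io.StringIO(
    "Yr Mo Dy RPT VAL\n"
    "61 1 1 15.04 14.96\n"
    "61 1 2 14.71 16.88\n"
    "61 1 3 18.50 16.88\n"
)

data = pd.read_csv(sample, sep=r"\s+")

# Assumption: two-digit years belong to the 1900s.
parts = pd.DataFrame({
    "year": 1900 + data["Yr"],
    "month": data["Mo"],
    "day": data["Dy"],
})

# pd.to_datetime assembles a datetime column from year/month/day columns.
data["date"] = pd.to_datetime(parts)
data = data.drop(columns=["Yr", "Mo", "Dy"])
```

Building the date after loading also makes the century assumption visible instead of leaving it to the parser.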

Notes

# drop() returns a new DataFrame; df itself is left unchanged
df.drop(['unselected_columns'], axis=1)

Notes

Calculate the ratio of male users within each occupation

occupation_grouped = users.groupby('occupation')

# Next step: get a MultiIndex pandas Series containing the ratio
# of each gender within each occupation
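The line that produces the Series described in the comment is not shown. One possible way to get it, sketched here on a small made-up `users` table, is `value_counts(normalize=True)` on the grouped `gender` column. The sample data and the names `gender_ratio` and `male_ratio` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data with the two columns the note relies on.
users = pd.DataFrame({
    "occupation": ["engineer", "engineer", "engineer", "artist", "artist"],
    "gender": ["M", "M", "F", "F", "M"],
})

occupation_grouped = users.groupby("occupation")

# Share of each gender within each occupation.
# The result is a Series indexed by (occupation, gender).
gender_ratio = occupation_grouped["gender"].value_counts(normalize=True)

# Keep only the male share; level 1 of the index is the gender.
male_ratio = gender_ratio.xs("M", level=1)
```

An occupation with no male users would simply be absent from `male_ratio`, so a `reindex` with a fill value may be needed if every occupation must appear.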

Notes

# Assume we have a 'gender' column with only 'M' and 'F' values
# Solution 1: list comprehension
users['gender_n'] = [1 if gender == 'M' else 0 for gender in users['gender']]

# Solution 2: vectorized with NumPy
import numpy as np
users['gender_n'] = np.where(users['gender'] == 'M', 1, 0)

Notes

# Name the output columns by passing (name, function) tuples
grouped_pct.agg([('foo', 'mean'), ('bar', np.std)])
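A small sketch of what the tuple form yields, assuming `grouped_pct` is a grouped single column. The `tips` data below is made up, and the string `"std"` stands in for `np.std` so that it is unambiguous which standard deviation is computed.

```python
import pandas as pd

# Hypothetical data; grouped_pct is assumed to be a grouped single column.
tips = pd.DataFrame({
    "day": ["Sat", "Sat", "Sun", "Sun"],
    "tip_pct": [0.10, 0.20, 0.15, 0.25],
})
grouped_pct = tips.groupby("day")["tip_pct"]

# Each (name, function) tuple becomes one output column called `name`.
result = grouped_pct.agg([("foo", "mean"), ("bar", "std")])

# Keyword-style named aggregation is another way to get the same column names.
named = grouped_pct.agg(foo="mean", bar="std")
```

Both results have one row per `day` and the columns `foo` and `bar`.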
@aqd14
aqd14 / unpacking
Last active September 15, 2019 03:49
## Problem 1.1
### You have an N-element tuple or sequence that you would like to unpack into a collection of N variables.
```
p = (4, 5)
x, y = p # x = 4, y = 5
```
## Problem 1.2
### You need to unpack N elements from an iterable, but the iterable may be longer than N elements, causing a “too many values to unpack” exception.
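No solution is shown for this problem. A common approach is a starred target, which collects the leftover items into a list. The sample values below are made up for illustration.

```python
# Hypothetical record: a name, an email, and any number of phone numbers.
record = ("Dave", "dave@example.com", "773-555-1212", "847-555-1212")

# The starred name absorbs whatever is left over, always as a list.
name, email, *phone_numbers = record

# The star may also sit in the middle of the target list.
first, *middle, last = [10, 8, 7, 1, 9, 5]
```

The starred variable is a list even when it captures nothing, so code that iterates over it needs no special case.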
# Defining and initializing vectors
vector<T> v1;       // default initialization: v1 holds objects of type T and starts empty
vector<T> v2(v1);   // v2 is a copy of v1
vector<T> v3(n, i); // v3 has n elements, each with value i
vector<T> v4(n);    // v4 has n value-initialized elements
college_data_processed <- college_data_processed %>%
group_by(STATENAME) %>%
mutate_at(vars(-UNITID:-REGION, -INSTNM), ~ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x))
aqd14 / difftime
Created April 14, 2019 15:16
Using `difftime` to calculate age
# Notes
grand_slam_ages <- player_dob %>%
select(name, date_of_birth) %>%
inner_join(grand_slams, by = "name") %>%
mutate(age = as.numeric(difftime(tournament_date, date_of_birth, units = "days") / 365.25))
aqd14 / group-interaction
Created April 14, 2019 15:14
Mapping `fill` to `gender` makes `ggplot` implicitly group the data by `gender` alone. Use `interaction()` in the `group` aesthetic to tell `ggplot` to group by both `decade` and `gender`.
grand_slam_ages %>%
mutate(decade = 10 * (year %/% 10)) %>%
ggplot(aes(decade, age, fill = gender, group = interaction(decade, gender))) +
geom_boxplot() +
scale_x_continuous(breaks = seq(1950, 2021, 10)) +
expand_limits(x = 2020)