This CSV is the output of an ETL (extract, transform, load) of the Social Security Administration baby names data. In the source data, each year is a separate file. The source covers 1880 to 2015, but I used only 1920 to 2000 because that range best represents the US adults likely to interact with my organization.
Data dictionary:
- name: the given name. I did not change the case or spacing.
- female_n: count of records where SEX=F
- male_n: count of records where SEX=M
- total_n: female_n + male_n
- female_p: female_n / total_n
- male_p: male_n / total_n
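
The columns above can be derived with a small aggregation like the following. This is only a sketch, not the script that produced this CSV. It assumes each source row has the form name,sex,count and that female_n and male_n are the per-sex counts summed across years. The `aggregate` function and the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def aggregate(rows):
    """Collapse (name, sex, count) rows from all years into one record per name."""
    # Assumption: sex is always "F" or "M", and count is a positive integer.
    counts = defaultdict(lambda: {"F": 0, "M": 0})
    for name, sex, count in rows:
        counts[name][sex] += int(count)  # name is kept as-is (no case or spacing changes)

    table = []
    for name in sorted(counts):
        female_n = counts[name]["F"]
        male_n = counts[name]["M"]
        total_n = female_n + male_n
        table.append({
            "name": name,
            "female_n": female_n,
            "male_n": male_n,
            "total_n": total_n,
            "female_p": female_n / total_n,
            "male_p": male_n / total_n,
        })
    return table


# Made-up rows standing in for the concatenated yearly files.
sample = io.StringIO("Mary,F,100\nJordan,M,30\nJordan,F,10\nMary,F,50\n")
table = aggregate(csv.reader(sample))
for row in table:
    print(row)
```

By construction, female_p and male_p sum to 1 for every name, so either column alone is enough to estimate the likely sex for a given name.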