Skip to content

Instantly share code, notes, and snippets.

@lecy
Last active August 29, 2022 14:54
Show Gist options
  • Star 5 You must be signed in to star a gist
  • Fork 2 You must be signed in to fork a gist
  • Save lecy/0aa782a873cd174573f32d243233ca5b to your computer and use it in GitHub Desktop.
Save lecy/0aa782a873cd174573f32d243233ca5b to your computer and use it in GitHub Desktop.
Building Census Dataset in R Using datausa.io API

Using the dataUSA.io API for Census Data in R

This gist contains some notes on constructing a query for census and economic data from the DataUSA.io site. This is a quick-start guide to their API; for in-depth documentation check out their API documentation.

A great way to learn how to structure a query is to visit a specific datausa.io page and click on the "Options" button on top of any graph, then select "API" to see the query syntax that created the graph.

Analytics

Example Use

library( dplyr )


### POPULATION OF METRO AREAS IN THE US

api <- "http://api.datausa.io/api/csv/?show=geo&sumlevel=msa&required=pop&year=latest"
dat <- read.csv( api )
head( dat )
#   year     geo_name          geo    pop
# 1 2014 Aberdeen, SD 31000US10100  41564
# 2 2014 Aberdeen, WA 31000US10140  71734
# 3 2014  Abilene, TX 31000US10180 167171




### INCOME BY CENSUS TRACT

api <- "http://api.datausa.io/api/csv/?show=geo&sumlevel=tract&required=income&year=latest"
api %>% read.csv() %>% head()
#   year            geo_name                geo  income
# 1 2014    Census Tract 201 14000US01001020100 60000.0
# 2 2014    Census Tract 211 14000US01001021100 34183.0
# 3 2014    Census Tract 102 14000US01003010200 31480.0



### MEDIAN PROPERTY VALUE FOR EACH COUNTY IN ATLANTA METRO AREA

api <- "http://api.datausa.io/api/csv/?show=geo&sumlevel=county&required=median_property_value&geo=31000US12060&year=latest"
api %>% read.csv() %>% head()
#   year            geo_name          geo median_property_value
# 1 2014 Cherokee County, GA 05000US13057                183200
# 2 2014  Clayton County, GA 05000US13063                 85700
# 3 2014     Cobb County, GA 05000US13067                203600


### NUMBER OF EMPLOYEES BY TOP-LEVEL INDUSTRY:

api <- "http://api.datausa.io/api/csv/?show=naics,geo&sumlevel=0,nation&required=emp&year=latest"
api %>% read.csv() %>% head()
#   year      geo_name     geo                                naics_name naics     emp
# 1 2013 United States 01000US Agriculture, Forestry, Fishing, & Hunting    11  154496
# 2 2013 United States 01000US Mining, Quarrying, & Oil & Gas Extraction    21  732186
# 3 2013 United States 01000US                                 Utilities    22  638575
# 4 2013 United States 01000US                              Construction    23 5470181



### NUMBER OF BACHELOR'S DEGREES AWARDED IN 2014 BY AREA OF STUDY

api <- "http://api.datausa.io/api/csv/?show=cip&sumlevel=6&degree=5&required=grads_total&year=latest"
api %>% read.csv() %>% head()
#   year                           cip_name   cip       degree_name degree grads_total
# 1 2014                General Agriculture 10000 Bachelor's Degree      5        1924
# 2 2014     General Agriculture Management 10101 Bachelor's Degree      5        1190
# 3 2014   Agricultural Business Operations 10102 Bachelor's Degree      5        1562

Note the use of the pipe operator in these examples to keep the code short. If you are not familiar with the pipe operator, just note that these statements are equivalent:

api <- "http://api.datausa.io/api/csv/?show=geo&sumlevel=msa&required=pop&year=latest"
dat <- read.csv( api )
head( dat )

library( dplyr ) # required for the pipe operator

api <- "http://api.datausa.io/api/csv/?show=geo&sumlevel=msa&required=pop&year=latest"
api %>% read.csv() %>% head()

Anatomy of the API

All queries start with one of the following prefixes:

http://api.datausa.io/api/?         # returns data as json file

http://api.datausa.io/api/csv/?     # returns data as csv file

Each query consists of a set of statements, all separated by the & operator.

  • show= which type of data to query
  • sumlevel= which level or geography to aggregate over
  • required= which variable(s) to use
  • year= which period of data to use ("latest" is a valid argument)

Depending upon what type of data you query, the required and year statements may be omitted.

For example, grab the population of all counties for 2014:

# show=geo
# sumlevel=county
# required=pop
# year=2014

api <- "http://api.datausa.io/api/csv/?show=geo&sumlevel=county&required=pop&year=2014"
api %>% read.csv() %>% head()
#   year                geo_name          geo     pop
# 1 2014         Cook County, IL 05000US17031 5227827
# 2 2014         King County, WA 05000US53033 2008997
# 3 2014        Wayne County, MI 05000US26163 1790078
# 4 2014      Broward County, FL 05000US12011 1815269

Populaton of all counties within a specific state:

# All counties in Massachusettes start with FIPS 05000US25

# show=geo
# sumlevel=county
# required=pop 
# year=latest 
# where=geo:^05000US25

api <- "http://api.datausa.io/api/csv/?show=geo&sumlevel=county&required=pop&year=latest&where=geo:^05000US25"
api %>% read.csv() %>% head()
#   year              geo_name          geo    pop
# 1 2014 Barnstable County, MA 05000US25001 215167
# 2 2014  Berkshire County, MA 05000US25003 130064
# 3 2014    Bristol County, MA 05000US25005 55106

College degree data for all of the US:

# show=cip         data on college majors
# sumlevel=2       cip level 2 categories (see below)

api <- "http://api.datausa.io/api/csv/?show=cip&sumlevel=2&required=grads_total"
api %>% read.csv() %>% head()
#   year                         cip_name cip grads_total
# 1 2014 Philosophy and Religious Studies  38       18426
# 2 2014                     Construction  46       31596
# 3 2014          Mechanical Technologies  47      106442
# 4 2014                        Education  13      324344
# 5 2014                  Social Sciences  45      199424

Datasets and Variable

To find variables for the required= argument see these sources and their respective data dictionaries.

DATASET CONTAINS DATA DICTIONARY
ACS American Community Survey - census demographics LINK
ACS PUMS ACS Public Use Microdata - employment statistics LINK
BLS Bureau of Labor Statistics - industry and occupation LINK
CBP County Business Patterns - number of businesses LINK
CHR County Health Rankings LINK
IPEDS Integrated Post-Secondary Education Data System - data on college education LINK
ONET Skills needed for employment LINK

Data Type & Level of Aggregation

Each query requires the user to specify a type of data (show=), a level of aggregation (sumlevel=), and a variable (require=).

In this API, what dataUSA will call an "attribute" is what R users would refer to as a factor. These are variables that have levels that will be used to aggregate the outcome variables specified by require=. The primary nuance is that these data types often have nested or hierarchical levels. For example, the geo factor includes levels for states, counties, and MSAs. The sumlevel= statement allows the use to select aggregation at a specific level.

Data types are specified through the show= statement. The API recognizes all of the following:

  • Geography (geo) - typically census and economic data aggregated by geography
  • Course (cip) - employement and earnings by college and major
  • Industry (naics,bls_naics,acs_ind) - industry data aggregated by industry subsector or group
  • Occupation (soc,acs_occ) - occupation data aggregated by occupation group
  • Sex (sex) - aggregate data by sex
  • Race (acs_race,pums_race,race) - aggregate data by race - use acs_race with census data, etc.
  • Veterans (conflict) - stats on veterans
  • College Degree (degree)
  • Skill (skill)
  • Birthplace (birthplace)
  • Wage Bin (wage_bin)
  • Language (language)
  • University Location (university) and Sector - Public, State, Private (sector)

It is helpful to be able to peek inside of these data types in order to see the levels that are includes and their associated attributes. Try the following helper function:

# install.packages( "jsonlite" )
library( jsonlite )

# PRINT ALL VALID DATA (show=) TYPES:
fromJSON( "http://api.datausa.io/attrs/list/" )


viewDataLevels <- function( attribute )
{

    url <- paste( "http://api.datausa.io/attrs/", attribute, sep="" )
    temp <- fromJSON( url )
    df <- as.data.frame( temp[[1]] )
    names(df) <- temp[[2]]
    print( paste( "NUMBER OF ROWS:", nrow(df) ) )
    return(df)

}


viewDataLevels( "acs_ind" )[1:10,]  
# [1] "NUMBER OF ROWS: 191"
#      id                                                      name level
# 1    04                                              Retail Trade     0
# 2    08 Professional, Scientific, Management, Admin., Waste Mgmt.     0
# 3  0403              Building Material & Garden Equipment Dealers     1
# 4  020x                                             Manufacturing     1
# 5  030x                                           Wholesale trade     1

viewDataLevels( "university" )[1:10,]
# [1] "NUMBER OF ROWS: 9124"
#    sector                                name                     url is_stem       county     state       lat         lng     id          msa
# 1       3         South University - Richmond www.southuniversity.edu       0 05000US51087 04000US51  37.64676   -77.59915 459259 31000US40060
# 2       3          Herzing University-Atlanta         www.herzing.edu       1 05000US13121 04000US13 33.846627  -84.362635 140340 31000US12060
# 3       1              Boise State University      www.boisestate.edu       1 05000US16001 04000US16 43.604403 -116.205789 142115 31000US14260
# 4       4     Eastern Idaho Technical College            www.eitc.edu       1 05000US16019 04000US16   43.4818 -111.984645 142179 31000US26820
# 5       1              Idaho State University            www.isu.edu/       1 05000US16005 04000US16 42.864108 -112.429084 142276 31000US38540

The level of aggregation (unit of analysis) for the data is specified by the sumlevels statement. Sumlevels for the most commmon data types are listed here:

If sumlevels for a particular variable type are not listed below, try sumlevel=all.

Supported sumlevels for show=geo:

  • Nation (nation)
  • State (state) [ list ]
  • Public Use Micro Data (puma) [ list ] [ definition ]
  • County (county) [ list ]
  • Metropolitan Statistical Area (msa) [ list ] [ definition ]
  • Census Designated Place (place) [ list ] [ definition ]
  • Census tract (tract)

See below on how to query a specific geographic region.

###   income rank by state    ###
api <- "http://api.datausa.io/api/csv/?show=geo&sumlevel=state&required=income_rank&year=latest"
api %>% read.csv() %>% head()
#   year   geo_name       geo income_rank
# 1 2014    Alabama 04000US01          47
# 2 2014     Alaska 04000US02           3
# 3 2014    Arizona 04000US04          31
# 4 2014   Arkansas 04000US05          50
# 5 2014 California 04000US06          10
# 6 2014   Colorado 04000US08          15


###   income rank by city   ###
api <- "http://api.datausa.io/api/csv/?show=geo&sumlevel=msa&required=income_rank&year=latest"
api %>% read.csv() %>% head()
#   year     geo_name          geo income_rank
# 1 2014 Aberdeen, SD 31000US10100         193
# 2 2014 Aberdeen, WA 31000US10140         565
# 3 2014  Abilene, TX 31000US10180         530
# 4 2014      Ada, OK 31000US10220         600
# 5 2014 Adjuntas, PR 31000US10260         929
# 6 2014   Adrian, MI 31000US10300         369

Supported sumlevels for show=cip: [ link to codes ]

  • 2-digit (In ACS PUMS data only 2-digit data is available.)
  • 4-digit
  • 6-digit

Example of sumlevel=2, sumlevel=4, and sumlevel=6 results:

Close11) COMPUTER AND INFORMATION SCIENCES AND SUPPORT SERVICES

  Close11.01) Computer and Information Sciences, General
  
    11.0101) Computer and Information Sciences, General
    11.0102) Artificial Intelligence
    11.0103) Information Technology
    11.0104) Informatics
    11.0199) Computer and Information Sciences, Other

  Close11.02) Computer Programming
  
    11.0201) Computer Programming/Programmer, General
    11.0202) Computer Programming, Specific Applications
    11.0203) Computer Programming, Vendor/Product Certification
    11.0299) Computer Programming, Other
    

Supported sumlevels for show=naics: [ link to codes ]

  • Sector (0)
  • Sub Sector (1)
  • Group (2)

Example industry at sumlevel=0, sumlevel=1, and sumlevel=2.

Financial Activities

  Finance and Insurance (NAICS 52)
  
    Monetary Authorities - Central Bank (NAICS 521)
    Credit Intermediation and Related Activities (NAICS 522)
    Insurance Carriers and Related Activities (NAICS 524)
    Funds, Trusts, and Other Financial Vehicles (NAICS 525)

library( dplyr )

# NUMBER OF EMPLOYEES BY INDUSTRY (IN THOUSANDS)

api <- "http://api.datausa.io/api/csv/?show=naics&sumlevel=all&required=emp_2014_thousands"
api %>% read.csv() %>% head()
#                                      naics_name  naics emp_2014_thousands
# 1 Mining, Quarrying, and Oil and Gas Extraction     21              843.8
# 2                        Oil and Gas Extraction    211              197.9
# 3                   Mining (except Oil and Gas)    212              207.4

# ESTABLISHMENTS BY INDUSTRY WITHIN MSAs

api <- "http://api.datausa.io/api/csv/?show=geo,naics&sumlevel=msa,1&required=est"
api %>% read.csv() %>% head()
#   year     geo_name          geo                     naics_name naics est
# 1 2013 Escanaba, MI 31000US21540         Couriers & messengers    492   4
# 2 2013 Escanaba, MI 31000US21540         Warehousing & storage    493   1
# 3 2013 Escanaba, MI 31000US21540                           None   511   3

Supported sumlevels for show=soc: [ link to codes ]

  • Major Group (0)
  • Minor Group (1)
  • Broad Occupation (2)
  • Detailed Occupation (3)

IMPORTANT: The allowable sumlevels will depend entirely upon the underlying data table that is being queried. In order to determine capatability of your show= & sumlevel= arguments, you need to check. Currently you can search by the data type only. The function below will print out all compatible data tables and their joint allowable sumlevels.

# install.packages( "jsonlite" )

getUsage <- function( attribute, return.df=F )
{
    library( jsonlite )
    url <- paste( "http://api.datausa.io/api/logic/?show=", attribute, "&sumlevel=all", sep="" )
    temp <- fromJSON( url, simplifyDataFrame=FALSE )

    for( i in 1:length(temp$tables) )
    {
       cat( paste( "TABLE:", temp$tables[[i]]$table, "\n" ) )
       cat( paste( "DATA SOURCE:", temp$tables[[i]]$dataset, "\n" ) )
       cat( paste( "DEPARTMENT:", temp$tables[[i]]$org, "\n" ) )
       cat( paste( "LINK:", temp$tables[[i]]$link, "\n\n" ) )
       cat( "SUPPORTED SUMLEVELS:", "\n\n" )
       for( j in 1:length(temp$tables[[i]]$supported_levels) )
       {
            cat( paste( names(temp$tables[[i]]$supported_levels[j]), ":  ",
                         paste( unlist(temp$tables[[i]]$supported_levels[j]), collapse=", "), "\n", sep="" ) )
       }
       cat("\n \n \n \n \n")
    }

    if( return.df==T )
    {
      temp <- fromJSON( url, simplifyDataFrame=TRUE )
      df <- as.data.frame( temp[[1]] )
      return(df)
    }
    else{ return(NULL) }

}



getUsage( "geo" )
# TABLE: ygi_num_emp 
# DATA SOURCE: ACS 3-year Estimate 
# DEPARTMENT: Census Bureau 
# LINK: http://www.census.gov/programs-surveys/acs/ 
# 
# SUPPORTED SUMLEVELS: 
# 
# acs_ind:  0, 1, 2, all
# geo:  nation, state, msa, all


# msa is a supported geography:

api <- "http://api.datausa.io/api/csv/?show=geo&sumlevel=msa&required=num_emp"
api %>% read.csv() %>% head()
#   year                                    acs_ind_name acs_ind       geo_name          geo num_emp
# 1 2014 Agriculture, Forestry, Fishing, Hunting, Mining      00 Albert Lea, MN 31000US10660     953
# 2 2014         Agriculture, Forestry, Fishing, Hunting    0000 Albert Lea, MN 31000US10660     930
# 3 2014          Mining, Quarrying, Oil, Gas Extraction    0001 Albert Lea, MN 31000US10660      23


# county is not a supported geography

api <- "http://api.datausa.io/api/csv/?show=geo&sumlevel=county&required=num_emp"
api %>% read.csv() %>% head()
# does not work because county is not a supported sumlevel


# we can use any combination of the supported geo and acs_ind sumlevels:

api <- "http://api.datausa.io/api/csv/?show=geo,acs_ind&sumlevel=msa,0&required=num_emp"
api %>% read.csv() %>% head()
#   year                                    acs_ind_name acs_ind     geo_name          geo num_emp
# 1 2014 Agriculture, Forestry, Fishing, Hunting, Mining       0 Aberdeen, SD 31000US10100    1545
# 2 2014                                    Construction       1 Aberdeen, SD 31000US10100    1328
# 3 2014                                   Manufacturing       2 Aberdeen, SD 31000US10100    2887


usage.table <- getUsage( "geo", return.df=T )

Data compatability will vary with each geography depending upon the source dataset. For example, BLS data may only be available at the county level, not the census tract. This table presents a rough approximation of source data and compatible geography (rough draft...need to verify these).

DATASET NATION STATE MSA PUMA PLACE COUNTY TRACT
IPEDS X X X X X X .
ACS X X X X X X X
ACS PUMS X X . X . . .
BLS X X X X X X .
CPB X X . . . . .
CHR X X X X X X .

Geographic Aggregation

With census and economic data (show=geo & required=variable_of_interest) geography is implied. The other data types are aggregated by college major (show=cip), industry (show=naics), or occupational group (show=soc). If you want to calculate the statistics for these data by specific geographic regions you can do this in two ways.

The most straight-forward means is by using compound statement separated by commas. For example, show=naics,geo & level=0,state.

# Number of employees by top-level industry, by state:

api <- "http://api.datausa.io/api/csv/?show=geo,naics&sumlevel=state,0&required=emp"
api %>% read.csv() %>% head()
#   year geo_name       geo                                naics_name naics   emp
# 1 2013  Alabama 04000US01 Agriculture, Forestry, Fishing, & Hunting    11  6356
# 2 2013  Alabama 04000US01 Mining, Quarrying, & Oil & Gas Extraction    21  7942
# 3 2013  Alabama 04000US01                                 Utilities    22 17238
# 4 2013  Alabama 04000US01                              Construction    23 78318

For finer control over geography in addition to using the sumlevel = nation, state, msa, etc. statements above, you can also use GEO + US + FIPS codes to specify specific regions (the code prefix in parenthesis).

  • Nation (01000)
  • State (04000) [ list ]
  • County (05000) [ list ]
  • Public Use Microdata Area (79500) [ list ]
  • Metropolitan Statistical Area (31000) [ list ]
  • Census Designated Place (16000) [ list ]
  • Census Tracts (14000)

To create a code, combine the prefix, the letters "US", and the specific GEOID:

Area Type GEOID Structure Number of Digits Example Geographic Area Census GEOID Full API Code
State STATE 2 Texas 48 04000US48 (STATE CODE+"US"+TEXAS CODE)
MSA MSA ID 5 Atlanta, GA 12060 31000US12060 (MSA CODE+"US"+MSA ID)
County STATE+COUNTY 2+3=5 Harris County, TX 48201 05000US48201 (COUNTY CODE+"US"+HARRIS CODE)
Places STATE+PLACE 2+5=7 Houston, TX 4835000 16000US4835000 (PLACE CODE+"US"+HOUSTON CODE)
Tracts STATE+COUNTY+TRACTID 2+3+6=11 Census Tract 6001.02 in Akron, OH 39133600102 14000US39133600102

See below for instructions on specifying advanced queries for instructions on selecting sets of geographic units using the code structure. For example, since the county code has the structure STATE+COUNTY, you can select all counties within a state by selecting all instances that begin with the state FIP.

For example, Massachusettes has a state FIP of 25, so all counties in MA will look like "05000US25xxx". You can select all counties within MA using the where= statement and the starts-with operator ^:

api <- "http://api.datausa.io/api/csv/?show=geo&sumlevel=county&required=pop&year=latest&where=geo:^05000US25"
api %>% read.csv() %>% head()
#   year              geo_name          geo    pop
# 1 2014 Barnstable County, MA 05000US25001 215167
# 2 2014  Berkshire County, MA 05000US25003 130064
# 3 2014    Bristol County, MA 05000US25005 551065

There is also a special case where you may want to search for all county-level or tract-level data within a specific geography such as an MSA or Census Place. You do this by specifying your data unit using "sumlevel" and also adding an additional geo= statement with the specific location.

For example, to return population data from all counties in Akron, OH (MSA Code=10420), you would specify the query this way:

http://api.datausa.io/api/?show=geo&sumlevel=county&required=pop&geo=31000US10420

This only works with COUNTIES inside MSAs, or TRACTS inside PLACES.

library( dplyr )

# COUNTIES INSIDE AKRON, OH
api <- "http://api.datausa.io/api/csv/?show=geo&sumlevel=county&required=income&year=latest&geo=31000US10420"
api %>% read.csv() %>% head()
#   year           geo_name          geo income
# 1 2014 Portage County, OH 05000US39133  51275
# 2 2014  Summit County, OH 05000US39153  50365

api <- "http://api.datausa.io/api/csv/?show=geo&sumlevel=tract&required=income&year=latest&geo=31000US10420"
api %>% read.csv() %>% head()
# Doesn't work!!!

# INSTEAD QUERY BOTH COUNTIES SEPARATELY USING 'WHERE' FILTER

api <- "http://api.datausa.io/api/csv/?show=geo&sumlevel=tract&required=income&year=latest&where=geo:^14000US39133"
api %>% read.csv() %>% head() # Portage County, OH
#   year             geo_name                geo income
# 1 2014 Census Tract 6001.02 14000US39133600102  60568
# 2 2014 Census Tract 6001.03 14000US39133600103  48229
# 3 2014    Census Tract 6002 14000US39133600200  58958

api <- "http://api.datausa.io/api/csv/?show=geo&sumlevel=tract&required=income&year=latest&where=geo:^14000US39153"
api %>% read.csv() %>% head() # Summit County, OH
#   year             geo_name                geo income
# 1 2014    Census Tract 5011 14000US39153501100  28298
# 2 2014    Census Tract 5018 14000US39153501800  24837
# 3 2014    Census Tract 5019 14000US39153501900  10874


# COUNTIES INSIDE ALASKA

api <- "http://api.datausa.io/api/csv/?show=geo&sumlevel=county&required=pop&year=latest&geo=04000US02"
api %>% read.csv() %>% head()
# this does not work with Alaska state ID 04000US02

# TRY WITH COUNTIES GEO USING WHERE FILTER

api <- "http://api.datausa.io/api/csv/?show=geo&sumlevel=county&required=pop&year=latest&where=geo:^05000US02"
api %>% read.csv() %>% head()
#   year                   geo_name          geo    pop
# 1 2014     Aleutians East Borough 05000US02013   3296
# 2 2014 Aleutians West Census Area 05000US02016   5650
# 3 2014 Anchorage Municipality, AK 05000US02020 298178

Returning Multiple Variables

To return several variables at once, add a comma between names in the variable definition slot:

http://api.datausa.io/api/csv/?show=cip&sumlevel=2&required=grads_men,grads_women

Filtering Results by Criteria

Sometimes you would like to filter not merely by equality but by other mechanisms. Using the where clause in an API call lets us express more complicated expressions. The basic syntax of using where is: where= column_name : condition

Let's say we wanted to filter the results to only show top-level industries containing more than an estimated 10M people in the workforce. We would write:

http://api.datausa.io/api/? show=naics&sumlevel=0 & where=num_ppl:>10000000

Here are some operators for filtering:

Description Operator Example Syntax
Greater than > &where=num_ppl:>10000000
Less than < &where=num_ppl:<10000000
String starts with ^ &where=naics:^23
String ends with $ (placed after text) &where=naics:3$
Not equal (integer) ! &where=avg_wage_rank:!1
Not equal (string) str! &where=geo:str!04000US25

Importing Data into R

To easily import results of a query into R, make sure to include the "csv" argument and read directly from the URL:

dat <- read.csv( "http://api.datausa.io/api/csv/?show=cip&sumlevel=2&required=grads_men,grads_women" )

# > head( dat )
#   year                         cip_name cip grads_men grads_women
# 1 2014 Philosophy and Religious Studies  38     11496        6930
# 2 2014                     Construction  46     30134        1462
# 3 2014          Mechanical Technologies  47    101496        4946
# 4 2014                        Education  13     70721      253623
# 5 2014                  Social Sciences  45     95246      104178
# 6 2014         Engineering Technologies  15     77850       11092

Quickly build a census dataset by adding multiple variables:

dat <- read.csv( "http://api.datausa.io/api/csv/?show=geo&sumlevel=msa&required=pop,income,mean_commute_minutes" )

# > head( dat )
#   year                                               geo_name          geo     pop income mean_commute_minutes
# 1 2014          Louisville/Jefferson County, KY-IN Metro Area 31000US31140 1253305  50386              22.8425
# 2 2013          Louisville/Jefferson County, KY-IN Metro Area 31000US31140 1244880  49753              22.7871
# 3 2014 Philadelphia-Camden-Wilmington, PA-NJ-DE-MD Metro Area 31000US37980 6015336  62169              27.5973
# 4 2013 Philadelphia-Camden-Wilmington, PA-NJ-DE-MD Metro Area 31000US37980 5992766  61923              27.3712
# 5 2014                 Seattle-Tacoma-Bellevue, WA Metro Area 31000US42660 3557037  68969              26.7180
# 6 2013                 Seattle-Tacoma-Bellevue, WA Metro Area 31000US42660 3504628  67745              26.3895

Useful Economic Queries

Age, Salary

http://api.datausa.io/api/?show=age&sumlevel=all&required=avg_wage

CIP Course, Wage

http://api.datausa.io/api/?show=cip&sumlevel=all&required=avg_wage

CIP Course, Age, Wage

http://api.datausa.io/api/?show=age,cip&sumlevel=all,all&required=avg_wage 

Course, Birthplace, Wage

http://api.datausa.io/api/?show=birthplace&sumlevel=all&required=avg_wage&cip=40 

Course, Degree

http://api.datausa.io/api/?show=degree&sumlevel=all&required=avg_wage,num_ppl&cip=40

Geography

http://api.datausa.io/api/?show=geo&sumlevel=state&required=avg_wage

Geography, Birthplace

http://api.datausa.io/api/?show=geo,birthplace&sumlevel=state,all&required=num_over5

Geography, Course

http://api.datausa.io/api/?show=cip&sumlevel=all&required=num_ppl&geo=04000US25

Geography, Degree

http://api.datausa.io/api/?show=geo&sumlevel=state&required=num_ppl&degree=19

Geography, Industry

http://api.datausa.io/api/?show=geo&sumlevel=state&required=num_ppl&naics=111

Geography, Industry, Occupation

http://api.datausa.io/api/?show=geo&sumlevel=state&required=num_ppl&naics=111&soc=110000

Geography, Occupation

http://api.datausa.io/api/?show=geo&sumlevel=state&required=num_ppl&soc=110000

Geography, Occupation, Race

http://api.datausa.io/api/?show=geo,race&sumlevel=state,all&required=num_ppl&soc=110000

Geography, Occupation, Sex

http://api.datausa.io/api/?show=geo,sex&sumlevel=state,all&required=num_ppl&soc=110000

Geography, Race

http://api.datausa.io/api/?show=geo,race&sumlevel=state,all&required=num_ppl

Geography, Sex

http://api.datausa.io/api/?show=geo,sex&sumlevel=state,all&required=num_ppl

Geography, Wage Bin

http://api.datausa.io/api/?show=geo,wage_bin&sumlevel=state,all&required=num_ppl

Industry

http://api.datausa.io/api/?show=naics&sumlevel=0&required=num_ppl

Industry, Course

http://api.datausa.io/api/?show=naics&sumlevel=0&required=num_ppl&cip=40

Industry, Degree

http://api.datausa.io/api/?show=naics&sumlevel=0&required=num_ppl&degree=19

Industry, Occupation

http://api.datausa.io/api/?show=naics&sumlevel=0&required=num_ppl&soc=110000

Industry, Occupation, Race

http://api.datausa.io/api/?show=naics&sumlevel=0&required=num_ppl&soc=110000&race=1

Industry, Occupation, Sex

http://api.datausa.io/api/?show=naics&sumlevel=0&required=num_ppl&soc=110000&sex=1

Industry, Race

http://api.datausa.io/api/?show=naics&sumlevel=0&required=num_ppl&race=1

Industry, Sex

http://api.datausa.io/api/?show=naics&sumlevel=0&required=num_ppl&sex=1

Industry, Wage Bin

http://api.datausa.io/api/?show=wage_bin,naics&sumlevel=all,0&year=latest

Occupation

http://api.datausa.io/api/?show=soc&sumlevel=0&year=latest

Occupation, Course

http://api.datausa.io/api/?show=soc&sumlevel=0&year=latest&cip=40

Occupation, Course, Degree

http://api.datausa.io/api/?show=soc,degree&sumlevel=0,all&year=latest&cip=40

Occupation, Degree

http://api.datausa.io/api/?show=soc,degree&sumlevel=0,all&year=latest

Occupation, Race

http://api.datausa.io/api/?show=soc,race&sumlevel=0,all&year=latest

Occupation, Sex

http://api.datausa.io/api/?show=soc&sumlevel=0&year=latest&sex=1

Occupation, Wage Bin

http://api.datausa.io/api/?show=soc,wage_bin&sumlevel=0,all&year=latest
@vikasyadav15
Copy link

Hi Lecy,
I am getting below error when i am trying get data using api. do we need some kind of authentication ??

java.io.IOException: Server returned HTTP response code: 403 for URL: http://api.datausa.io/api/csv?show=soc&sumlevel=3
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1840)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
at java.net.URL.openStream(URL.java:1038)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment