Skip to content

Instantly share code, notes, and snippets.

@kgjenkins
Last active March 30, 2016 19:19
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save kgjenkins/f24c1852363ae1be9ee92a475dbc7250 to your computer and use it in GitHub Desktop.
Save kgjenkins/f24c1852363ae1be9ee92a475dbc7250 to your computer and use it in GitHub Desktop.
Calculate the county for Medicare Long Stay facilities

Calculate the county for Medicare Long Stay facilities

We want to calculate the county (name, FIPS) that corresponds to each row of this dataset: https://data.medicare.gov/Nursing-Home-Compare/Quality-Measures-Long-Stay/iqd3-nsf3

We will use CartoDB to add new columns that will contain the latitude, longitude, county name, and county FIPS code.

Step 1: Extract the point coordinates

The dataset already includes latitude/longitude coordinates, but these are jammed into the location column, appearing in parentheses after the full street address. In order to work with these coordinates as data, we will first need to extract these coordinates to separate columns. Newline characters are also present within the location values, which may cause problems for some software, so we'll want to clean that up as well. Here's an example (there are invisible linebreak characters after each line):

701 MONROE STREET NW
RUSSELLVILLE, AL 35653
(34.514971, -87.736372)
  • Load the csv into CartoDB and rename the table longstay

  • Run this SQL to clean up those newlines in the location column by changing them all to spaces. Run this SQL:

    UPDATE longstay
    SET location = regexp_replace(location, '\n', ' ', 'g')
    
  • Add new numeric columns _lat and _lon

  • Run this SQL to extract any existing lat/lon values to the new columns:

    UPDATE longstay
    
    SET
    _lat = regexp_replace(location, '.*\((-?\d+\.\d+), ?(-?\d+\.\d+)\).*', '\1')::numeric,
    _lon = regexp_replace(location, '.*\((-?\d+\.\d+), ?(-?\d+\.\d+)\).*', '\2')::numeric
    
    WHERE
    position('(' in location) > 0
    
  • Click the _lat column, select "Georeference..." and select the longitude and latitude columns. This turns each row into a point on the map.

Note that there were some location values in the data table that lack point coordinates. It appears that at least some of these are associated with faulty addresses (like "2018 SALISBURY MD 21802"). Depending on the goals of the project, these rows might need to be corrected, or they could just be discarded from the analysis. To view a list of the rows lacking point locations, run this SQL:

SELECT DISTINCT federal_provider_number, provider_name, location
FROM longstay
WHERE the_geom IS NULL

Step 2: Calculate the county for each point

Load the 2015 Census county boundaries into CartoDB: http://www2.census.gov/geo/tiger/TIGER2015/COUNTY/tl_2015_us_county.zip

(NOTE: The following method was also tested using the 500k generalized boundaries, which is a much smaller dataset, but the boundaries are not quite as precise. Comparing the results between both sets of boundaries, 8 of the locations that were very close to the county boundary were identified as different counties, so it is probably best to use the more precise boundaries.)

  • Run this SQL to calculate the county name and FIPS code for each point:
SELECT
l.*,
c.namelsad AS _countyname,
c.geoid AS _countyfips

FROM
longstay AS l,
tl_2015_us_county AS c

WHERE ST_Within(l.the_geom, c.the_geom)

Step 3: Export to CSV

To save a CSV file with the new columns, cilck the "Edit" link at the top right of the CartoDB window and select "Export layer". Save it as a CSV file. It may take a few moments before downloading.

When you open the .csv file, notice that the new columns (_lat, _lon, _countyname, _countyfips) are on the right side of the table. There are also two new columns (cartodb_id, the_geom) on the left side of the table -- these are specific to CartoDB and can be safely discarded.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment