SantoshSrinivas79/Visualizing state date buried in a PDF.md

## Visualizing state date buried in a PDF.md

      
Visualizing state data buried in a PDF

I recently ran into a common problem: visualizing simple state-level data that is locked inside a PDF as a choropleth. The data comes from a well-researched report on housing (see Sources).
tabulapdf/tabula works quite well for extracting the data, even on a Windows machine.
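
Tabula can export the selected table as CSV, which R reads directly. The following is a minimal sketch of that step, assuming the export has one state column and one dollar-formatted column. The column names and the inline sample rows are hypothetical placeholders, not values from the report.

```r
# Stand-in for a CSV exported from Tabula. In practice this would be
# something like read.csv("tabula-export.csv") (hypothetical file name).
# The numbers below are placeholders, NOT figures from the report.
csv_text <- c(
  "State,Housing Wage",
  "Alabama,$10.00",
  "New York,\"$1,234.50\"",
  "Wyoming,$20.25"
)

raw <- read.csv(text = paste(csv_text, collapse = "\n"),
                stringsAsFactors = FALSE, check.names = FALSE)

# Use short, predictable column names.
names(raw) <- c("state", "wage")

# Strip the dollar sign and thousands separator, then convert to numeric.
raw$wage <- as.numeric(gsub("[$,]", "", raw$wage))

str(raw)
```

Converting the dollar strings to numbers early matters because a choropleth needs a numeric value per state; a character column would be treated as categories.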
Now that we have the data, let us create the state choropleth.

Robinlovelace/Creating-maps-in-R

Making basic state-level choropleths is a breeze with the choroplethr package, available on CRAN and at arilamstein/choroplethr.
Let us create a static version of the map published at Out Of Reach: National Low Income Housing Coalition.
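```r
# Working directory that holds the extracted data
sDir <- "~/Dropbox/pandora/My-Projects/repos/hackery/"
setwd(sDir)

library(choroplethr)
library(choroplethrMaps)

# Reference table of the state names that choroplethr recognizes
data(state.regions)
head(state.regions)
```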
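
For reference, `state_choropleth()` expects a data frame with exactly two meaningful columns: `region` (a lowercase state name as listed in `state.regions`) and `value` (the number to plot). A minimal sketch of the call follows; the values are sequential placeholders chosen only to make the example run, and the title and legend strings are likewise made up.

```r
library(choroplethr)
library(choroplethrMaps)
data(state.regions)

# One row per state. `value` is a placeholder sequence, NOT report data;
# replace it with the numbers extracted from the PDF.
df <- data.frame(
  region = state.regions$region,
  value  = seq_along(state.regions$region),
  stringsAsFactors = FALSE
)

# num_colors controls how many buckets the values are split into.
p <- state_choropleth(df,
                      title      = "Placeholder values by state",
                      legend     = "Value",
                      num_colors = 7)
print(p)
```

Because the map is joined on `region`, any state whose name is not spelled exactly as in `state.regions` will not be drawn correctly, which is why the name matching discussed next is needed.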

However, the state names in state.regions do not match the dataset at hand exactly.
Instead of correcting the data by hand until it matches, let us try an algorithmic approach.
The R packages that seem to be available for this task are:

markvanderloo/stringdist, which is well explained in Approximate text matching with the stringdist package
R: String Metrics
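
As an illustration of the approximate-matching idea, here is a small sketch using `amatch()` from stringdist. The misspelled names and the short lookup vector are invented for the example; in practice the lookup table would be `state.regions$region`. The choice of the `"osa"` method and `maxDist = 2` is an assumption, not a recommendation from the package authors.

```r
library(stringdist)

# Canonical names to match against (a small hypothetical subset).
canonical <- c("alabama", "alaska", "new jersey", "new york",
               "district of columbia")

# Names as they might come out of a PDF extraction (invented examples).
extracted <- c("alabma", "new yrok", "distrct of columbia")

# amatch() returns, for each input, the index of the closest table entry
# within maxDist edits, or NA when nothing is close enough.
idx <- amatch(extracted, canonical, method = "osa", maxDist = 2)

matched <- data.frame(extracted = extracted,
                      region    = canonical[idx],
                      stringsAsFactors = FALSE)
print(matched)

# Anything left unmatched should be reviewed by hand.
unmatched <- extracted[is.na(idx)]
```

Keeping `maxDist` small reduces the risk of silently matching one state to another with a similar name; the `unmatched` vector makes the leftovers explicit.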

Replacing the mismatched values worked quite well with gsub, as explained at http://biostat.mc.vanderbilt.edu/wiki/pub/Main/SvetlanaEdenRFiles/regExprTalk.pdf
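
A sketch of what such gsub-based cleanup can look like is below. The input strings and the specific patterns (footnote markers, doubled spaces, an abbreviated "N.") are hypothetical examples of extraction noise, not the exact replacements used on the report.

```r
# State names as they might appear after extraction (invented examples).
raw_names <- c("Alabama", "New  York*", " District of Columbia ",
               "N. Carolina")

clean <- tolower(raw_names)               # choroplethr uses lowercase names
clean <- gsub("[*0-9]", "", clean)        # drop footnote markers
clean <- gsub("\\s+", " ", clean)         # collapse runs of whitespace
clean <- gsub("^ | $", "", clean)         # trim a leading or trailing space
clean <- gsub("^n\\. ", "north ", clean)  # expand one known abbreviation

print(clean)
```

Each call handles one kind of noise, so the sequence is easy to extend when a new mismatch shows up.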
To Do


Look into more coloring options: ggplot2: axis manipulation and themes
Choropleth in R: custom breaks and plotting - Geographic Information Systems Stack Exchange

Sources


Report on Housing Data: nlihc.org/sites/default/files/oor/OOR_2015_FULL.pdf
How to extract data from a PDF - #Interhacktives
pdftables – a Python library for getting tables out of PDF files | ScraperWiki
screen scraping - Extracting tables from PDF files programmatically? - Stack Overflow
tabulapdf/tabula