
How to use Google's API with R

Using Google Maps API and R

This script uses RCurl and RJSONIO to download data from Google's Geocoding API and extract the latitude, longitude, location type, and formatted address of a location. plyr is used later to geocode several addresses at once.

library(RCurl)
library(RJSONIO)
library(plyr)

Build a URL to access the API:

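# Note: this masks base R's url() connection function
url <- function(address, return.call = "json", key = Sys.getenv("GOOGLE_MAPS_API_KEY")) {
  # Current Geocoding API endpoint; every request needs an API key
  root <- "https://maps.googleapis.com/maps/api/geocode/"
  # Encode only the address so spaces, commas and "&" are escaped safely
  u <- paste0(root, return.call, "?address=", URLencode(address, reserved = TRUE))
  # The key is read from an environment variable whose name is our own choice
  if (nzchar(key)) u <- paste0(u, "&key=", key)
  return(u)
}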

Function to query the API and parse the results:

geoCode <- function(address, verbose = FALSE) {
  if (verbose) cat(address, "\n")
  u <- url(address)
  doc <- getURL(u)
  # Pause between requests to stay under the API rate limit
  Sys.sleep(0.5)
  x <- fromJSON(doc, simplify = FALSE)
  if (x$status == "OK") {
    # Use the first result returned
    res <- x$results[[1]]
    lat <- res$geometry$location$lat
    lng <- res$geometry$location$lng
    location_type <- res$geometry$location_type
    formatted_address <- res$formatted_address
    # c() coerces everything to character, so lat/lng come back as strings
    return(c(lat, lng, location_type, formatted_address))
  } else {
    # Any other status (ZERO_RESULTS, OVER_QUERY_LIMIT, REQUEST_DENIED, ...) gives NAs
    return(c(NA, NA, NA, NA))
  }
}
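Because geoCode() both downloads and parses, its parsing logic can't be checked without a network call and a valid key. A minimal sketch of the parsing half on its own, assuming the list shape that fromJSON(doc, simplify = FALSE) produces for a Geocoding response: parse_geocode() and sample_response are hypothetical names, and the values in sample_response are made up for illustration, not real API output.

```r
# Hypothetical helper: pull the same four fields out of an already-parsed
# response (the list that fromJSON(doc, simplify = FALSE) returns)
parse_geocode <- function(x) {
  # Any non-"OK" status, or an empty result list, gives NAs like geoCode()
  if (!identical(x$status, "OK") || length(x$results) == 0) {
    return(c(NA, NA, NA, NA))
  }
  res <- x$results[[1]]
  c(res$geometry$location$lat,
    res$geometry$location$lng,
    res$geometry$location_type,
    res$formatted_address)
}

# Hand-built list mirroring the documented response shape (values are made up)
sample_response <- list(
  status = "OK",
  results = list(list(
    formatted_address = "1 Example Street, Example City",
    geometry = list(
      location = list(lat = 10.5, lng = -20.25),
      location_type = "ROOFTOP"
    )
  ))
)

parse_geocode(sample_response)
parse_geocode(list(status = "ZERO_RESULTS", results = list()))
```

Keeping the parser separate like this lets it be exercised on saved or hand-built responses; geoCode() could then simply call it on the output of fromJSON().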

Test with one address:

address <- geoCode("The White House, Washington, DC")

The first two items are the latitude and longitude; the third is the location type and the fourth is the formatted address:

address

We can use plyr's ldply() to geocode a vector of addresses and collect the results in a data frame:

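address <- c("The White House, Washington, DC", "The Capitol, Washington, DC")
locations <- ldply(address, function(x) geoCode(x))
names(locations) <- c("lat", "lon", "location_type", "formatted")
# geoCode() returns character vectors, so convert the coordinates back to numbers
locations$lat <- as.numeric(locations$lat)
locations$lon <- as.numeric(locations$lon)
head(locations)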
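plyr isn't strictly necessary for this step. A sketch of the same thing in base R, using do.call(rbind, lapply(...)) in place of ldply(); geocode_all() and fake_geocode() are hypothetical names, and fake_geocode() is an offline stub standing in for geoCode() so the example runs without contacting the API.

```r
# Hypothetical offline stand-in for geoCode(): returns the same 4-element
# character vector, or NAs for an address it does not "know"
fake_geocode <- function(address) {
  known <- list(
    "Place A" = c("1.5", "2.5", "ROOFTOP", "Place A, formatted"),
    "Place B" = c("3", "4", "APPROXIMATE", "Place B, formatted")
  )
  if (address %in% names(known)) known[[address]] else c(NA, NA, NA, NA)
}

geocode_all <- function(addresses, geocoder) {
  # One 4-element vector per address, in input order
  rows <- lapply(addresses, geocoder)
  # Stack the vectors into a matrix, then turn it into a data frame
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(out) <- c("lat", "lon", "location_type", "formatted")
  # The geocoder returns strings, so convert the coordinates to numbers
  out$lat <- as.numeric(out$lat)
  out$lon <- as.numeric(out$lon)
  out
}

locs <- geocode_all(c("Place A", "Unknown place", "Place B"), fake_geocode)
locs
```

With the real geocoder, the call would be geocode_all(address, geoCode).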

The following are the different location types:

  • "ROOFTOP" indicates that the returned result is a precise geocode for which Google has location information accurate down to street address precision.
  • "RANGE_INTERPOLATED" indicates that the returned result reflects an approximation (usually on a road) interpolated between two precise points (such as intersections). Interpolated results are generally returned when rooftop geocodes are unavailable for a street address.
  • "GEOMETRIC_CENTER" indicates that the returned result is the geometric center of a result such as a polyline (for example, a street) or polygon (region).
  • "APPROXIMATE" indicates that the returned result is approximate.
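When geocoding many addresses it can help to flag results that are less precise than you need. A minimal sketch, assuming a data frame with a location_type column like the locations data frame built above; flag_precision() and demo_locs are hypothetical, and the ordering in location_types is our own reading of the descriptions above rather than an official score from Google.

```r
# Most precise first; this ranking is our reading of Google's descriptions
location_types <- c("ROOFTOP", "RANGE_INTERPOLATED", "GEOMETRIC_CENTER", "APPROXIMATE")

flag_precision <- function(locations, min_type = "RANGE_INTERPOLATED") {
  type_rank <- match(locations$location_type, location_types)
  # TRUE when at least as precise as min_type; NA for failed lookups
  locations$precise <- type_rank <= match(min_type, location_types)
  locations
}

# Small made-up input; the last row mimics a lookup that returned NA
demo_locs <- data.frame(
  formatted = c("a", "b", "c", "d"),
  location_type = c("ROOFTOP", "GEOMETRIC_CENTER", "RANGE_INTERPOLATED", NA),
  stringsAsFactors = FALSE
)
flag_precision(demo_locs)
```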

For more information on the Google Maps Geocoding API, see Google's official documentation.

@llpuente
llpuente commented May 7, 2015

When I insert a vector of a couple hundred addresses, a non-trivial percentage comes back as NAs. This is presumably because the API returned a status other than "OK" for those particular addresses. I think this because when the same addresses are entered individually, the coordinates come out fine. Any thoughts on how to overcome this? One option would be to re-ping the API whenever the status is not "OK", but I'm wondering how to do so most efficiently. Thanks for any thoughts!

For what it's worth, here's a fairly hackish way of overcoming this issue:

Run a loop to replace the NAs created during the initial run of ldply:

for (i in seq_along(address)) {
  if (is.na(locations$lat[i])) {
    # Re-query the address for each row that came back as NA
    locations[i, ] <- geoCode(address[i])
  }
}
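A more systematic take on re-pinging is to retry each failed address a few times with a growing pause (exponential backoff). A sketch under the assumption that a failure shows up as NA in the first element, as it does with geoCode(); geocode_with_retry() and make_flaky() are hypothetical, and make_flaky() only exists to simulate a geocoder that fails a set number of times. Because geoCode() returns NAs for every non-"OK" status, this also retries addresses that genuinely have no match (ZERO_RESULTS), so keep max_tries small.

```r
# Hypothetical helper: retry a geocoder while it returns NAs, doubling the
# pause after each failed attempt (exponential backoff)
geocode_with_retry <- function(address, geocoder, max_tries = 3, wait = 1) {
  for (attempt in seq_len(max_tries)) {
    result <- geocoder(address)
    if (!is.na(result[1])) return(result)
    # No pause after the final attempt
    if (attempt < max_tries) {
      Sys.sleep(wait)
      wait <- wait * 2
    }
  }
  result  # still NAs after max_tries attempts
}

# Simulated geocoder that fails `fail_times` times, then succeeds
make_flaky <- function(fail_times) {
  calls <- 0
  function(address) {
    calls <<- calls + 1
    if (calls <= fail_times) c(NA, NA, NA, NA) else c("1", "2", "ROOFTOP", address)
  }
}

geocode_with_retry("Somewhere", make_flaky(2), max_tries = 3, wait = 0.01)
```

Inside the loop above, the real call would be locations[i, ] <- geocode_with_retry(address[i], geoCode).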

@anniejw6

You're probably hitting the Google Maps API rate limit. Just add pauses to your loops. (Note: 3 seconds is arbitrary and probably a bit conservative.)

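locations <- vector("list", length(address))
for (i in seq_along(address)) {
  # Every nine records, pause 3 seconds
  if (i %% 9 == 0) Sys.sleep(3)
  # Keep each result instead of discarding it
  locations[[i]] <- geoCode(address[i])
}
# Stack the 4-element results into a data frame, one row per address
locations <- as.data.frame(do.call(rbind, locations), stringsAsFactors = FALSE)
names(locations) <- c("lat", "lon", "location_type", "formatted")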
@woofwoofwoofwoof

You should obtain an API key from Google.

@jowen7448

I believe the pause you need between requests is 0.1 seconds, as per Google's API terms and conditions.
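Instead of pausing every n records, each request can be spaced a fixed minimum interval apart. A sketch of that idea: throttle() is a hypothetical wrapper that takes the 0.1 second figure from the comment above as its default. Because it measures the time since the previous call, it only sleeps for whatever part of the interval hasn't already passed.

```r
# Hypothetical helper: wrap a function so consecutive calls start at least
# `min_interval` seconds apart
throttle <- function(f, min_interval = 0.1) {
  last_call <- NULL
  function(...) {
    if (!is.null(last_call)) {
      elapsed <- as.numeric(difftime(Sys.time(), last_call, units = "secs"))
      # Sleep only for the part of the interval that hasn't passed yet
      if (elapsed < min_interval) Sys.sleep(min_interval - elapsed)
    }
    last_call <<- Sys.time()
    f(...)
  }
}

# Usage with the geocoder:
# slow_geoCode <- throttle(geoCode, min_interval = 0.1)
# locations <- ldply(address, slow_geoCode)
```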
