Skip to content

Instantly share code, notes, and snippets.

@s2t2
Last active September 19, 2015 04:01
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save s2t2/0d38648e5f23ab839d05 to your computer and use it in GitHub Desktop.
Save s2t2/0d38648e5f23ab839d05 to your computer and use it in GitHub Desktop.
extract data from google drive spreadsheet (ruby). this has been made into a gem -- https://github.com/data-creative/google-sheet-reader-ruby
*.p12
*.json

Google Driver

An example of how to extract spreadsheet data from google drive.

Prerequisites

Set up a new application at https://console.developers.google.com/. Note its name.

Under APIs & auth > APIs, enable the Drive API.

Under APIs & auth > Credentials, add credentials for a service account. Download a .p12 file, note its name, and store it in the root directory of this repository. Note the service account's email address.

Open your Google Sheet. Note its id by inspecting its url.

docs.google.com/spreadsheets/d/id/edit.

Share the Google Sheet with the service account's email address in a "can edit" role.

Usage

Clone the repo.

git clone git@gist.github.com:0d38648e5f23ab839d05.git
cd 0d38648e5f23ab839d05

Install gem dependencies.

bundle install

Edit the configuration methods at the top of google_driver.rb, specifying your api keys, file id, and a custom code snippet to execute for each row.

Run the extraction script.

ruby google_driver.rb
source 'https://rubygems.org'
gem 'google-api-client'
gem 'pry' # for debugging; add `binding.pry` to open an interactive console
GEM
remote: https://rubygems.org/
specs:
activesupport (4.2.4)
i18n (~> 0.7)
json (~> 1.7, >= 1.7.7)
minitest (~> 5.1)
thread_safe (~> 0.3, >= 0.3.4)
tzinfo (~> 1.1)
addressable (2.3.8)
autoparse (0.3.3)
addressable (>= 2.3.1)
extlib (>= 0.9.15)
multi_json (>= 1.0.0)
coderay (1.1.0)
extlib (0.9.16)
faraday (0.9.1)
multipart-post (>= 1.2, < 3)
google-api-client (0.8.6)
activesupport (>= 3.2)
addressable (~> 2.3)
autoparse (~> 0.3)
extlib (~> 0.9)
faraday (~> 0.9)
googleauth (~> 0.3)
launchy (~> 2.4)
multi_json (~> 1.10)
retriable (~> 1.4)
signet (~> 0.6)
googleauth (0.4.2)
faraday (~> 0.9)
jwt (~> 1.4)
logging (~> 2.0)
memoist (~> 0.12)
multi_json (~> 1.11)
signet (~> 0.6)
i18n (0.7.0)
json (1.8.3)
jwt (1.5.1)
launchy (2.4.3)
addressable (~> 2.3)
little-plugger (1.1.3)
logging (2.0.0)
little-plugger (~> 1.1)
multi_json (~> 1.10)
memoist (0.12.0)
method_source (0.8.2)
minitest (5.8.0)
multi_json (1.11.2)
multipart-post (2.0.0)
pry (0.10.1)
coderay (~> 1.1.0)
method_source (~> 0.8.1)
slop (~> 3.4)
retriable (1.4.1)
signet (0.6.1)
addressable (~> 2.3)
extlib (~> 0.9)
faraday (~> 0.9)
jwt (~> 1.5)
multi_json (~> 1.10)
slop (3.6.0)
thread_safe (0.3.5)
tzinfo (1.2.2)
thread_safe (~> 0.1)
PLATFORMS
ruby
DEPENDENCIES
google-api-client
pry
BUNDLED WITH
1.10.5
#
# EDIT EACH METHOD IN THE SECTION BELOW...
#
# Google Api info
def service_account_app_name
ENV["GOOGLE_DRIVER_APP_NAME"] || "my-app"
end
def service_account_email_address
ENV["GOOGLE_DRIVER_EMAIL"] || "long_email@developer.gserviceaccount.com"
end
def service_account_key_file_name
ENV["GOOGLE_DRIVER_KEY_FILE_NAME"] || "abcxyz.privatekey.json"
end
# Spreadsheet info
def file_id
ENV["GOOGLE_DRIVER_FILE_ID"] || "ABC123-DEF456-GHI789"
end
# Custom procedure to perform with each row in the spreadsheet.
def transform(row)
puts "EDIT ME -- PARSING A ROW HERE -- #{row.inspect}"
end
#
# ... STOP EDITING
#
require 'csv'
require 'google/api_client'
require 'pry'
def client_info
{
:application_name => service_account_app_name,
:application_version => "1.0"
}
end
def path_to_api_key_file
"#{Dir.pwd}/#{service_account_key_file_name}"
end
def self.client
key = Google::APIClient::PKCS12.load_key(path_to_api_key_file, 'notasecret')
asserter = Google::APIClient::JWTAsserter.new(service_account_email_address, 'https://www.googleapis.com/auth/drive', key)
google_drive_client = Google::APIClient.new(client_info)
google_drive_client.authorization = asserter.authorize()
return google_drive_client
end
def self.drive
client.discovered_api('drive', 'v2')
end
def self.extract
puts "CONNECTING TO GOOGLE DRIVE ..."
get_request_result = client.execute(
:api_method => drive.files.get,
:parameters => { 'fileId' => file_id }
)
raise FileRetrievalError unless get_request_result.status == 200
file = get_request_result.data
download_url = file.exportLinks["text/csv"]
raise FileDownloadLinkError unless download_url
download_request_result = client.execute(:uri => download_url)
raise FileDownloadError unless download_request_result.status == 200
file_contents = download_request_result.body
csv_result = CSV.parse(file_contents, headers:true)
#headers = csv_result.headers
csv_result.each do |row|
transform(row)
end
puts "PARSED #{csv_result.count} ROWS"
end
class FileRetrievalError < StandardError ; end
class FileDownloadLinkError < StandardError ; end
class FileDownloadError < StandardError ; end
extract
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment