@kenbod
Last active August 29, 2015 14:16
Ruby Program to Import Tweets into CouchDB

import_tweets.rb

This program shows how to import a set of tweets into a CouchDB database. It is hard-coded for one particular input file, big_data_tweets.json, which holds 100,681 tweets stored as one JSON-encoded tweet per line. Those aspects of the program can easily be generalized. I use Typhoeus to make the HTTP requests. For each tweet, a GET to CouchDB's _uuids endpoint obtains a document ID, and a PUT then stores the tweet under that ID.
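One way to generalize the hard-coded values is to read the input file name and database URL from command-line flags and to count the tweets instead of hard-coding the total. This is a minimal sketch, not part of the original gist. The flag names (--file, --db-url), the DEFAULT_OPTIONS hash, and the helpers parse_options and count_tweets are all hypothetical. It relies only on OptionParser from Ruby's standard library.

```ruby
require 'optparse'

# Hypothetical defaults that mirror the values hard-coded in import_tweets.rb.
DEFAULT_OPTIONS = {
  file: "big_data_tweets.json",
  db_url: "http://127.0.0.1:5984/tweets/"
}.freeze

# Parses illustrative --file / --db-url flags; anything not given keeps its default.
def parse_options(argv)
  options = DEFAULT_OPTIONS.dup
  OptionParser.new do |opts|
    opts.banner = "Usage: import_tweets.rb [options]"
    opts.on("-f", "--file PATH", "File with one JSON tweet per line") do |path|
      options[:file] = path
    end
    opts.on("-d", "--db-url URL", "CouchDB database URL") do |url|
      # Ensure a trailing slash so that appending a document ID yields a valid URL.
      options[:db_url] = url.end_with?("/") ? url : url + "/"
    end
  end.parse!(argv.dup)
  options
end

# Counts lines in the input file (blank lines included) so the total
# used for progress reporting no longer has to be hard-coded.
def count_tweets(path)
  File.foreach(path).count
end
```

Copying argv before parsing leaves the caller's array untouched. This matters if the same arguments are inspected again later.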

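The program also shows how to implement a command-line "spinner". Instead of printing a new line for each tweet, it erases the previous progress message with backspace characters and then prints an updated one. As a result, a single line of text shows how far the import has progressed.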
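A variant of the same technique uses a carriage return ("\r") to jump back to the start of the line, rather than one backspace per character. It pads the new message with spaces so that a shorter message fully covers a longer one. This is a sketch that assumes output goes to a terminal. The helper names progress_message and show_progress are hypothetical, and the message template matches the one in report_progress.

```ruby
# Builds the progress text using the same template as report_progress.
def progress_message(current, total)
  percent = (current.to_f / total) * 100
  format("Importing Tweet %d of %d (%.2f %%).", current, total, percent)
end

# Rewrites the current line in place: "\r" moves the cursor to column 0 and
# ljust pads with spaces to overwrite any longer previous message.
# Returns the message length so the caller can pass it back next time.
def show_progress(current, total, previous_length, out = $stdout)
  message = progress_message(current, total)
  out.print("\r" + message.ljust(previous_length))
  out.flush # no newline is printed, so flush to make the update visible now
  message.length
end
```

The output stream is a parameter that defaults to $stdout, so the helper can be exercised against a StringIO without a terminal.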

Gemfile

```ruby
source 'https://rubygems.org'
gem 'typhoeus'
```
import_tweets.rb

```ruby
require 'bundler/setup'
require 'json'
require 'time'
require 'typhoeus'

# Hard-coded for one particular input file (one JSON tweet per line).
NUM_OF_TWEETS = 100681
INPUT_FILE    = "big_data_tweets.json"
BASE_URL      = "http://127.0.0.1:5984/tweets/"
UUID_URL      = "http://127.0.0.1:5984/_uuids"

# Progress "spinner": erase the previous message with backspaces, print the
# new one, and return its length so the next call knows how much to erase.
def report_progress(current, previous_length)
  print "\b" * previous_length
  percent = (current.to_f / NUM_OF_TWEETS) * 100
  message = format("Importing Tweet %d of %d (%.2f %%).", current, NUM_OF_TWEETS, percent)
  print message
  $stdout.flush
  message.length
end

# Ask CouchDB for one server-generated UUID to use as the document ID.
def get_uuid_for_tweet
  response = Typhoeus::Request.new(UUID_URL, method: :get).run
  if response.code == 200
    JSON.parse(response.body)["uuids"][0]
  else
    warn "\nError when trying to get a UUID (HTTP #{response.code})"
    exit 1
  end
end

# Store one tweet (a raw JSON string) as a document under the given UUID.
# CouchDB answers 201 Created, or 202 Accepted when the write has not yet
# reached quorum; anything else is treated as fatal.
def insert_tweet(uuid, tweet)
  request = Typhoeus::Request.new(
    BASE_URL + uuid,
    method: :put,
    body: tweet,
    headers: { 'Content-Type' => 'application/json' }
  )
  response = request.run
  return if [201, 202].include?(response.code)

  warn "\nFAILURE : #{Time.now}"
  warn "Response Code: #{response.code}"
  warn "Response Info: #{response.status_message}"
  exit 1
end

if __FILE__ == $0
  index = 1
  previous_length = 0
  IO.foreach(INPUT_FILE) do |line|
    tweet = line.chomp
    next if tweet.empty? # blank lines are not valid JSON documents
    uuid = get_uuid_for_tweet
    insert_tweet(uuid, tweet)
    previous_length = report_progress(index, previous_length)
    index += 1
  end
  puts
  puts "Done!"
end
```
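As written, the importer makes two HTTP requests per tweet: a GET to _uuids and a PUT. For the full file that comes to roughly 200,000 requests. A common alternative, which the original gist does not use, is CouchDB's POST /{db}/_bulk_docs endpoint. It stores an array of documents sent as {"docs": [...]} in a single request and assigns an _id to any document that lacks one. The sketch below is illustrative only. It assumes a batch size of 1,000, uses hypothetical helper names (bulk_body, post_batch, import_in_batches), and relies on Ruby's standard net/http instead of Typhoeus (Net::HTTP.post needs Ruby 2.4 or later).

```ruby
require 'json'
require 'net/http'
require 'uri'

BATCH_SIZE = 1_000 # assumption: tune to document size and server limits

# Wraps raw JSON lines (one tweet per line) in the {"docs": [...]} envelope
# that _bulk_docs expects. Parsing each line embeds the tweet as an object
# rather than a string, and malformed lines raise JSON::ParserError early.
def bulk_body(lines)
  JSON.generate({ "docs" => lines.map { |line| JSON.parse(line) } })
end

# Sends one batch; raises unless CouchDB answers 201 Created. Returns the
# per-document results that report an error (e.g. a validation failure).
def post_batch(db_url, lines)
  uri = URI.join(db_url, "_bulk_docs")
  response = Net::HTTP.post(uri, bulk_body(lines), "Content-Type" => "application/json")
  unless response.code == "201"
    raise "Bulk insert failed: HTTP #{response.code} #{response.message}"
  end
  JSON.parse(response.body).select { |result| result.key?("error") }
end

# Streams the input file in slices of BATCH_SIZE lines, skipping blank lines.
def import_in_batches(path, db_url)
  File.foreach(path).each_slice(BATCH_SIZE) do |raw_lines|
    batch = raw_lines.map(&:chomp).reject(&:empty?)
    next if batch.empty?
    failures = post_batch(db_url, batch)
    warn "#{failures.size} document(s) rejected in this batch" unless failures.empty?
  end
end
```

The database URL passed to import_in_batches must end with a slash (for example "http://127.0.0.1:5984/tweets/"). With the slash, URI.join appends _bulk_docs to the path instead of replacing its last segment.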