@slpsys
Created September 27, 2015 17:43
Tweet Harvest: Twitter User Data Dump-to-one-semi-clean-tweet-per-line file
require 'json'

# Write one tweet per line; the block form closes the output file automatically.
File.open('/tmp/tweetstorm.txt', 'w') do |out|
  Dir['./data/js/tweets/*.js'].each do |file|
    # My Twitter dump had a first line that assigns the JSON data to a variable,
    # the rest was valid JSON. Example:
    # Grailbird.data.tweets_2009_07 =
    # Skip that first line and parse the remainder as JSON.
    json_data = JSON.parse(File.readlines(file).drop(1).join)
    # Strip @mentions from each tweet's text.
    tweets = json_data.map { |t| t['text'].gsub(/@\w+/, '') }
    tweets.each { |t| out.puts(t) }
  end
end
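
To check the header-stripping and mention-removal steps without a real archive, here is a minimal sketch that runs them on an in-memory string. The sample data is made up for illustration; only the `Grailbird.data.tweets_2009_07 =` first line and the `text` key come from the script above.

```ruby
require 'json'

# Hypothetical sample mimicking one archive file: an assignment line, then JSON.
sample = <<~JS
  Grailbird.data.tweets_2009_07 =
  [ { "text": "@alice hello there" },
    { "text": "no mentions here" } ]
JS

# Drop the first line (the JavaScript assignment) and parse the remainder.
tweets = JSON.parse(sample.lines.drop(1).join)

# Remove @mentions, using the same pattern as the script above.
cleaned = tweets.map { |t| t['text'].gsub(/@\w+/, '') }
cleaned.each { |t| puts t }
```

Removing a mention leaves the surrounding whitespace in place, so the first tweet comes out with a leading space. That is part of why the output is only "semi-clean".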