@dannguyen
Created March 27, 2011 15:40
01-Studio-20 - lesson
#####
## The following lines load the libraries we need and also
## make sure everything is in place
['../results', '../html'].each do |dir|
  raise "#{dir} does not exist as expected" unless File.exist?(dir)
end
require 'rubygems'
require 'rest-client'
require 'crack'
require 'pp'
#####
#################################
##### 00
Kernel.puts("hello world")
##################################
##### 00.5
##### Variables, strings and assignment
## The word on the left is a variable. It's just an *arbitrary* label
## that we make up to point to some data that we want to reference later
## The data in this case is the String "Hello world, in a variable"
## The essential difference between the two is that the String has quotation marks around it and the variable doesn't
hello_world = "Hello world, in a variable"
## Also, note what the equals sign does here. It is assigning what's on the right to the variable
## on the left. It only goes one way. You can't, for instance, do this:
## "Hello world string " = hello_world
## Try changing the value of hello_world, and outputting it. See what happens when you don't
## enclose the String with quotation marks
#! Kernel.puts(hello_world)
## Main takeaway: Strings, the characters wrapped in double quotes, are data
## Variables, and other words NOT in quotation marks, are how we tell Ruby what we're
## referencing. Ruby doesn't care what we name our variables,
## just as long as we're consistent in referring to them
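## A minimal sketch of the points above. The variable name 'greeting' is made up
## for this illustration; any consistent name would work just as well
greeting = "Hello world"
Kernel.puts(greeting)         # prints the String that greeting points to
greeting = "Goodbye world"    # assigning again replaces what the label points to
Kernel.puts(greeting)
## Without quotation marks, Ruby treats a word as a name to look up, not as data
## Looking up a name that was never assigned raises a NameError:
begin
  Kernel.puts(goodbye_world)
rescue NameError => error
  Kernel.puts("Ruby complained: " + error.message)
end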
=begin
#################################
##### 01
## Files to load
test_filename = "http://code-dancow.s3.amazonaws.com/studio20/tweets-test/tweets-jayrosen_nyu-page_1.xml"
## There are actually 16 of these files. You can replace the '1' in 'page_1' with any number from 1 to 16
## This command downloads the file at test_filename
test_file = RestClient.get(test_filename)
## Let's print it out
#! Kernel.puts test_file
## Here's some explanation: "puts" is a method. So is "RestClient.get"
## More specifically, 'get' is a method that belongs to 'RestClient'
## The dot is what invokes 'get'
##
## Similarly, 'puts' is a method that belongs to the object 'Kernel'
## And 'puts' prints out to screen
##
## But to save us a little typing, you can just write: puts "whatever your string is"
## Kernel is, for our purposes, a ubiquitous object. The interpreter will assume we
## are referring to it if we just throw 'puts' out there on its own
## so:
#! puts( "Hello World")
##
## What did "RestClient.get" do? We saved its result in a variable called test_file. Try calling puts on that
##
#! puts( test_file)
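## A small sketch of the same idea with no download involved: the dot calls a
## method on whatever is to its left. 'upcase' and 'length' are built-in String
## methods; the variable name 'shout' is made up for this example
shout = "hello world".upcase
Kernel.puts(shout)            # the explicit form
puts(shout)                   # the shorthand; Ruby finds Kernel's puts for us
puts("hello world".length)    # counts the characters in the String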
=end
=begin
#################################
##### 02
##### Parsing the XML
## This command parses it into a usable data structure
parsed_test_file = Crack::XML.parse(test_file)
## pp is a command that, like puts, prints to screen
## but does so in a prettier format (it's short for 'pretty-print')
#
## Let's see what parsed_test_file looks like
#! pp parsed_test_file
## Let's count how many statuses there are
#! puts parsed_test_file['statuses'].length
## Let's make a nicer output
## put it in a temporary
#! puts "There are " + parsed_test_file["statuses"].length.to_s + " statuses in the file: " + test_filename
## Try taking out the .to_s; what happens?
## So far, the main datatype that we've played around with is the String
## Here are two new ones: Numbers and Arrays. We'll get to Arrays in the next section
##
## Numbers are as simple as you think. However, they are NOT set off by quotation marks
## or any other kind of symbol. In fact, if you put them inside quotation marks, they are
## nothing but Strings. And you can't add Strings and Numbers together. For the most part,
## you can't add data of different types together without some kind of error or something
## unexpected happening
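## A sketch of what goes wrong without .to_s, and two ways around it
## The variable 'count' and its value are made up here; it stands in for
## parsed_test_file['statuses'].length
count = 20
begin
  puts "There are " + count + " statuses"
rescue TypeError => error
  puts "Ruby refused: " + error.message
end
puts "There are " + count.to_s + " statuses"   # convert the number to a String first
puts "There are #{count} statuses"             # or let interpolation convert it
## Numbers inside quotation marks are Strings, so + joins them instead of adding
puts "1" + "2"    # joins two Strings
puts 1 + 2        # adds two Numbers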
=end
=begin
#################################
##### 03
##### Looping through the file
## Make a variable to hold that long name
statuses = parsed_test_file['statuses']
## As you might guess, 'statuses' essentially contains all the tweets from the page we downloaded
## 'statuses' is more than just a single object; it's a series/list of them. This is an Array
## Read more about Ruby Arrays:
##
## We can pretty-print the first status to see what attributes each status has
#! pp statuses.first
## You should get something like this
# {"coordinates"=>nil,
#  "created_at"=>"Fri Mar 25 22:17:42 +0000 2011",
#  "truncated"=>"false",
#  "favorited"=>"false",
#  "entities"=>
#   {"urls"=>
#     {"url"=>
#       {"expanded_url"=>nil,
#        "url"=>"http://bit.ly/gueC4z",
#        "end"=>"94",
#        "start"=>"74"}},
#    "hashtags"=>nil,
#    "user_mentions"=>nil},
#  "text"=>
#   "Help veterans get jobs. We already know they can handle workplace stress. http://bit.ly/gueC4z",
#  "contributors"=>nil,
#  "id"=>"51407434866098176",
#  "retweet_count"=>"100+",
#  "geo"=>nil,
#  "retweeted"=>"false",
#  "in_reply_to_user_id"=>nil,
#  "source"=>"web",
#  "in_reply_to_screen_name"=>nil,
#  "user"=>{"id"=>"16303106"},
#  "place"=>nil,
#  "in_reply_to_status_id"=>nil}
## Let's loop through each status. Using the bracket['string'] notation, we can
## pick out any attributes we want
statuses.each do |status|
  #! puts(status['text'])
  #! puts(status['retweet_count'])
end
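## The same pattern on a tiny hand-made Array, so it can be tried without the
## download. 'sample_statuses' and its contents are invented for illustration
## and only mimic the shape of the real data
require 'pp'
sample_statuses = [
  { 'text' => 'first tweet',  'retweet_count' => '3' },
  { 'text' => 'second tweet', 'retweet_count' => '100+' }
]
puts sample_statuses.length      # how many items are in the Array
pp sample_statuses.first.keys    # the attribute names of one status
sample_statuses.each do |status|
  puts status['text'] + " (retweets: " + status['retweet_count'] + ")"
end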
=end
=begin
#################################
##### 04
##### Writing to our hard disk
## So far, nothing we've done has been saved permanently
## Now we'll run through some commands to write files
output_xml_file = File.open('../results/some_tweets.xml', 'w')
## Remember that 'test_file' was the variable we used to store the downloaded file
output_xml_file.write(test_file)
## Close the file so that everything we wrote is flushed to disk
output_xml_file.close
## Let's open a different file
output_file = File.open('../results/parsed_tweets.html', 'w')
## We're going to write some HTML
##
output_file.puts('<html><head>')
output_file.puts('<link rel="stylesheet" href="../html/styles.css" type="text/css" media="screen" />')
output_file.puts('</head><body>')
#### Note how the File class has its own puts method
#### which sends the string to the file
## Let's do that loop we did in Step 03
statuses.each do |status|
  output_file.puts('<div class="box">')
  output_file.puts('<p>' + status['text'] + '</p>')
  output_file.puts('</div>')
end
output_file.puts('</body></html>')
## Close the file so that everything we wrote is flushed to disk
output_file.close
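## A variation worth knowing: File.open also accepts a block, and closes the file
## automatically when the block ends. This sketch writes to the system's temporary
## directory so it does not touch ../results; 'demo_path', 'demo_file' and the
## tweet text are made up for illustration
require 'tmpdir'
demo_path = File.join(Dir.tmpdir, 'demo_tweets.html')
File.open(demo_path, 'w') do |demo_file|
  demo_file.puts('<div class="box">')
  demo_file.puts('<p>' + 'a made-up tweet' + '</p>')
  demo_file.puts('</div>')
end
puts File.read(demo_path)    # read the file back to confirm what was written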
=end