Created March 27, 2011 15:40
Gist dannguyen/889300
01-Studio-20 - lesson
#####
## The following lines check that the folders we need are in place,
## and then load the libraries (gems) that this lesson uses
['../results', '../html'].each do |dir|
  raise "#{dir} does not exist as expected" if !File.exist?(dir)
end
require 'rubygems'
require 'rest-client'
require 'crack'
require 'pp'
#####
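=begin
## The check above stops at the first missing folder. Here is a minimal sketch of a
## variant that reports every missing folder at once. It uses File.directory?, which,
## unlike File.exist?, is also false when a plain file has the folder's name.
## The helper name missing_dirs and the temporary folders are illustrative
## assumptions, not part of the lesson.
```ruby
require 'tmpdir'

# Hypothetical helper (not from the lesson): returns every entry in `dirs`
# that is missing or is not a directory.
def missing_dirs(dirs)
  dirs.reject { |dir| File.directory?(dir) }
end

# Try it in a throwaway folder so nothing real is touched.
Dir.mktmpdir do |base|
  results = File.join(base, 'results')
  html    = File.join(base, 'html')
  Dir.mkdir(results)              # create 'results' but not 'html'

  missing = missing_dirs([results, html])
  puts "Missing: #{missing.map { |dir| File.basename(dir) }.join(', ')}"
  # prints: Missing: html
end
```
## In the lesson itself, you could call missing_dirs(['../results', '../html'])
## and raise once if the result isn't empty.
=end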
#################################
##### 00
Kernel.puts("hello world")
##################################
##### 00.5
##### Variables, strings and assignment
## The word on the left, hello_world, is a variable. It's just an *arbitrary* label
## that we make up to point to some data that we want to reference later.
## The data in this case is the String "Hello world, in a variable".
## The main, and essential, difference between the two is that the String has
## quotation marks around it, and the variable doesn't
hello_world = "Hello world, in a variable"
## Also, note what the equals sign does here. It assigns what's on the right to the variable
## on the left. It only goes one way. You can't, for instance, do this:
## "Hello world string " = hello_world
## Try changing the value of hello_world, and outputting it. See what happens when you don't
## enclose the String in quotation marks
#! Kernel.puts(hello_world)
## Main takeaway: Strings, these characters wrapped up in double quotes, are data.
## Variables, and other words NOT in quotation marks, are how we tell Ruby what we're
## referencing. Ruby doesn't care what we name our variables (as long as the names start
## with a lowercase letter or an underscore),
## just as long as we're consistent in referring to them
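=begin
## A small sketch of the ideas in step 00.5: assignment, reassignment, arbitrary
## variable names, and what happens when a word isn't in quotation marks.
## The variable names greeting, banana, and goodbye_world are made up for illustration.
```ruby
# Assignment stores the value on the right under the label on the left.
greeting = "Hello world, in a variable"
puts greeting                    # prints: Hello world, in a variable

# Reassigning the same label points it at new data.
greeting = "Goodbye world"
puts greeting                    # prints: Goodbye world

# The label itself is arbitrary; any consistently used name works.
banana = greeting
puts banana                      # prints: Goodbye world

# Quotation marks are what make something data rather than a name.
puts "greeting"                  # prints the word: greeting

# Without quotation marks, Ruby looks for a variable or method by that name.
begin
  puts goodbye_world
rescue NameError => e
  puts "Ruby complained: #{e.class}"   # prints: Ruby complained: NameError
end
```
=end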
=begin
#################################
##### 01
## Files to load
test_filename = "http://code-dancow.s3.amazonaws.com/studio20/tweets-test/tweets-jayrosen_nyu-page_1.xml"
## There are actually 16 of these files. You can replace the '1' in the filename with any of those page numbers
## This command downloads the file at the test_filename address
test_file = RestClient.get(test_filename)
## Let's print it out
#! Kernel.puts test_file
## Here's some explanation: "puts" is a method. So is "RestClient.get".
## More specifically, 'get' is a method that belongs to 'RestClient'.
## The dot is what calls, or invokes, 'get' on 'RestClient'
##
## Similarly, 'puts' is a method that belongs to the object 'Kernel',
## and 'puts' prints out to the screen
##
## But to save us a little typing, you can just do puts "whatever your string is"
## Kernel is, for our purposes, a ubiquitous object. The interpreter will assume we
## are referring to it if we just throw 'puts' out there on its own,
## so:
#! puts( "Hello World")
##
## What did "RestClient.get" do? We saved its result in a variable called test_file. Try doing puts on that
##
#! puts( test_file)
=end
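=begin
## A sketch of the two ideas in step 01 that don't need the network: the dot calls a
## method on an object, and Kernel.puts is the same method as plain puts. It checks
## the second claim by temporarily pointing $stdout at a StringIO; the helper name
## capture_output is a hypothetical choice for this example.
```ruby
require 'stringio'

# A dot calls the method on the right on the object on the left:
shout = "hello world".upcase      # 'upcase' belongs to the String object
puts shout                        # prints: HELLO WORLD

# Hypothetical helper: runs the block with standard output redirected
# into an in-memory StringIO, then returns whatever was printed.
def capture_output
  original = $stdout
  $stdout = StringIO.new
  yield
  $stdout.string
ensure
  $stdout = original              # always restore the real screen output
end

with_kernel = capture_output { Kernel.puts("Hello World") }
plain       = capture_output { puts("Hello World") }
puts with_kernel == plain         # prints: true
```
=end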
=begin
#################################
##### 02
##### Parsing the XML
## This command parses the XML text into a usable Ruby data structure (a Hash)
parsed_test_file = Crack::XML.parse(test_file)
## pp is a command that, like puts, prints to the screen,
## but does so in a prettier format (it's short for 'pretty-print')
#
## Let's see what parsed_test_file looks like
#! pp parsed_test_file
## Let's count how many statuses there are
#! puts parsed_test_file['statuses'].length
## Let's make a nicer output by gluing Strings together with +
#! puts "There are " + parsed_test_file["statuses"].length.to_s + " statuses in the file: " + test_filename
## Try taking out the .to_s; what happens?
## So far, the main datatype that we've played around with is the String
## Here are two new ones: Numbers and Arrays. We'll get to Arrays in the next section
##
## Numbers are as simple as you think. However, they are NOT set off by quotation marks
## or any other kind of symbol. In fact, if you put them inside of quotation marks, they are
## nothing but Strings. And you can't add Strings and Numbers together...and for the most part
## you can't add data of different types together without some kind of error or something unexpected
## happening...
=end
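=begin
## A sketch of the Strings-vs-Numbers point from step 02, runnable without the
## tweet files. The variable names are made up; the behavior of +, .to_s, .to_i and
## "#{}" interpolation is standard Ruby.
```ruby
count = 16                          # a Number (Integer): no quotation marks
label = "16"                        # a String that happens to contain digits

puts count + 1                      # prints: 17 (arithmetic)
puts label + "1"                    # prints: 161 (Strings are glued together)

# Mixing the two types with + raises a TypeError...
begin
  sentence = "There are " + count
rescue TypeError => e
  puts "TypeError: #{e.message}"
end

# ...so convert the Number to a String first with .to_s,
sentence = "There are " + count.to_s + " files"
puts sentence                       # prints: There are 16 files

# or let string interpolation do the conversion for you.
puts "There are #{count} files"     # prints: There are 16 files

# Going the other way, .to_i turns a String of digits back into a Number.
puts label.to_i + 1                 # prints: 17
```
=end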
=begin
#################################
##### 03
##### Looping through the file
## Make a variable to hold that long expression
statuses = parsed_test_file['statuses']
## As you might guess, 'statuses' essentially contains all the tweets from the page we downloaded
## 'statuses' is more than just a single object; it's a series/list of them. This is an Array
## Read more about Ruby Arrays:
##
## We can pretty-print the first status to see what exists in each one
## (statuses.first.keys would list just the attribute names)
#! pp statuses.first
## You should get something like this
# {"coordinates"=>nil,
# "created_at"=>"Fri Mar 25 22:17:42 +0000 2011",
# "truncated"=>"false",
# "favorited"=>"false",
# "entities"=>
# {"urls"=>
# {"url"=>
# {"expanded_url"=>nil,
# "url"=>"http://bit.ly/gueC4z",
# "end"=>"94",
# "start"=>"74"}},
# "hashtags"=>nil,
# "user_mentions"=>nil},
# "text"=>
# "Help veterans get jobs. We already know they can handle workplace stress. http://bit.ly/gueC4z",
# "contributors"=>nil,
# "id"=>"51407434866098176",
# "retweet_count"=>"100+",
# "geo"=>nil,
# "retweeted"=>"false",
# "in_reply_to_user_id"=>nil,
# "source"=>"web",
# "in_reply_to_screen_name"=>nil,
# "user"=>{"id"=>"16303106"},
# "place"=>nil,
# "in_reply_to_status_id"=>nil}
## Let's loop through each status. Using the bracket['string'] notation, we can
## pick out any attributes we want
statuses.each do |status|
  #! puts( status['text'])
  #! puts( status['retweet_count'])
end
=end
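=begin
## A sketch of step 03 that runs without downloading anything. The Hash below is a
## hypothetical, hand-made stand-in with the same shape as the parsed tweets
## (a 'statuses' key holding an Array of Hashes); the values are invented.
```ruby
# Hypothetical stand-in for parsed_test_file.
parsed = {
  'statuses' => [
    { 'text' => 'First tweet',  'retweet_count' => '3' },
    { 'text' => 'Second tweet', 'retweet_count' => '100+' }
  ]
}

statuses = parsed['statuses']     # an Array of Hashes
puts statuses.length              # prints: 2
puts statuses.first.keys.inspect  # prints: ["text", "retweet_count"]

# Loop through each status, picking out attributes by key.
statuses.each do |status|
  puts "#{status['text']} (retweets: #{status['retweet_count']})"
end

# map builds a new Array from each element, e.g. just the texts.
texts = statuses.map { |status| status['text'] }
puts texts.inspect                # prints: ["First tweet", "Second tweet"]
```
=end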
=begin
#################################
##### 04
##### Writing to our hard disk
## So far, nothing we've done has been saved permanently
## Now we'll run through some commands to write files
output_xml_file = File.open('../results/some_tweets.xml', 'w')
## Remember that 'test_file' was the variable we used to store the downloaded file
output_xml_file.write(test_file)
## Close the file so everything we wrote is saved to disk
output_xml_file.close
## Let's open a different file
output_file = File.open('../results/parsed_tweets.html', 'w')
## We're going to write some HTML
##
output_file.puts('<html><head>')
output_file.puts('<link rel="stylesheet" href="../html/styles.css" type="text/css" media="screen" />')
output_file.puts('</head><body>')
#### Note how File objects have their own puts method,
#### which sends the string to the file instead of the screen
## Let's do that loop we did in Step 03
statuses.each do |status|
  output_file.puts('<div class="box">')
  output_file.puts('<p>' + status['text'] + '</p>')
  output_file.puts('</div>')
end
output_file.puts('</body></html>')
output_file.close
=end
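=begin
## A sketch of step 04 written with the block form of File.open, which closes the
## file for you when the block ends, even if an error is raised. It also escapes
## each tweet's text with ERB::Util.html_escape so characters like & and < can't
## break the HTML; that escaping is an addition, not part of the original lesson.
## The sample statuses are made up, and the file goes into a temporary folder
## instead of '../results' so the sketch runs anywhere.
```ruby
require 'erb'
require 'tmpdir'

# Hypothetical sample data standing in for the parsed statuses.
statuses = [
  { 'text' => 'Tips & tricks for <html> beginners' },
  { 'text' => 'Second tweet' }
]

written = nil
Dir.mktmpdir do |dir|
  html_path = File.join(dir, 'parsed_tweets.html')

  # The block form closes the file automatically when the block finishes.
  File.open(html_path, 'w') do |file|
    file.puts('<html><head></head><body>')
    statuses.each do |status|
      # html_escape converts &, <, >, " and ' into HTML entities.
      text = ERB::Util.html_escape(status['text'])
      file.puts('<div class="box">')
      file.puts("<p>#{text}</p>")
      file.puts('</div>')
    end
    file.puts('</body></html>')
  end

  written = File.read(html_path)    # read it back before the temp folder is deleted
end

puts written
```
## In the lesson you would pass '../results/parsed_tweets.html' to File.open instead
## of the temporary path.
=end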