@patriques82
Last active July 24, 2019 15:53
A summary of the book "The Ruby Programming Language" by David Flanagan and Yukihiro "Matz" Matsumoto
CONTENT
Expressions, Statements and Control Structures
Equality
Assignments
The ||= idiom
Other assignments
Flip-flops
Iterators
Blocks
Control-flow keywords
Return
Break, Next and Redo
Throw and Catch
Raise and rescue
Methods, Procs, Lambdas and Closures
Methods
Arguments
Return values
Method Lookup
Procs and Lambdas
Closures
Method objects
Classes and Modules
Method visibility and Inheritance
Arrayifying, Hash access, and Equality (Duck typing)
Structs
Class methods
Clone and Dup
Modules
Namespaces
Mixins
Load and Require
Loadpaths
Eigenclass
Other useful stuff
Threads
Tracing
Eval
Monkey patching
DSLs
Ruby I/O
Networking
Expressions and Statements
Many programming languages distinguish between low-level expressions and higher-level statements,
such as conditionals and loops. In these languages, statements control the flow of a program, but
they do not have values. They are executed, rather than evaluated. In Ruby, there is no clear
distinction between statements and expressions; everything in Ruby, including class and method
definitions, can be evaluated as an expression and will return a value. The fact that if
statements return a value means that, for example, a multiway conditional can be elegantly
written as an assignment:
name = if x == 1 then "one"
elsif x == 2 then "two"
elsif x == 3 then "three"
elsif x == 4 then "four"
else "many"
end
Instead of writing:
if expression then code end
we can simply write:
code if expression
When used in this form, if is known as a statement (or expression) modifier. To use if as a
modifier, it must follow the modified statement or expression immediately, with no intervening
line break.
y = x.invert if x.respond_to? :invert
y = (x.invert if x.respond_to? :invert)
In the first line, the modifier applies to the whole assignment: if x does not have a method named
invert, then nothing happens at all, and the value of y is not modified. In the second line, the if
modifier applies only to the method call. If x does not have
an invert method, then the modified expression evaluates to nil, and this is the value that is
assigned to y. Note that an expression modified with an if clause is itself an expression that can
be modified. It is therefore possible to attach multiple if modifiers to an expression:
# Output message if message exists and the output method is defined
puts message if message if defined? puts
This should be avoided; for clarity, write instead:
puts message if message and defined? puts
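The difference between the two assignment forms described above can be checked with a small sketch. The object x and the starting value :unchanged are made up for illustration; x is a plain String, which has no invert method.

```ruby
x = "a string without an invert method"   # hypothetical object: String has no #invert

# Form 1: the modifier guards the whole assignment, so y keeps its old value.
y = :unchanged
y = x.invert if x.respond_to?(:invert)
after_first_form = y

# Form 2: the modifier guards only the method call. The parenthesized
# expression evaluates to nil, and that nil is assigned to y.
y = :unchanged
y = (x.invert if x.respond_to?(:invert))
after_second_form = y
```

Here after_first_form is still :unchanged, while after_second_form is nil.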
Equality
The equal? method is defined by Object to test whether two values refer to exactly the same
object. For any two distinct objects, this method always returns false:
a = "Ruby" # One reference to one String object
b = c = "Ruby" # Two references to another String object
a.equal?(b) # false: a and b are different objects
b.equal?(c) # true: b and c refer to the same object
By convention, subclasses never override the equal? method. The == operator is the most common
way to test for equality. In the Object class, it is simply a synonym for equal?, and it tests
whether two object references are identical. Most classes redefine this operator to allow distinct
instances to be tested for equality.
a = "Ruby" # One String object
b = "Ruby" # A different String object with the same content
a.equal?(b) # false: a and b do not refer to the same object
a == b # true: but these two distinct objects have equal values
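As a minimal sketch of how a class might redefine ==, consider the following. The Point class is hypothetical and exists only to illustrate the idea; it is not taken from the book.

```ruby
# Hypothetical value class used only for illustration.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x, @y = x, y
  end

  # Two points are equal when both are Points with the same coordinates.
  def ==(other)
    other.is_a?(Point) && x == other.x && y == other.y
  end
end

p1 = Point.new(1, 2)
p2 = Point.new(1, 2)

same_object = p1.equal?(p2)   # identity test: two distinct objects
same_value  = (p1 == p2)      # value test: the coordinates match
```

equal? is left alone, so it still reports identity, while == now compares contents.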
Assignments
The ||= idiom
You might use this line:
results ||= []
Think about this for a moment. It expands to:
results = results || []
The righthand side of this assignment evaluates to the value of results, unless that is nil or
false. In that case, it evaluates to a new, empty array. This means that the abbreviated
assignment shown here leaves results unchanged, unless it is nil or false, in which case it
assigns a new array.
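A short sketch of the idiom in action follows; the variable names are illustrative. Note the last case: because false is treated like nil, ||= is not a safe way to give a default to a variable that may legitimately hold false.

```ruby
results = nil
results ||= []         # results was nil, so it becomes a new empty array
results << :first

results ||= [:other]   # results is already truthy, so it is left unchanged

flag = false
flag ||= true          # false counts as "unset" too, so flag becomes true
```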
Other assignments
x = 1, 2, 3 # x = [1,2,3]
x, = 1, 2, 3 # x = 1; other values are discarded
x, y, z = [1, 2, 3] # Same as x,y,z = 1,2,3
x, y, z = 1, 2 # x=1; y=2; z=nil
x, y, z = 1, *[2,3] # Same as x,y,z = 1,2,3
x,*y = 1, 2, 3 # x=1; y=[2,3]
x = y = z = 0 # Assign zero to variables x, y, and z
x,(y,z) = a, b
This is effectively two assignments executed at the same time:
x = a
y,z = b
To make it clearer
x,y,z = 1,[2,3] # No parens: x=1;y=[2,3];z=nil
x,(y,z) = 1,[2,3] # Parens: x=1;y=2;z=3
Case is an alternative to if/elsif/else. The last expression evaluated in the case expression
becomes the return value of the case statement. === is the case equality operator. For many
classes, such as Fixnum (merged into Integer in Ruby 2.4), the === operator behaves just the same
as ==. But certain classes define this operator in interesting ways. Here is an example using
ranges in a case:
# Compute 2006 U.S. income tax using case and Range objects
tax = case income
when 0..7550
income * 0.1
when 7550..30650
755 + (income-7550)*0.15
when 30650..74200
4220 + (income-30650)*0.25
when 74200..154800
15107.5 + (income-74200)*0.28
when 154800..336550
37675.5 + (income-154800)*0.33
else
97653 + (income-336550)*0.35
end
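To see what "interesting ways" means in practice, here is a small sketch of === on a few core classes. The describe helper is hypothetical and only shows how case relies on these definitions.

```ruby
in_range   = (1..10) === 5       # Range#=== tests membership
is_string  = String === "Ruby"   # Class#=== tests whether the object is an instance
matches    = /^R/ === "Ruby"     # Regexp#=== tests for a match
same_value = 5 === 5             # Integer#=== behaves like ==

# Hypothetical helper: each when clause calls === on its own expression.
def describe(value)
  case value
  when 0..9    then "small number"
  when Integer then "other integer"
  when /^\d+$/ then "numeric string"
  else              "something else"
  end
end
```

For example, describe(7) falls into the range clause, describe(42) into the Integer clause, and describe("42") into the regular expression clause.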
Flip-flops
When the .. and ... operators are used in a conditional, such as an if statement, or in a loop,
such as a while loop, they do not create Range objects. Instead, they create a special kind of
Boolean expression called a flip-flop. A flip-flop expression evaluates to true or false, just as
comparison and equality expressions do. Consider the flip-flop in the following code. Note that
the first .. in the code creates a Range object. The second one creates the flip-flop
expression:
(1..10).each {|x| print x if x==3..x==5 }
The flip-flop consists of two Boolean expressions joined with the .. operator, in the context of a
conditional or loop. A flip-flop expression is false unless and until the lefthand expression
evaluates to true. Once that expression has become true, the expression “flips” into a persistent
true state. It stays true until the righthand expression evaluates to true, at which point it
“flops” back to false. The example above therefore prints 345. The following simple Ruby program
demonstrates a flip-flop. It reads a text file line-by-line and prints any line that contains the
text “TODO”. It then continues printing lines until it reads a blank line:
ARGF.each do |line| # For each line of standard in or of named files
print line if line=~/TODO/..line=~/^$/ # Print lines when flip-flop is true
end
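The same flip-flop can be tried without any input files. This sketch replaces ARGF with a made-up array of lines and collects the selected lines instead of printing them.

```ruby
# Hypothetical input standing in for the lines of a file.
lines = ["setup", "TODO: refactor", "split the method", "", "teardown"]

printed = []
lines.each do |line|
  # The flip-flop turns on at a line containing TODO and turns off
  # once a blank line has been seen (the blank line itself is included).
  printed << line if (line =~ /TODO/)..(line =~ /^$/)
end
```

After the loop, printed holds the TODO line, the line after it, and the blank line that ended the run.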
Iterators
The defining feature of an iterator method is that it invokes a block of code associated with the
method invocation. You do this with the yield statement. The following method is a trivial
iterator that just invokes its block twice:
def twice
yield
yield
end
Other examples:
squares = [1,2,3].collect {|x| x*x} # => [1,4,9]
evens = (1..10).select {|x| x%2 == 0} # => [2,4,6,8,10]
odds = (1..10).reject {|x| x%2 == 0} # => [1,3,5,7,9]
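The inject method is a little more complicated than the others. It invokes the associated block
with two arguments. The first argument is an accumulated value of some sort from previous iterations.
The second argument is the next element of the enumerable object. The return value of the block
becomes the first block argument for the next iteration.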
data = [2, 5, 3, 4]
sum = data.inject {|sum, x| sum + x } # => 14 (2+5+3+4)
The initial value of the accumulator variable is either the argument to inject, if there is one,
or the first element of the enumerable object, as the two examples below show:
floatprod = data.inject(1.0) {|p,x| p*x } # => 120.0 (1.0*2*5*3*4)
max = data.inject {|m,x| m>x ? m : x } # => 5 (largest element)
If a method is invoked without a block, it is an error for that method to yield, because there is
nothing to yield to. Sometimes you want to write a method that yields to a block if one is
provided but takes some default action (other than raising an error) if invoked with no block. To
do this, use block_given? to determine whether there is a block associated with the invocation.
Example:
def sequence(n, m, c)
i, s = 0, [] # Initialize variables
while(i < n) # Loop n times
y = m*i + c # Compute value
yield y if block_given? # Yield, if block
s << y # Store the value
i += 1
end
s # Return the array of values
end
Normally, enumerators with next methods are created from Enumerable objects that have an each
method. If, for some reason, you define a class that provides a next method for external
iteration instead of an each method for internal iteration, you can easily implement each in
terms of next. In fact, turning an externally iterable class that implements next into an
Enumerable class is as simple as mixing in a module.
module Iterable
include Enumerable # Define iterators on top of each
def each # And define each on top of next
loop { yield self.next }
end
end
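As a sketch of how the Iterable module might be used, the hypothetical Countdown class below provides only a next method, which raises StopIteration when it runs out of values. The module is repeated so the example is self-contained.

```ruby
module Iterable
  include Enumerable          # Define iterators on top of each
  def each                    # And define each on top of next
    loop { yield self.next }  # loop stops quietly on StopIteration
  end
end

# Hypothetical externally iterable class: counts down from a start value to 1.
class Countdown
  include Iterable

  def initialize(from)
    @current = from
  end

  def next
    raise StopIteration if @current.zero?
    value = @current
    @current -= 1
    value
  end
end

all_values  = Countdown.new(3).to_a                     # Enumerable#to_a built on each
even_values = Countdown.new(4).select { |n| n.even? }   # Enumerable#select built on each
```

Mixing in the module is enough to make every Enumerable method available, because they are all defined in terms of each.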
The “gang of four” define and contrast internal and external iterators quite clearly in their
design patterns book:
"A fundamental issue is deciding which party controls the iteration, the iterator or the client
that uses the iterator. When the client controls the iteration, the iterator is called an external
iterator, and when the iterator controls it, the iterator is an internal iterator. Clients that use
an external iterator must advance the traversal and request the next element explicitly from the
iterator. In contrast, the client hands an internal iterator an operation to perform, and the
iterator applies that operation to every element...."
In Ruby, iterator methods like each are internal iterators; they control the iteration and “push”
values to the block of code associated with the method invocation. Enumerators have an each method
for internal iteration, but in Ruby 1.9 and later, they also work as external iterators—client code
can sequentially “pull” values from an enumerator with next.
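A minimal sketch of external iteration: calling each without a block returns an enumerator, and the client then pulls values one at a time. The array contents are arbitrary.

```ruby
enum = [10, 20, 30].each   # each without a block returns an Enumerator

first  = enum.next         # The client asks for each value explicitly
second = enum.next
third  = enum.next

# One more call to next raises StopIteration to signal the end.
exhausted = begin
  enum.next
  false
rescue StopIteration
  true
end
```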
Suppose you have two Enumerable collections and need to iterate their elements in pairs: the first
elements of each collection, then the second elements, and so on. Without an external iterator, you
must convert one of the collections to an array (with the to_a method defined by Enumerable ) so
that you can access its elements while iterating the other collection with each. The code below
shows three different ways to iterate through several collections, two of them in parallel:
# Call the each method of each collection in turn.
# This is not a parallel iteration and does not require enumerators.
def sequence(*enumerables, &block)
enumerables.each do |enumerable|
enumerable.each(&block)
end
end
# Iterate the specified collections, interleaving their elements.
# This can't be done efficiently without external iterators.
# Note the use of the uncommon else clause in begin/rescue.
def interleave(*enumerables)
# Convert to an array of enumerators
enumerators = enumerables.map {|e| e.to_enum }
# Loop until we don't have any enumerators
until enumerators.empty?
begin
# Take the first enumerator
e = enumerators.shift
yield e.next # Get its next and pass to the block
rescue StopIteration # If the enumerator is exhausted, drop it
else # If no exception occurred
enumerators << e # Put the enumerator back
end
end
end
# Iterate the specified collections, yielding
# tuples of values, one value from each of the
# collections. See also Enumerable.zip.
def bundle(*enumerables)
enumerators = enumerables.map {|e| e.to_enum }
loop { yield enumerators.map {|e| e.next} }
end
# Examples of how these iterator methods work
a,b,c = [1,2,3], 4..6, 'a'..'e'
sequence(a,b,c) {|x| print x} # prints "123456abcde"
interleave(a,b,c) {|x| print x} # prints "14a25b36cde"
bundle(a,b,c) {|x| print x} # '[1, 4, "a"][2, 5, "b"][3, 6, "c"]'
In general, Ruby’s core collection classes iterate over live objects rather than private copies
or “snapshots” of those objects, and they make no attempt to detect or prevent concurrent
modification to the collection while it is being iterated.
a = [1,2,3,4,5]
a.each {|x| puts "#{x},#{a.shift}" } # prints "1,1\n3,2\n5,3"
Blocks
Blocks may not stand alone; they are only legal following a method invocation. You can, however,
place a block after any method invocation; if the method is not an iterator and never invokes the
block with yield, the block will be silently ignored. Blocks are delimited with curly braces or
with do and end keywords.
Consider the Array.sort method. If you associate a block with an invocation of this method, it will
yield pairs of elements to the block, and it is the block’s job to compare them. The block’s return
value (–1, 0, or 1) indicates the ordering of the two arguments. The “return value” of the block is
available to the iterator method as the value of the yield statement.
# The block takes two words and "returns" their relative order.
words.sort! {|x,y| y <=> x }
Blocks define a new variable scope: variables created within a block exist only within that block
and are undefined outside of the block. Be cautious, however; the local variables in a method are
available to any blocks within that method. In Ruby 1.8, a block parameter that shares its name with
an existing local variable assigns to that variable each time the block is invoked. Ruby 1.9 is
different: block parameters are always local to their block, and invocations of the block never
assign values to existing variables. Ruby 1.9 is different in another important way, too. Block
syntax has been extended to allow you to declare block-local variables that are guaranteed to be
local, even if a variable by the same name already exists in the enclosing scope. To do this, follow
the list of block parameters with a semicolon and a comma-separated list of block-local variables.
Here is an example:
x = y = 0 # Local variables
1.upto(4) do |x;y| # x and y are local to the block: they "shadow" the outer variables
y = x + 1 # Use y as a scratch variable
puts y*y # Prints 4, 9, 16, 25
end
[x,y] # => [0,0]: block does not alter these
In this code, x is a block parameter: it gets a value when the block is invoked with yield. y is a
block-local variable. It does not receive any value from a yield invocation, but it has the value
nil until the block actually assigns some other value to it. Blocks can have more than one parameter
and more than one local variable, of course. Here is a block with two parameters and three local
variables:
hash.each {|key,value; i,j,k| ... }
In Ruby 1.8, only the last block parameter may have an * prefix. Ruby 1.9 lifts this restriction and
allows any one block parameter, regardless of its position in the list, to have an * prefix:
def five; yield 1,2,3,4,5; end # Yield 5 values
# Extra values go into body array
five do |head, *body, tail|
print head, body, tail # Prints "1[2, 3, 4]5"
end
Control-flow keywords
Return
return may optionally be followed by an expression, or a comma-separated list of expressions. If
there is no expression, then the return value of the method is nil. If there is one expression,
then the value of that expression becomes the return value of the method. If there is more than one
expression after the return keyword, then the return value of the method is an array containing the
values of those expressions.
Most Ruby programmers omit return when it is not necessary. Instead of writing return x as the last
line of a method, they would simply write x. The return value in this case is the value of the last
expression in the method. return is useful if you want to return from a method prematurely, or if
you want to return more than one value.
def double(x)
return x, x.dup
end
When the return statement is used in a block, it does not just cause the block to return. And it
does not just cause the iterator that invokes the block to return. return always causes the
enclosing method to return, just like it is supposed to, since a block is not a method.
def find(array, target)
array.each_with_index do |element,index|
return index if (element == target) # Return index from find, not just from the block
end
nil # If we didn't find the element
end
Break, Next and Redo
Like the return keyword, break and next (the counterpart of continue in Java) can be used alone or
followed by an expression or a comma-separated list of expressions. We have already seen what return
does in a block. When next is given a value in a block, that value becomes the value of the yield
that invoked the block; when break is given a value, that value becomes the return value of the
iterator method.
squareroots = data.collect do |x|
next 0 if x < 0 # 0 for negative values
Math.sqrt(x)
end
As with the return statement, it is not often necessary to explicitly use next to specify a value.
squareroots = data.collect do |x|
if (x < 0)
then 0
else
Math.sqrt(x)
end
end
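The different destinations of a next value and a break value can be seen side by side in this sketch. The data array is made up for illustration.

```ruby
data = [4, 9, -1, 16]

# next: the value becomes the block's result for this element only,
# and collect carries on with the remaining elements.
roots = data.collect do |x|
  next 0 if x < 0
  Math.sqrt(x)
end

# break: the iterator itself returns immediately with the given value.
first_negative = data.each do |x|
  break x if x < 0
end
```

roots still has one entry per element, while first_negative is the single value handed to break.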
The redo statement restarts the current iteration of a loop or iterator. This is not the same
thing as next. next transfers control to the end of a loop or block so that the next iteration
can begin, whereas redo transfers control back to the top of the loop or block so that the
iteration can start over.
i = 0
while(i < 3) # Prints "0123" instead of "012"
print i # Control returns here when redo is executed
i += 1
redo if i == 3
end
The redo statement is not commonly used. One use, however, is to recover from input errors when prompting a user for input.
puts "Please enter the first word you think of"
words = %w(apple banana cherry)
responses = words.collect do |word| # Control returns here when redo is executed
print word + "> " # Prompt the user
response = gets.chop # Get a response
if response.size == 0
word.upcase! # Emphasize the prompt
redo # And skip to the top of the block
end
response # Return the response
end
The retry statement is normally used in a rescue clause to re-execute a block of code that raised
an exception.
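A minimal sketch of retry follows, assuming a made-up operation that fails on its first two attempts. The attempt limit is there so that a permanent failure cannot loop forever.

```ruby
attempts = 0

result = begin
  attempts += 1
  # Hypothetical flaky operation: fails on the first two attempts.
  raise "temporary failure" if attempts < 3
  :ok
rescue RuntimeError
  retry if attempts < 3   # Re-run the begin body from the top
  :gave_up                # Reached only when the attempts are used up
end
```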
Throw and catch
throw and catch are Kernel methods that define a control structure that can be thought of as a
multilevel break. throw doesn’t just break out of the current loop or block but can actually
transfer out any number of levels, causing the block defined with a catch to exit. If you are
familiar with languages like Java and JavaScript, then you probably recognize throw and catch as
the keywords those languages use for raising and handling exceptions.
Ruby does exceptions differently, using raise and rescue, which we’ll learn about later. But the
parallel to exceptions is intentional. Calling throw is very much like raising an exception. And
the way a throw propagates out through the lexical scope and then up the call stack is very much
the same as the way an exception propagates out and up. Despite the similarity to exceptions, it
is best to consider throw and catch as a general-purpose (if perhaps infrequently used) control
structure rather than an exception mechanism. Here is an example:
for matrix in data do # Process a deeply nested data structure.
catch :missing_data do # Label this statement so we can break out.
for row in matrix do
for value in row do
throw :missing_data unless value # Break out of two loops at once.
# Otherwise, do some actual data processing here.
end
end
end
# We end up here after the nested loops finish processing each matrix.
# We also get here if :missing_data is thrown.
end
If no catch call matches the symbol passed to throw, then an exception is raised: NameError in Ruby 1.8, ArgumentError in Ruby 1.9 (and its subclass UncaughtThrowError since Ruby 2.2).
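throw also accepts an optional second argument, which becomes the return value of the matching catch. The sketch below uses that to report where the missing value was found; the grid data is made up for illustration.

```ruby
grid = [[1, 2], [3, nil], [5, 6]]   # hypothetical data with one missing value

# catch returns the second argument of throw, or the value of its block
# when nothing is thrown.
missing_at = catch(:missing_data) do
  grid.each_with_index do |row, r|
    row.each_with_index do |value, c|
      throw :missing_data, [r, c] unless value   # Leave both loops at once
    end
  end
  nil   # Reached only when every value is present
end
```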
Raise and rescue
An exception is an object that represents some kind of exceptional condition; it indicates that
something has gone wrong. Raising an exception transfers the flow of control to
exception-handling code. The Exception class defines two methods that return details about the
exception. The
message method returns a string that may provide human-readable details about what went wrong.
The other important method of exception objects is backtrace. This method returns an array of
strings that represents the call stack at the point that the exception was raised. Each element of
the array is a string of the form:
filename : linenumber in methodname
If you are defining a module of Ruby code, it is often appropriate to define your own subclass of
StandardError for exceptions that are specific to your module. This may be a trivial, one-line
subclass:
class MyError < StandardError; end
def factorial(n) # Define a factorial method with argument n
raise MyError, "bad argument" if n < 1 # Raise an exception for bad n
return 1 if n == 1 # factorial(1) is 1
n * factorial(n-1) # Compute other factorials recursively
end
Without an exception class, raise creates a RuntimeError by default:
raise "A default runtime error"
Most commonly, a rescue clause is attached to a begin statement. The begin statement exists simply
to delimit the block of code within which exceptions are to be handled. A begin statement with a
rescue clause looks like this:
begin
# Any number of Ruby statements go here.
# Usually, they are executed without exceptions and
# execution continues after the end statement.
rescue
# This is the rescue clause; exception-handling code goes here.
# If an exception is raised by the code above, or propagates up
# from one of the methods called above, then execution jumps here.
end
An example using this:
begin # Handle exceptions in this block
x = factorial(-1) # Note illegal argument
rescue => ex # Store exception in variable ex
puts "#{ex.class}: #{ex.message}" # Handle exception by printing message
end # End the begin/rescue block
If you want to handle only specific types of exceptions, you must include one or more exception
classes in the rescue clause.
rescue ArgumentError => ex
# or, to handle more than one exception class:
rescue ArgumentError, TypeError => error
If you want to handle each error individually, use one rescue clause per exception class:
begin
x = factorial(1)
rescue ArgumentError => ex
puts "Try again with a value >= 1"
rescue TypeError => ex
puts "Try again with an integer"
rescue Exception => ex
puts "No idea what happened" # Use rescue Exception as the last rescue clause.
end
A begin statement may include an else clause after its rescue clauses. You might guess that the
else clause is a catch-all rescue: that it handles any exception that does not match a previous
rescue clause. This is not what else is for. The code in an else clause is executed if the code
in the body of the begin statement runs to completion without exceptions. Putting code in an else
clause is a lot like simply tacking it on to the end of the begin clause. The only difference is
that when you use an else clause, any exceptions raised by that clause are not handled by the
rescue statements.
A begin statement may have one final clause. The optional ensure clause, if it appears, must come
after all rescue and else clauses. It may also be used by itself without any rescue or else
clauses. The ensure clause contains code that always runs, no matter what happens with the code
following begin:
def method_name(x)
# The body of the method goes here.
# Usually, the method body runs to completion without exceptions
# and returns to its caller normally.
rescue
# Exception-handling code goes here.
# If an exception is raised within the body of the method, or if
# one of the methods it calls raises an exception, then control
# jumps to this block.
else
# If no exceptions occur in the body of the method
# then the code in this clause is executed.
ensure
# The code in this clause is executed no matter what happens in the
# body of the method. It is run if the method runs to completion, if
# it throws an exception, or if it executes a return statement.
end
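The order in which the clauses run can be observed with a small sketch. The run_clauses helper and its should_fail parameter are hypothetical; the method simply records which clauses were executed.

```ruby
# Hypothetical helper that logs the clauses it passes through.
def run_clauses(should_fail)
  log = []
  begin
    log << :body
    raise ArgumentError, "requested failure" if should_fail
  rescue ArgumentError
    log << :rescue    # Runs only when the body raised
  else
    log << :else      # Runs only when the body finished without exceptions
  ensure
    log << :ensure    # Runs in both cases
  end
  log
end

without_error = run_clauses(false)
with_error    = run_clauses(true)
```

without_error records the body, else and ensure clauses; with_error records the body, rescue and ensure clauses.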
Fibers
Fibers resemble threads but do not execute in parallel; they are more like subroutines
(coroutines) that hand execution back to the caller of the fiber and can later be resumed. Fibers
are mostly used for implementing generators. Here is an example:
f = Fiber.new { # Line 1: Create a new fiber
puts "Fiber says Hello" # Line 2:
Fiber.yield # Line 3: goto line 9
puts "Fiber says Goodbye" # Line 4:
} # Line 5: goto line 11
# Line 6:
puts "Caller says Hello" # Line 7:
f.resume # Line 8: goto line 2
puts "Caller says Goodbye" # Line 9:
f.resume # Line 10: goto line 4
The code produces the following output:
Caller says Hello
Fiber says Hello
Caller says Goodbye
Fiber says Goodbye
Fibers and their callers can exchange data through the arguments and return values of resume and
yield.
f = Fiber.new do |message|
puts "Caller said: #{message}"
message2 = Fiber.yield("Hello") # "Hello" returned by first resume
puts "Caller said: #{message2}"
"Fine" # "Fine" returned by second resume
end
response = f.resume("Hello") # "Hello" passed to block
puts "Fiber said: #{response}"
response2 = f.resume("How are you?") # "How are you?" returned by Fiber.yield
puts "Fiber said: #{response2}"
The caller passes two messages to the fiber, and the fiber returns two responses to the caller.
It prints:
Caller said: Hello
Fiber said: Hello
Caller said: How are you?
Fiber said: Fine
But fibers are more commonly used as generators. Here is an example of a generator that produces a
Fibonacci sequence:
# Return a Fiber to compute Fibonacci numbers
def fibonacci_generator(x0,y0) # Base the sequence on x0,y0
Fiber.new do
x,y = x0, y0 # Initialize x and y
loop do # This fiber runs forever
Fiber.yield y # Yield the next number in the sequence
x,y = y,x+y # Update x and y
end
end
end
g = fibonacci_generator(0,1) # Create a generator
10.times { print g.resume, " " } # And use it
The code above prints the first 10 Fibonacci numbers:
1 1 2 3 5 8 13 21 34 55
The fiber module in the standard library enables additional, more powerful fiber features. However, you should avoid using these additional features wherever possible, because:
• They are not supported by all implementations. JRuby, for example, cannot support them on
current Java VMs.
• They are so powerful that misusing them can crash the Ruby VM.
Methods, Procs, Lambdas and Closures
Many languages distinguish between functions, which have no associated object, and methods,
which are invoked on a receiver object. Because Ruby is a purely object-oriented language, all
methods are true methods and are associated with at least one object. Methods defined outside of
any class look like global functions with no associated object. In fact, Ruby implicitly defines
and invokes them as private methods of the Object class.
Ruby’s methods are not objects in the way that strings, numbers, and arrays are. It is possible,
however, to obtain a Method object that represents a given method, and we can invoke methods
indirectly through Method objects. Blocks, like methods, are not objects that Ruby can manipulate.
But it’s possible to create an object that represents a block, and this is actually done with some
frequency in Ruby programs. A Proc object represents a block. Like a Method object, we can execute
the code of a block through the Proc that represents it. There are two varieties of Proc objects,
called procs and lambdas, which have slightly different behavior. Both procs and lambdas are
functions rather than methods invoked on an object. An important feature of procs and lambdas is
that they are closures: they retain access to the local variables that were in scope when they were
defined, even when the proc or lambda is invoked from a different scope.
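A minimal sketch of a closure: the hypothetical make_counter method returns a lambda that keeps using the local variable count long after the method has returned.

```ruby
# Hypothetical factory: each call returns a lambda with its own count variable.
def make_counter
  count = 0
  lambda { count += 1 }   # The lambda closes over count
end

counter = make_counter
counter.call
counter.call
third = counter.call          # The same count has been incremented three times

other = make_counter          # A separate closure with its own count
first_of_other = other.call
```

The two lambdas do not share state, because each invocation of make_counter creates a fresh count variable.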
Methods
A def statement that defines a method may include exception-handling code in the form of rescue,
else, and ensure clauses, just as a begin statement can. It is also possible, however, to use the def
statement to define a method on a single specified object; such a method is called a singleton
method. Class methods such as Math.sin and File.delete are actually singleton methods.
o = "message" # A string is an object
def o.printme # Define a singleton method for this object
puts self
end
o.printme # Invoke the singleton
Method names may (but are not required to) end with an equals sign, a question mark, or an exclamation
point. An equals sign suffix signifies that the method is a setter that can be invoked using assignment
syntax. Any method whose name ends with a question mark returns a value that answers the question posed
by the method invocation. A method whose name ends with an exclamation mark should be used with
caution; this suffix is often used for mutator methods that alter the object in place. The language has
a keyword alias that serves to define a new name for an existing method. Use it like this:
alias aka also_known_as # alias new_name existing_name
Arguments
When you define a method, you can specify default values for some or all of the parameters. If you do
this, then your method may be invoked with fewer argument values than the declared number of parameters.
def prefix(s, len=1)
s[0,len]
end
prefix("Ruby", 3) # => "Rub"
prefix("Ruby") # => "R"
Sometimes we want to write methods that can accept an arbitrary number of arguments. To do this, we put
an * before one of the method’s parameters.
def max(first, *rest)
max = first
rest.each {|x| max = x if x > max }
max
end
max(1) # first=1, rest=[]
max(1,2) # first=1, rest=[2]
max(1,2,3) # first=1, rest=[2,3]
data = [3, 2, 1]
m = max(*data) # first = 3, rest=[2,1] => 3
m = max(data) # first = [3,2,1], rest=[] => [3,2,1]
Recall that a block is a chunk of Ruby code associated with a method invocation, and that an
iterator is a method that expects a block. Any method invocation may be followed by a block, and any
method that has a block associated with it may invoke the code in that block with the yield statement.
If you prefer more explicit control over a block (so that you can pass it on to some other method, for
example), add a final argument to your method, and prefix the argument name with an ampersand. If you
do this, then that argument will refer to the block, if any, that is passed to the method.
def sequence3(n, m, c, &b) # Explicit argument to get block as a Proc
i = 0
while(i < n)
b.call(i*m + c) # Invoke the Proc with its call method
i += 1
end
end
# Note that the block is still passed outside of the parentheses
sequence3(5, 2, 2) {|x| puts x }
You could also explicitly pass a Proc object like this:
def sequence4(n, m, c, b) # No ampersand used for argument b
i = 0
while(i < n)
b.call(i*m + c) # Proc is called explicitly
i += 1
end
end
p = Proc.new {|x| puts x } # Explicitly create a Proc object
sequence4(5, 2, 2, p) # And pass it as an ordinary argument
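When & is used before a Proc object in a method invocation, it treats the Proc as if it were an
ordinary block following the invocation. Consider this code, which writes the same block twice:
a, b = [1,2,3], [4,5] # Start with some data.
sum = a.inject(0) {|total,x| total+x } # => 6. Sum elements of a.
sum = b.inject(sum) {|total,x| total+x } # => 15. Add the elements of b in.
The block can instead be stored in a Proc object and passed with &:
summation = Proc.new {|total,x| total+x } # A Proc object for summations
sum = a.inject(0, &summation) # => 6
sum = b.inject(sum, &summation) # => 15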
More about Procs follows in the Procs and Lambdas section.
Return values
Ruby methods may return more than one value. To do this, use an explicit return statement, and
separate the values to be returned with commas:
# Convert the Cartesian point (x,y) to polar (magnitude, angle) coordinates
def polar(x,y)
return Math.hypot(y,x), Math.atan2(y,x)
end
Instead of using the return statement with multiple values, we can simply create an array of
values ourselves:
# Convert polar coordinates to Cartesian coordinates
def cartesian(magnitude, angle)
[magnitude*Math.cos(angle), magnitude*Math.sin(angle)]
end
Methods of this form are typically intended for use with parallel assignment:
distance, theta = polar(x,y)
x,y = cartesian(distance,theta)
Method Lookup
When Ruby evaluates a method invocation expression, it must first figure out which method is to be invoked.
For the method invocation expression o.m:
1. First, it checks the eigenclass of o for singleton methods named m.
2. If no method m is found in the eigenclass, Ruby searches the class of o for an instance
method named m.
3. If no method m is found in the class, Ruby searches the instance methods of any
modules included by the class of o. If that class includes more than one module,
then they are searched in the reverse of the order in which they were included. That
is, the most recently included module is searched first.
4. If no instance method m is found in the class of o or in its modules, then the search
moves up the inheritance hierarchy to the superclass. Steps 2 and 3 are repeated
for each class in the inheritance hierarchy until each ancestor class and its included
modules have been searched.
5. If no method named m is found after completing the search, then a method named
method_missing is invoked instead. In order to find an appropriate definition of this
method, the name resolution algorithm starts over at step 1.
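The lookup steps above can be sketched with a small hierarchy. All of the names here (Greeting, Base, Child and their methods) are hypothetical and chosen only to show which definition wins at each step.

```ruby
module Greeting
  def hello; "hello from Greeting"; end
  def name;  "name from Greeting";  end
end

class Base
  def hello;     "hello from Base";     end
  def name;      "name from Base";      end
  def only_base; "only_base from Base"; end
end

class Child < Base
  include Greeting
  def hello; "hello from Child"; end
end

obj = Child.new
def obj.hello; "hello from singleton"; end   # Singleton method on obj only

chain = Child.ancestors.take(3)   # Class first, then its modules, then the superclass

from_singleton = obj.hello        # Step 1: the eigenclass wins
from_class     = Child.new.hello  # Step 2: the class of the object
from_module    = obj.name         # Step 3: an included module, before the superclass
from_super     = obj.only_base    # Step 4: found in the superclass
```

Child.ancestors is a convenient way to inspect the order in which classes and modules are searched.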
When method_missing is invoked, the first argument is a symbol that names the method that could not be found.
This symbol is followed by all the arguments that were to be passed to the original method. If there is a block
associated with the method invocation, that block is passed to method_missing as well. Defining your own
method_missing method for a class allows you an opportunity to handle any kind of invocation on instances of the
class. The method_missing hook is one of the most powerful of Ruby’s dynamic capabilities, and one of the most
commonly used metaprogramming techniques.
class Hash
# Allow hash values to be queried and set as if they were attributes.
# We simulate attribute getters and setters for any key.
def method_missing(key, *args)
text = key.to_s
if text[-1,1] == "=" # If key ends with = set a value
self[text.chop.to_sym] = args[0] # Strip = from key
else # Otherwise...
self[key] # ...just return the key value
end
end
end
h = {} # Create an empty hash object
h.one = 1 # Same as h[:one] = 1
puts h.one # Prints 1. Same as puts h[:one]
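As a further sketch, method_missing can also receive the original arguments and fall back to the default behavior with super. The Finder class and its find_by_ naming scheme are hypothetical; respond_to_missing? is the hook that Ruby 1.9.2 and later consult from respond_to?, and it is included here on the assumption that you want respond_to? to agree with method_missing.

```ruby
# Hypothetical class: answers any method whose name starts with "find_by_".
class Finder
  def initialize(rows)
    @rows = rows                       # An array of hashes
  end

  # name is a Symbol naming the missing method; args holds its arguments
  def method_missing(name, *args, &block)
    text = name.to_s
    if text.start_with?("find_by_")
      key = text.sub("find_by_", "").to_sym
      @rows.find { |row| row[key] == args[0] }
    else
      super                            # Fall back to the default NoMethodError
    end
  end

  # Keep respond_to? consistent with method_missing
  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?("find_by_") || super
  end
end

people = Finder.new([{ :name => "Ann", :age => 30 }, { :name => "Bob", :age => 25 }])
p people.find_by_name("Bob")           # The row whose :name is "Bob"
p people.respond_to?(:find_by_age)     # true
```

Calling super for names you do not handle keeps typos from being silently swallowed.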
Procs and Lambdas
Blocks are syntactic structures in Ruby; they are not objects, and cannot be manipulated as objects.
It is possible, however, to create an object that represents a block. Depending on how the object is
created, it is called a proc or a lambda.
We’ve already seen one way to create a Proc object: by associating a block with a method that is
defined with an ampersand-prefixed block argument. There is nothing preventing such a method from
returning the Proc object for use outside the method:
def makeproc(&p)
p # Return the Proc object
end
adder = makeproc {|x,y| x+y }
All Proc objects have a call method that, when invoked, runs the code contained by the block from
which the proc was created.
sum = adder.call(2,2) # => 4
This example is of course just for explanation; the method makeproc is not necessary in practice, because
Ruby already provides methods with this functionality. Proc.new expects no arguments, and returns a
Proc object that is a proc (not a lambda). You can also call the proc method, which is a synonym for Proc.new.
p = Proc.new {|x,y| x+y }
p = proc {|x,y| x+y }
Another technique for creating Proc objects is with the lambda method. lambda is a method of the
Kernel module.
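A short sketch of the lambda method next to Proc.new follows. The arrow form is the lambda literal available in Ruby 1.9 and later, and lambda? is the predicate that distinguishes the two kinds of Proc object; the variable names are only illustrative.

```ruby
# Three ways to obtain a Proc object
add_proc   = Proc.new { |x, y| x + y }   # A proc
add_lambda = lambda { |x, y| x + y }     # A lambda
add_arrow  = ->(x, y) { x + y }          # Lambda literal (Ruby 1.9 and later)

puts add_lambda.call(2, 2)               # 4
puts add_arrow.call(2, 2)                # 4

# All three are instances of Proc; lambda? tells them apart
p add_lambda.class                       # Proc
p add_proc.lambda?                       # false
p add_lambda.lambda?                     # true
p add_arrow.lambda?                      # true
```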
The difference between a proc and a lambda is small, but significant in some scenarios. A proc is the
object form of a block, and it behaves like a block. A lambda has slightly modified behavior and
behaves more like a method than a block. Calling a proc is like yielding to a block, whereas calling
a lambda is like invoking a method. Recall that the return statement in a block does not just return
from the block to the invoking iterator, it returns from the method that invoked the iterator. A
return statement in a lambda returns from the lambda itself, not from the method that surrounds the
creation site of the lambda:
def test
puts "entering method"
p = lambda { puts "entering lambda"; return }
p.call # Invoking the lambda does not make the method return
puts "exiting method" # This line *is* executed now
end
The fact that return in a lambda only returns from the lambda itself means that we never have to worry
about LocalJumpError.
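For comparison, here is a sketch of the same situation with a proc, including the case that raises LocalJumpError. The method names are invented for this illustration.

```ruby
def proc_return
  pr = Proc.new { return :returned_by_proc }
  pr.call                  # return inside the proc returns from proc_return
  :end_of_method           # Never reached
end

def lambda_return
  l = lambda { return :returned_by_lambda }
  l.call                   # return only leaves the lambda
  :end_of_method
end

def make_proc
  Proc.new { return :too_late }
end

p proc_return              # :returned_by_proc
p lambda_return            # :end_of_method

# The method that created the proc has already returned, so the proc's
# return has nowhere to go and raises LocalJumpError.
begin
  make_proc.call
rescue LocalJumpError => e
  puts "LocalJumpError: #{e.message}"
end
```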
Invoking a block with yield is similar to, but not the same as, invoking a method. There are
differences in the way argument values in the invocation are assigned to the argument variables declared
in the block or method.
p = Proc.new {|x,y| print x,y }
p.call(1) # x,y=1: nil used for missing rvalue: Prints 1nil
p.call(1,2) # x,y=1,2: 2 lvalues, 2 rvalues: Prints 12
p.call(1,2,3) # x,y=1,2,3: extra rvalue discarded: Prints 12
p.call([1,2]) # x,y=[1,2]: array automatically unpacked: Prints 12
l = lambda {|x,y| print x,y }
l.call(1,2) # This works
l.call(1) # Wrong number of arguments
l.call(1,2,3) # Wrong number of arguments
l.call([1,2]) # Wrong number of arguments
l.call(*[1,2]) # Works: explicit splat to unpack the array
Closures
In Ruby, procs and lambdas are closures. When you create a proc or a lambda, the resulting Proc object
holds not just the executable block but also bindings for all the variables used by the block. You
already know that blocks can use local variables and method arguments that are defined outside the block.
def multiply(data, n)
data.collect {|x| x*n }
end
What is more interesting, and possibly even surprising, is that if the block were turned into a proc or
lambda, it could access n even after the method to which it is an argument had returned.
# Return a lambda that retains or "closes over" the argument n
def multiplier(n)
lambda {|data| data.collect{|x| x*n } }
end
doubler = multiplier(2) # Get a lambda that knows how to double
puts doubler.call([1,2,3]) # Prints 2,4,6
It is important to understand that a closure does not just retain the value of the variables it refers
to—it retains the actual variables and extends their lifetime. Another way to say this is that the
variables used in a lambda or proc are not statically bound when the lambda or proc is created. Instead,
the bindings are dynamic, and the values of the variables are looked up when the lambda or proc is
executed.
# Return a pair of lambdas that share access to a local variable.
def accessor_pair(initialValue=nil)
value = initialValue # A local variable shared by the returned lambdas.
getter = lambda { value } # Return value of local variable.
setter = lambda {|x| value = x } # Change value of local variable.
return getter,setter # Return pair of lambdas to caller.
end
getX, setX = accessor_pair(0) # Create accessor lambdas for initial value 0.
puts getX[] # Prints 0. Note square brackets instead of call.
setX[10] # Change the value through one closure.
puts getX[] # Prints 10. The change is visible through the other.
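A second sketch of the same idea, using a hypothetical counter_pair method: the lambdas read and write the variable at call time, and every invocation of the method creates a fresh binding.

```ruby
def counter_pair
  count = 0
  increment = lambda { count += 1 }   # Writes the shared variable
  current   = lambda { count }        # Reads it at call time, not creation time
  return increment, current
end

increment, current = counter_pair
3.times { increment.call }
puts current.call                     # 3: the variable outlived counter_pair

# Each call to counter_pair creates a fresh binding with its own count
_, other = counter_pair
puts other.call                       # 0
```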
Any time you have a method that returns more than one closure, you should pay particular attention to
the variables they use.
def multipliers(*args)
x = nil
args.map {|x| lambda {|y| x*y }}
end
double,triple = multipliers(2,3)
puts double.call(2) # Prints 6 in Ruby 1.8 but 4 in Ruby 1.9
puts triple.call(5) # Prints 15 in Ruby 1.9
Method objects
The Method class is not a subclass of Proc, but it behaves much like it. Method objects are invoked with
the call method (or the [] operator), just as Proc objects are. The Object class defines a method named
method. Pass it a method name, as a string or a symbol, and it returns a Method object representing the
named method of the receiver.
m = 0.method(:succ) # A Method representing the succ method of Fixnum 0
puts m.call # => 1. Same as puts 0.succ. Or use puts m[].
m.name # => :succ
m.owner # => Fixnum
m.receiver # => 0
Method objects use method-invocation semantics, not yield semantics. Method objects, therefore, behave
more like lambdas than like procs. When a true Proc is required, you can use the to_proc method to convert a
Method to a Proc. This is why Method objects can be prefixed with an ampersand and passed to a method in
place of a block.
def square(x); x*x; end
puts (1..10).map(&method(:square))
One important difference between Method objects and Proc objects is that Method objects are not closures.
Ruby’s methods are intended to be completely self-contained, and they never have access to local variables
outside of their own scope. The only binding retained by a Method object, therefore, is the value of self—
the object on which the method is to be invoked.
In addition to the Method class, Ruby also defines an UnboundMethod class. As its name suggests, an
UnboundMethod object represents a method, without a binding to the object on which it is to be invoked.
In order to invoke an unbound method, you must first bind it to an object using the bind method:
unbound_plus = Fixnum.instance_method("+") # creates an unbound object
plus_2 = unbound_plus.bind(2) # Bind the method to the object 2
sum = plus_2.call(2) # => 4
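The same technique works with your own classes. In this sketch the Greeter class is hypothetical; instance_method, bind and unbind are the standard methods for moving between Method and UnboundMethod objects.

```ruby
# Hypothetical class used only for this illustration
class Greeter
  def initialize(name)
    @name = name
  end

  def hello(punctuation)
    "Hello, #{@name}#{punctuation}"
  end
end

unbound = Greeter.instance_method(:hello)   # UnboundMethod: no receiver yet
p unbound.class                             # UnboundMethod

ann = unbound.bind(Greeter.new("Ann"))      # Method bound to one Greeter
bob = unbound.bind(Greeter.new("Bob"))      # Same method, different receiver
puts ann.call("!")                          # Hello, Ann!
puts bob.call("?")                          # Hello, Bob?

# Going the other way: unbind turns a Method back into an UnboundMethod
p ann.unbind.class                          # UnboundMethod
```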
Classes and Modules
Classes may extend or subclass other classes, and inherit or override the methods of their superclass. Classes
can also include—or inherit methods from—modules. The methods defined by a class may have “public,” “protected,”
or “private” visibility, which affects how and where they may be invoked. Ruby’s objects are strictly
encapsulated: their state can be accessed only through the methods they define. In contrast to the strict
encapsulation of object state, Ruby’s classes are very open. Any Ruby program can add methods to existing
classes, and it is even possible to add “singleton methods” to individual objects.
class Point
end
p = Point.new
p.class # => Point
p.is_a? Point # => true
In addition to defining a new class, the class keyword creates a new constant to refer to the class. The class
name and the constant name are the same, so all class names must begin with a capital letter. Within the body of
a class, but outside of any instance methods defined by the class, the self keyword refers to the class being
defined.
Ruby’s equivalent of a constructor is the initialize method:
class Point
@@n = 0 # Classvariable: How many points have been created
def initialize(x,y)
@x, @y = x, y # Instancevariables (Inside of methods)
@@n += 1
end
ORIGIN = Point.new(0,0) # Constant
def x # The accessor (or getter) method for @x
@x
end
def y # The accessor method for @y
@y
end
def x=(value) # The setter method for @x
@x = value
end
def y=(value) # The setter method for @y
@y = value
end
end
p = Point.new(0,0)
origin = Point::ORIGIN
Point::ORIGIN.instance_variables # => ["@y", "@x"]
Point.class_variables # => ["@@n"]
Point.constants # => ["ORIGIN"]
If the new method were written in Ruby, it would look something like this:
def new(*args)
o = self.allocate # Create a new object of this class
o.initialize(*args) # Call the object's initialize method with our args
o # Return new object; ignore return value of initialize
end
The combination of instance variable with trivial getter and setter methods is so common that Ruby provides a
way to automate it. The attr_reader and attr_accessor methods are defined by the Module class, which is extended
by the Class class.
class Point
attr_accessor :x, :y # Define accessor methods for a mutable object
end
class Point
attr_reader :x, :y # Define reader methods for an immutable object
end
If you want to define a new instance method of a class or module, use define_method. This instance method of Module
takes the name of the new method (as a Symbol) as its first argument.
# Add an instance method named m to class c with body b
def add_method(c, m, &b)
c.class_eval {
define_method(m, &b)
}
end
add_method(String, :greet) { "Hello, " + self }
"world".greet # => "Hello, world"
define_method is used mostly in metaprogramming contexts. It is often preferable to method_missing, because
method_missing can make a program behave in strange ways if you do not also define related methods such as respond_to?.
Here is another example of define_method, to give you a better grasp of this powerful method.
class Multiplier
def self.create_multiplier(n) # Creates a classmethod, more about these methods later
define_method("times_#{n}") do |val|
val * n
end
end
create_multiplier(2)
create_multiplier(3)
end
m = Multiplier.new
puts m.times_2(3) # => 6
puts m.times_3(4) # => 12
The attr_reader and attr_accessor methods also define new methods for a class. Like define_method, these are private
methods of Module and can easily be implemented. This is a metaprogramming aspect of Ruby that lets you write code that
writes code. They accept attribute names as their arguments, and dynamically create methods with those names. If you
don’t want a class to be changed dynamically, you can call the freeze method on the class. Once frozen, a class cannot be
altered.
Here is another example of metaprogramming to illustrate how it works:
class Module
private # The methods that follow are both private
# This method works like attr_reader, but has a shorter name
def readonly(*syms)
return if syms.size == 0 # If no arguments, do nothing
code = "" # Start with an empty string of code
syms.each do |s|
code << "def #{s}; @#{s}; end\n" # The method definition
end
# Finally, class_eval the generated code to create instance methods.
class_eval code
end
# This method works like attr_accessor, but has a shorter name.
def readwrite(*syms)
return if syms.size == 0
code = ""
syms.each do |s|
code << "def #{s}; @#{s} end\n"
code << "def #{s}=(value); @#{s} = value; end\n"
end
class_eval code
end
end
You might wonder how the getter knows which @x it refers to, since Ruby is strictly object-oriented and no receiver
is written in the method body. The answer is that self is always the implicit receiver: a method call with no
explicit receiver is invoked on self, and @x always refers to the instance variable of self.
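A small sketch of the implicit receiver, using a hypothetical Thermostat class. It also shows the one place where self must be written out: calling a setter, because an assignment without a receiver creates a local variable instead.

```ruby
class Thermostat
  attr_accessor :degrees

  def initialize(degrees)
    @degrees = degrees
  end

  def warmer
    degrees + 1            # No receiver: same as self.degrees + 1
  end

  def reset_broken
    degrees = 0            # Creates a local variable; the setter is NOT called
  end

  def reset
    self.degrees = 0       # Setters need an explicit self
  end
end

t = Thermostat.new(20)
puts t.warmer              # 21
t.reset_broken
puts t.degrees             # 20: unchanged
t.reset
puts t.degrees             # 0
```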
In addition to being automatically invoked by Point.new, the initialize method is automatically made private. An
object can call initialize on itself, but you cannot explicitly call initialize on p to reinitialize its state.
Instance variables always begin with @, and they always “belong to” whatever object self refers to. In statically
typed languages, you must declare your variables, including instance variables. In Ruby, variables don’t need to
be declared. In fact, if you initialize an instance variable in the class body, outside of any instance method, then
self refers to the class itself and not to an instance. The variables outside of the instance methods and the
variables inside of instance methods are therefore different variables. Class variables, by contrast, are always
evaluated in reference to the class object created by the enclosing class definition statement. Class variables
are shared by a class and all of its subclasses. If a class A defines a variable @@a, then subclass B can use that
variable. The subclass does not get its own copy: if the subclass changes the class variable, the change is visible
in the superclass as well. The variable really is shared.
A class instance variable is an instance variable used inside a class definition but outside an instance method
definition. Like class variables, class instance variables are associated with the
class rather than with any particular instance of the class. Because they are prefixed with @, it is very easy to
confuse them with instance variables. Without the distinctive punctuation prefixes, it may be more difficult to
remember whether a variable is associated with instances or with the class object. One of the most important
advantages of class instance variables over class variables has to do with the confusing behavior of class
variables when subclassing an existing class. If we use class instance variables instead of class variables, the
only difficulty is that because class instance variables cannot be used from instance methods, we must move the
statistics gathering code out of the initialize method (which is an instance method):
class Point
@n = 0
def initialize(x,y) # Initialize method
@x,@y = x, y # Sets initial values for instance variables
end
def self.new(x,y) # Class method to create new Point objects
@n += 1
super # Invoke the real definition of new to create a Point
end
# other methods
end
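The difference in subclassing behavior can be sketched as follows. Shape, Circle and Square are hypothetical classes; the reader for the class instance variable is defined with class << self, a syntax described later in these notes.

```ruby
class Shape
  @@created = 0            # Class variable: shared with every subclass
  @label = "shape"         # Class instance variable: belongs to Shape only

  class << self
    attr_reader :label     # Reader for the class instance variable
  end

  def self.created
    @@created
  end
end

class Circle < Shape
  @@created = 10           # Assigns Shape's @@created; no new variable is made
  @label = "circle"        # A separate variable that belongs to Circle
end

class Square < Shape       # Defines no @label of its own
end

p Shape.created            # 10: the subclass changed the shared variable
p Shape.label              # "shape"
p Circle.label             # "circle"
p Square.label             # nil: class instance variables are not inherited
```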
Method visibility and Inheritance
class Point
# public methods...
protected
# protected methods...
private
# private methods...
end
or
class Point
def example_method
nil
end
private :example_method # now it's private
end
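The two forms above show only the syntax. As a rough sketch of the effect, the hypothetical Account class below has one protected and one private method: the protected method can be called on another Account from inside the class, while neither can be called from outside.

```ruby
class Account
  def initialize(balance)
    @balance = balance
  end

  def richer_than?(other)
    balance > other.balance      # Protected: allowed on another Account
  end

  def report
    "balance: #{formatted}"      # Private: called without a receiver
  end

  protected

  def balance
    @balance
  end

  private

  def formatted
    "%.2f" % @balance
  end
end

a = Account.new(100)
b = Account.new(50)
p a.richer_than?(b)              # true
puts a.report                    # balance: 100.00

begin
  a.balance                      # Protected method called from outside
rescue NoMethodError => e
  puts e.message
end
```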
To extend a class:
class Point3D < Point
end
It is also perfectly reasonable to define an abstract class that invokes certain undefined “abstract” methods,
which are left for subclasses to define.
class AbstractGreeter
def greet
puts "#{greeting} #{who}"
end
end
# A concrete subclass
class WorldGreeter < AbstractGreeter
def greeting; "Hello"; end
def who; "World"; end
end
WorldGreeter.new.greet # Displays "Hello World"
Private methods cannot be invoked from outside the class that defines them. But they are inherited by subclasses.
This means that subclasses can invoke them and can override them. Sometimes when we override a method, we don’t
want to replace it altogether, we just want to augment its behavior by adding some new code. In order to do this,
we need a way to invoke the overridden method from the overriding method. This is known as chaining, and it is
accomplished with the keyword super. Super works like a special method invocation: it invokes a method with the
same name as the current one, in the superclass of the current class.
class Point3D < Point
def initialize(x,y,z)
super
@z = z;
end
end
If you use super as a bare keyword (with no arguments and no parentheses), then all of the arguments that were passed
to the current method are passed to the superclass method. If the method has modified its parameters, the superclass
method receives the modified values. If you want to pass zero arguments to the superclass method, you must say so
explicitly with empty parentheses: super().
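The three forms of super can be compared side by side. This is a sketch with hypothetical classes; the default arguments on Base#describe exist only to make the differences visible.

```ruby
class Base
  def describe(name = "nobody", title = nil)
    "#{title} #{name}".strip
  end
end

class BareSuper < Base
  def describe(name, title)
    name = name.upcase     # Reassigning a parameter changes what super receives
    super                  # Same as super(name, title) with the current values
  end
end

class EmptySuper < Base
  def describe(name, title)
    super()                # Explicitly pass no arguments: defaults are used
  end
end

class PartialSuper < Base
  def describe(name, title)
    super(name)            # Pass only the arguments listed
  end
end

puts BareSuper.new.describe("ann", "Dr.")      # Dr. ANN
puts EmptySuper.new.describe("ann", "Dr.")     # nobody
puts PartialSuper.new.describe("ann", "Dr.")   # ann
```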
Module, Class, and Object implement several callback methods, or hooks. These methods are not defined by default,
but if you define them for a module, class, or object, then they will be invoked when certain events occur. When a
new class is defined, Ruby invokes the class method inherited on the superclass of the new class, passing the new
class object as the argument. This allows classes to add behavior to or enforce constraints on their descendants.
Class methods are inherited, so the inherited method will be invoked if it is defined by any of the ancestors
of the new class. Define Object.inherited to receive notification of all new classes that are defined:
def Object.inherited(c)
puts "class #{c} < #{self}"
end
def String.method_added(name)
puts "New instance method #{name} added to String"
end
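A typical use of the inherited hook is to keep a registry of subclasses. The following is a minimal sketch; Plugin, CsvPlugin and TsvPlugin are invented names, and the registry is stored in a class instance variable.

```ruby
class Plugin
  @registry = []                     # Class instance variable on Plugin

  class << self
    attr_reader :registry
  end

  # Called on the superclass each time a subclass is defined
  def self.inherited(subclass)
    super                            # Preserve any hooks further up the chain
    Plugin.registry << subclass
  end
end

class CsvPlugin < Plugin; end
class TsvPlugin < CsvPlugin; end     # Indirect subclasses trigger the hook too

p Plugin.registry                    # [CsvPlugin, TsvPlugin]
```

Because the hook itself is an inherited class method, it also fires for TsvPlugin, whose direct superclass is CsvPlugin.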
If you want to check an object's methods, you can use Ruby's reflective capabilities:
o.methods # => [ names of all public methods ]
o.public_methods # => the same thing
o.public_methods(false) # Exclude inherited methods
o.protected_methods # => []: there aren't any
o.private_methods # => array of all private methods
o.private_methods(false) # Exclude inherited private methods
String.instance_methods == "s".public_methods # => true
String.instance_methods(false) == "s".public_methods(false) # => true
String.public_instance_methods == String.instance_methods # => true
String.protected_instance_methods # => []
String.private_instance_methods(false) # => ["initialize_copy",
# "initialize"]
String.public_method_defined? :reverse # => true
String.protected_method_defined? :reverse # => false
String.private_method_defined? :initialize # => true
String.method_defined? :upcase! # => true
Arrayifying, Hash access, and Equality (Ducktyping)
If you want the Point class to behave like an array or a hash, or even give the class its own iterator, you
can add methods that make this possible. For example:
def [](index)
case index
when 0, -2 then @x # Index 0 (or -2) is the X coordinate
when 1, -1 then @y # Index 1 (or -1) is the Y coordinate
when :x, "x" then @x # Hash keys as symbol or string for X
when :y, "y" then @y # Hash keys as symbol or string for Y
else nil # Arrays and hashes just return nil on bad indexes
end
end
def each
yield @x
yield @y
end
p = Point.new(1,2)
p.each {|x| print x } # Prints "12"
This approach is sometimes called “duck typing,” after the adage “if it walks like a duck and quacks like a
duck, it must be a duck.” More importantly, defining the each iterator allows us to mix in the methods of the
Enumerable module, all of which are defined in terms of each. Our class gains over 20 iterators by adding a
single line:
include Enumerable
If we do this, then we can write interesting code like this:
# Is the point P at the origin?
p.all? {|x| x == 0 } # True if the block is true for all elements
Here is an == method for Point:
def ==(o) # Is self == o?
if o.is_a? Point # If o is a Point object
@x==o.x && @y==o.y # then compare the fields.
else # If o is not a Point
false # then, by definition, self != o.
end
end
A more liberal definition of equality would support duck typing. Some caution is required, however. Our ==
method should not raise a NoMethodError if the argument object does not have x and y methods. Instead, it
should simply return false:
def ==(o) # Is self == o?
@x == o.x && @y == o.y # Assume o has proper x and y methods
rescue # If that assumption fails
false # Then self != o
end
Another way of implementing equality is by defining <=> method and including the Comparable module:
include Comparable # Mix in methods from the Comparable module.
# Define an ordering for points based on their distance from the origin.
# This method is required by the Comparable module.
def <=>(other)
return nil unless other.instance_of? Point
@x**2 + @y**2 <=> other.x**2 + other.y**2
end
Our distance-based comparison operator results in an == method that considers the points (1,0) and (0,1) to
be equal.
Because eql? is used for hashes, you must never implement this method by itself. If you define an eql?
method, you must also define a hash method to compute a hashcode for your object. If two objects are equal
according to eql?, then their hash methods must return the same value.
def hash
code = 17
code = 37*code + @x.hash
code = 37*code + @y.hash
# Add lines like this for each significant instance variable
code # Return the resulting code
end
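To see eql? and hash working together, here is a self-contained sketch with a hypothetical GridPoint class. Note that it computes the hashcode with Array#hash instead of the hand-written 17/37 combination shown above; that is a substitution made for brevity, not the book's method.

```ruby
class GridPoint
  attr_reader :x, :y

  def initialize(x, y)
    @x, @y = x, y
  end

  def ==(other)
    other.is_a?(GridPoint) && x == other.x && y == other.y
  end

  alias eql? ==              # Hash uses eql? to compare keys

  def hash
    [x, y].hash              # Equal points produce equal hashcodes
  end
end

visited = { GridPoint.new(1, 2) => "start" }
p visited[GridPoint.new(1, 2)]                          # "start"
p [GridPoint.new(1, 2), GridPoint.new(1, 2)].uniq.size  # 1
```

Without the hash method, the lookup would return nil, because two distinct objects would almost certainly land in different hash buckets.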
Structs
If you want a mutable Point class, one way to create it is with Struct. Struct is a core Ruby class
that generates other classes.
Struct.new("Point", :x, :y) # Creates new class Struct::Point
Point = Struct.new(:x, :y) # Creates new class, assigns to Point
p = Point.new(1,2) # => #<struct Point x=1, y=2>
The second line in the code relies on a curious fact about Ruby classes: if you assign an unnamed class
object to a constant, the name of that constant becomes the name of the class. Structs also define the [] and
[]= operators for array and hash-style indexing, a working == operator, a helpful to_s, and even provide each
and each_pair iterators.
We can make a Struct-based class immutable:
Point = Struct.new(:x, :y) # Define mutable class
class Point # Open the class
undef x=,y=,[]= # Undefine mutator methods
end
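The features that a Struct-generated class provides can be tried out quickly. This sketch uses a separate, hypothetical Pixel struct so that it does not collide with the Point definitions above.

```ruby
Pixel = Struct.new(:x, :y)

a = Pixel.new(1, 2)
p a[0]                     # 1: array-style index
p a[:y]                    # 2: hash-style index with a symbol
p a["x"]                   # 1: hash-style index with a string
p a == Pixel.new(1, 2)     # true: == compares field by field
p a.to_a                   # [1, 2]

a.x = 10                   # Generated setter; the struct is mutable
a.each_pair { |name, value| puts "#{name}=#{value}" }   # x=10, then y=2
```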
Class methods
When we define a class method for the Point class, what we are really doing is defining a singleton method
of the Point object. Class methods are invoked implicitly on self, and the value of self in a class method
is the class on which it was invoked.
class Point
attr_reader :x, :y
def self.sum(*points) # Return the sum of an arbitrary number of points
x = y = 0
points.each {|p| x += p.x; y += p.y }
Point.new(x,y)
end
end
total = Point.sum(p1, p2, p3)
Within the body of a class method, you may invoke the other class methods of the class without an explicit
receiver.
class Point3D < Point
def self.sum(*points2D)
superclass.sum(*points2D)
end
end
There is yet another technique for defining class methods. Though it is less clear than the previously shown
technique, it can be handy when defining multiple class methods.
class << Point # Syntax for adding methods to a single object
def sum(*points) # This is the class method Point.sum
x = y = 0
points.each {|p| x += p.x; y += p.y }
Point.new(x,y)
end
# Other class methods can be defined here
end
Another way of doing the same thing:
class Point
# Instance methods go here
class << self
# Class methods go here
end
end
Clone and Dup
These methods allocate a new instance of the class of the object on which they are invoked. They then copy all
the instance variables and the taintedness of the receiver object to the newly allocated object. clone takes this
copying a step further than dup—it also copies singleton methods of the receiver object and freezes the copy
object if the original is frozen.
animal = Object.new
def animal.nr_of_feet=(feet)
@feet = feet
end
def animal.nr_of_feet
@feet
end
animal.nr_of_feet = 4
felix = animal.clone
felix.nr_of_feet # => 4
What we get here is a different, and in some ways more powerful, kind of inheritance, not unlike the prototypal
inheritance used in JavaScript.
If a class defines a method named initialize_copy, then clone and dup will invoke that method on the copied object
after copying the instance variables from the original. clone calls initialize_copy before freezing the copy object,
so that initialize_copy is still allowed to modify it. Like initialize, Ruby ensures that initialize_copy is always
private.
def initialize_copy(orig) # If someone copies this Point object
@feet = @feet.dup # Make a copy of the nr of feet too
end
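The differences between clone and dup described at the start of this section can be checked with a short sketch. The make_original helper is hypothetical and exists only to build a frozen object that has a singleton method.

```ruby
def make_original
  original = Object.new
  def original.shout; "HELLO!"; end   # Singleton method
  original.freeze                     # freeze returns the receiver
end

original = make_original
cloned   = original.clone
duped    = original.dup

p cloned.frozen?                 # true: clone keeps the frozen state
p duped.frozen?                  # false: dup does not
p cloned.respond_to?(:shout)     # true: clone copies singleton methods
p duped.respond_to?(:shout)      # false: dup drops them
```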
Modules
The difference between Modules and classes is that a Module can not be instantiated and cannot be subclassed.
Modules are used as namespaces and mixins. Class is a subclass of Module.
Namespaces
module Base64
class Encoder
def encode
end
end
class Decoder
def decode
end
end
def Base64.helper
end
end
By structuring our code this way, we’ve defined two new classes, Base64::Encoder and Base64::Decoder. Because
classes are modules, they too can be nested. Nesting one class within another only affects the namespace of the
inner class; it does not give that class any special access to the methods or variables of the outer class.
Mixins
If a module defines instance methods instead of the class methods, those instance methods can be mixed in to
other classes. Enumerable and Comparable are well-known examples of mixin modules.
class Point
include Comparable
end
When a module is included into a class or into another module, the included class method of the included module
is invoked with the class or module object into which it was included as an argument. This gives the included
module an opportunity to augment or alter the class in whatever way it wants—it effectively allows a module to
define its own meaning for include.
module Final # A class that includes Final can't be subclassed
def self.included(c) # When included in class c
c.instance_eval do # Define a class method of c
def inherited(sub) # To detect subclasses
raise Exception, # And abort with an exception
"Attempt to create subclass #{sub} of Final class #{self}"
end
end
end
end
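Another common use of the included hook, sketched here under assumed names, is to add class methods to the including class. Describable, its nested ClassMethods module and Widget are all hypothetical; extend is the standard method that adds a module's methods to a single object, in this case the class.

```ruby
module Describable
  # Runs when a class includes this module
  def self.included(base)
    base.extend(ClassMethods)        # Add class-level methods to the includer
  end

  module ClassMethods
    # With an argument: store the text. Without: return it.
    def description(text = nil)
      @description = text if text
      @description
    end
  end

  # Ordinary mixin instance method
  def describe
    "#{self.class.name}: #{self.class.description}"
  end
end

class Widget
  include Describable
  description "a small gadget"       # Class method supplied by the hook
end

puts Widget.new.describe             # Widget: a small gadget
```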
Load and Require
Ruby programs may be broken up into multiple files, and the most natural way to partition a program is to place
each nontrivial class or module into a separate file. These separate files can then be reassembled into a single
program using load and require. These are global functions defined in Kernel, but are used like language
keywords.
There are some differences between load and require. require can also load binary extensions to Ruby.
load expects a complete filename including an extension. require is usually passed a library name, with no
extension, rather than a filename. In that case, it searches for a file that has the library name as its base
name and an appropriate source or native library extension. load can load the same file multiple times. require
tries to prevent multiple loads of the same file. require keeps track of the files that have been loaded by
appending them to the global array $" (also known as $LOADED_FEATURES). load does not do this.
Files loaded with load or require are executed in a new top-level scope that is different from the one in which
load or require was invoked. The loaded file can see all global variables and constants that have been defined at
the time it is loaded, but it does not have access to the local scope from which the load was initiated.
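The difference in repeat loading can be demonstrated with a temporary file. This is a sketch only: the file name, the global variable $times_executed and the helper method are made up, and Dir.mktmpdir comes from the tmpdir standard library.

```ruby
require "tmpdir"

# Returns [count after two loads, first require result, second require result,
# final count]. The file name is arbitrary.
def load_vs_require_demo
  $times_executed = 0
  Dir.mktmpdir do |dir|
    path = File.join(dir, "load_demo_lib.rb")
    File.open(path, "w") { |f| f.puts "$times_executed += 1" }

    load path                        # Executes the file
    load path                        # Executes it again
    after_loads = $times_executed

    first  = require path            # true: executed and recorded in $"
    second = require path            # false: already loaded, not executed again
    [after_loads, first, second, $times_executed]
  end
end

p load_vs_require_demo               # [2, true, false, 3]
```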
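The autoload methods of Kernel and Module allow lazy loading of files on an as-needed basis. The autoload
function registers the name of an undefined constant together with the name of the library that defines it. When
that constant is first referenced, the library is loaded using require.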
# Require 'socket' if and when the TCPSocket is first used
autoload :TCPSocket, "socket"
Use autoload? or Module.autoload? to test whether a reference to a constant will cause a file to be loaded. This
method expects a symbol argument. If a file will be loaded when the constant named by the symbol is referenced, then
autoload? returns the name of the file; otherwise it returns nil.
Loadpaths
Ruby’s load path is an array that you can access using either of the global variables $LOAD_PATH or $:. Each
element of the array is the name of a directory that Ruby will search for files to load. The /usr/lib/ruby/1.8/
directory is where the Ruby standard library is installed. The /usr/lib/ruby/1.8/i386-linux/ directory holds Linux
binary extensions for the standard library. The site_ruby directories in the path are for site-specific libraries
that you have installed. The more significant load path change in Ruby 1.9 is the inclusion of RubyGems
installation directories. RubyGems is built into Ruby 1.9: the gem command is distributed with Ruby and can be used
to install new packages whose installation directories are automatically added to the default load path.
Eigenclass
We learned that you could apply a singleton method on a single object. The singleton methods of an object are not
defined by the class of that object. But they are methods and they must be associated with a class of some sort.
The singleton methods of an object are instance methods of the anonymous eigenclass associated with that object.
The eigenclass is also called the singleton class or (less commonly) the metaclass. Ruby defines a syntax for
opening the eigenclass of an object and adding methods to it.
To open the eigenclass of the object o, use class << o. For example, we can define class methods of Point like this:
class << Point
def class_method # This is an instance method of the eigenclass.
end # It is also a class method of Point.
end
We can formalize this into a method of Object, so that we can ask for the eigenclass of any object:
class Object
def eigenclass
class << self; self; end
end
end
Unless you are doing sophisticated metaprogramming with Ruby, you are unlikely to really need an eigenclass.
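Since Ruby 1.9.2 the eigenclass is also available through the built-in singleton_class method, so the hand-written eigenclass method above is mainly useful on older versions. A small sketch, with hypothetical object and class names:

```ruby
greeter = Object.new
def greeter.hello; "hi"; end              # Singleton method

eigen = greeter.singleton_class           # Built-in equivalent of the eigenclass method
p eigen.instance_methods(false)           # [:hello]
p greeter.singleton_methods               # [:hello]
p eigen == (class << greeter; self; end)  # true: same object either way

# Class methods are instance methods of the class's eigenclass
class Ticket
  def self.build; new; end
end
p Ticket.singleton_class.instance_methods(false)   # [:build]
```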
Other useful stuff
Threads
Ruby makes it easy to write multi-threaded programs with the Thread class. To start a new thread, just associate a block
with a call to Thread.new.
# Thread 1 is running here
Thread.new {
# Thread #2 runs this code
}
# Thread 1 runs this code
A thread runs the code in the block associated with the call to Thread.new and then it stops running. The value of the
last expression in that block is the value of the thread, and can be obtained by calling the value method of the Thread
object. If the thread has run to completion, then value returns the thread’s value right away. Otherwise, the
value method blocks and does not return until the thread has completed. One of the key features of threads is that they
can share access to variables. Because threads are defined by blocks, they have access to whatever variables (local
variables, instance variables, global variables, and so on) are in the scope of the block.
x = 0
t1 = Thread.new do
x += 1 # Ruby has no ++ operator
end
t2 = Thread.new do
x -= 1 # Ruby has no -- operator
end
But if you run the following code
n = 1
while n <= 3
Thread.new { puts n }
n += 1
end
This code is not guaranteed to behave as expected. In some circumstances it may print 4, 4, 4 instead of 1, 2, 3,
because threads do not run as predictably as sequential code: n may already have been incremented by the time a
thread reads it. One way of fixing this is to give each thread a private copy of the variable:
n = 1
while n <= 3
# Get a private copy of the current value of n in x
Thread.new(n) {|x| puts x }
n += 1
end
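Combining the private-copy idiom with the value method gives a simple way to collect results from several threads. The helper name squares_in_threads is hypothetical.

```ruby
# Hypothetical helper: squares each number in its own thread
def squares_in_threads(numbers)
  threads = numbers.map do |n|
    Thread.new(n) { |x| x * x }   # x is a private copy of n for this thread
  end
  threads.map { |t| t.value }     # value waits for each thread, then returns its result
end

p squares_in_threads([1, 2, 3])   # [1, 4, 9]
```

The results come back in the order of the threads array, regardless of the order in which the threads finish.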
The class method Thread.current returns the Thread object that represents the current thread. This allows threads to
manipulate themselves. The class method Thread.main returns the Thread object that represents the main thread—this is the
initial thread of execution that began when the Ruby program was started.
The main thread is special: the Ruby interpreter stops running when the main thread is done. You must ensure, therefore,
that your main thread does not end while other threads are still running. We’ve already mentioned that you can call the
value method of a thread to wait for it to finish. If you don’t care about the value of your threads, you can wait with
the join method instead.
def join_all
main = Thread.main # The main thread
current = Thread.current # The current thread
all = Thread.list # All threads still running
all.each {|t| t.join unless t == current or t == main }
end
If an exception is raised in the main thread, and is not handled anywhere, the Ruby interpreter prints a message and
exits. In threads other than the main thread, unhandled exceptions cause the thread to stop running. If a thread t exits
because of an unhandled exception, and another thread s calls t.join or t.value, then the exception that occurred in t is
raised in the thread s. If you want an unhandled exception in any thread to cause the interpreter to stop:
Thread.abort_on_exception = true
If you want an unhandled exception in one particular thread t to stop the interpreter, use:
t.abort_on_exception = true
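The propagation rule above (an exception in t is raised in whichever thread calls t.join or t.value) can be sketched as
follows. The error message is made up for the example, and the report_on_exception setting is only there to silence the
warning that newer Ruby versions print when a thread dies.

```ruby
t = Thread.new do
  # Newer Rubies warn when a thread dies from an exception; turn that off where
  # the setting exists so that only our own output is shown.
  if Thread.current.respond_to?(:report_on_exception=)
    Thread.current.report_on_exception = false
  end
  raise ArgumentError, "boom in thread"
end

message = begin
  t.join                     # The exception raised in t is re-raised here
  "no error"
rescue ArgumentError => e
  "rescued: #{e.message}"
end
puts message                 # Prints "rescued: boom in thread"
```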
When true parallel processing is not possible, it is simulated by sharing a CPU among threads. The process for sharing a
CPU among threads is called thread scheduling. The first factor that affects thread scheduling is thread priority: high-
priority threads are scheduled before low-priority threads. Set and query the priority of a Ruby Thread object with
priority= and priority. Note that there is no way to set the priority of a thread before it starts running. A newly
created thread starts at the same priority as the thread that created it. The main thread starts off at priority 0. Under
Linux, for example, nonprivileged threads cannot have their priorities raised or lowered. So in Ruby 1.9 (which uses
native threads) on Linux, the thread priority setting is ignored.
A Ruby thread may be in one of five possible states. The two most interesting states are for live threads: a thread that
is alive is runnable or sleeping. A runnable thread is one that is currently running, or that is ready and eligible to run
the next time there are CPU resources for it. A sleeping thread is one that is sleeping, that is waiting for I/O, or that
has stopped itself. There are two thread states for threads that are no longer alive. A terminated thread has either
terminated normally or has terminated abnormally with an exception. Finally, there is one transitional state. A thread
that has been killed but that has not yet terminated is said to be aborting.
Calling Thread.stop is effectively the same thing as calling Kernel.sleep with no argument: the thread pauses forever.
Threads also temporarily enter the sleeping state if they call Kernel.sleep with an argument. In this case, they
automatically wake up and reenter the runnable state after (approximately) the specified number of seconds pass. One way
for a thread to terminate normally is by calling Thread.exit. Note that any ensure clauses are processed before a thread
exits in this way. A thread can forcibly terminate another thread by invoking the instance method kill on the thread to
be terminated. terminate and exit are synonyms for kill. The Thread.list method returns an array of Thread objects
representing all live (running or sleeping) threads.
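The states can be inspected with the status and alive? methods of Thread. A minimal sketch, with two throwaway threads
invented for the example: status returns "sleep" for a sleeping thread and false for one that terminated normally.

```ruby
sleeper  = Thread.new { sleep }      # Sleeps forever (until it is killed)
finished = Thread.new { :done }
finished.join                        # Let this one terminate normally

Thread.pass until sleeper.status == "sleep"   # Wait until it is really asleep

states = {
  :sleeper_status  => sleeper.status,    # "sleep"
  :finished_status => finished.status,   # false: terminated normally
  :finished_alive  => finished.alive?    # false
}
sleeper.kill                         # Forcibly terminate the sleeping thread
sleeper.join                         # Wait until it has really terminated
states[:sleeper_alive_after_kill] = sleeper.alive?
p states
```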
If you want to impose some order onto a subset of threads, you can create a ThreadGroup object and add threads to it:
group = ThreadGroup.new
3.times {|n| group.add(Thread.new { do_task(n) }) }
New threads are initially placed in the group to which their parent belongs.
Threads are normally used for I/O-bound programs. Here are some examples of their use.
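def conread(filenames)
h = {} # Empty hash of results
# Create one thread for each file
filenames.each do |filename| # For each named file
h[filename] = Thread.new do # Create a thread, map to filename
open(filename) {|f| f.read } # Open and read the file
end # Thread value is file contents
end
# Wait for each thread and replace it in the hash with its value
h.each_pair {|filename, thread| h[filename] = thread.value }
h # Return the hash of file contents
end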
module AfterEvery
# Execute block after sleeping the specified number of seconds.
def after(seconds, &block)
Thread.new do # In a new thread...
sleep(seconds) # First sleep
block.call # Then call the block
end # Return the thread
end
# Repeatedly sleep and then execute the block.
# Pass value to the block on the first invocation.
# On subsequent invocations, pass the value of the previous invocation.
def every(seconds, value=nil, &block)
Thread.new do # In a new thread...
loop do # Loop forever (or until break in block)
sleep(seconds) # Sleep
value = block.call(value) # And invoke block
end # Then repeat..
end # every returns the Thread
end
end
require 'afterevery'
include AfterEvery # Make after and every callable as global functions
1.upto(5) {|i| after(i) { puts i } } # Slowly print the numbers 1 to 5
sleep(5) # Wait five seconds
every 1, 6 do |count| # Now slowly print 6 to 10
puts count
break if count == 10
count + 1 # The next value of count
end
sleep(6) # Give the above time to run
When writing programs that use multiple threads, it is important that two threads do not attempt to modify the same
object at the same time. One way to do this is to place the code that must be made thread-safe in a block associated
with a call to the synchronize method of a Mutex object.
class BankAccount
def initialize(name, checking, savings)
@name,@checking,@savings = name,checking,savings
@lock = Mutex.new # For thread safety
end
# Lock account and transfer money from savings to checking
def transfer_from_savings(x)
@lock.synchronize {
@savings -= x
@checking += x
}
end
# Lock account and report current balances
def report
@lock.synchronize {
"#@name\nChecking: #@checking\nSavings: #@savings"
}
end
end
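To see the effect of synchronize in isolation, here is a minimal sketch with a shared counter (the counter and the
thread counts are made up for the example). Only one thread at a time may run the block passed to synchronize, so no
increment is lost.

```ruby
counter = 0
lock = Mutex.new

threads = 10.times.map do
  Thread.new do
    1000.times do
      lock.synchronize { counter += 1 }   # Read-modify-write happens under the lock
    end
  end
end

threads.each {|t| t.join }
puts counter   # Prints 10000
```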
Another example dynamically reopens the Object class to emulate Java’s synchronized keyword with a global method named
synchronized.
class Object
# Return the Mutex for this object, creating it if necessary.
def mutex
# If this object already has a mutex, just return it
return @__mutex if @__mutex
# Otherwise, we've got to create a mutex for the object.
# To do this safely we've got to synchronize on our class object.
synchronized(self.class) {
@__mutex = @__mutex || Mutex.new
}
# The return value is @__mutex
end
end
require 'thread' # Ruby 1.8 keeps Mutex in this library
# This works like the synchronized keyword of Java.
def synchronized(o)
o.mutex.synchronize { yield }
end
# The Object.mutex method defined above needs to lock the class
# if the object doesn't have a Mutex yet. If the class doesn't have
# its own Mutex yet, then the class of the class (the Class object)
# will be locked. In order to prevent infinite recursion, we must
# ensure that the Class object has a mutex.
Class.instance_eval { @__mutex = Mutex.new }
Another approach relies on method_missing. The version below modifies the synchronized method so that, when invoked
without a block, it returns a SynchronizedObject wrapper around the object. SynchronizedObject is a delegating wrapper
class based on method_missing.
class SynchronizedObject < BasicObject
def initialize(o); @delegate = o; end
def __delegate; @delegate; end
def method_missing(*args, &block)
@delegate.mutex.synchronize {
@delegate.send *args, &block
}
end
end
def synchronized(o)
if block_given?
o.mutex.synchronize { yield }
else
SynchronizedObject.new(o)
end
end
You may wonder about the send method used in the example. send invokes on its receiver the method named by its first
argument, passing any remaining arguments (and any block) to that method.
"hello".send :upcase # => "HELLO": invoke an instance method
Math.send(:sin, Math::PI/2) # => 1.0: invoke a class method
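Because the method name is ordinary data, send lets a program decide at run time which method to call. A short sketch
(the list of method names is arbitrary):

```ruby
method_names = [:upcase, :reverse, :length]

# Call each named method on the same receiver.
results = method_names.map {|name| "hello".send(name) }
p results     # => ["HELLO", "olleh", 5]

# A block given to send is handed on to the invoked method.
doubled = [1, 2, 3].send(:map) {|x| x * 2 }
p doubled     # => [2, 4, 6]
```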
Tracing
The trace method returns an instance of TracedObject that uses method_missing to catch invocations, trace them, and
delegate them to the object being traced, which is useful for debugging. The implementation, followed by an example of
its use:
class TracedObject
# Undefine all of our noncritical public instance methods.
# Note the use of Module.instance_methods and Module.undef_method.
instance_methods.each do |m|
m = m.to_sym # Ruby 1.8 returns strings, instead of symbols
next if m == :object_id || m == :__id__ || m == :__send__
undef_method m
end
# Initialize this TracedObject instance.
def initialize(o, name, stream)
@o = o # The object we delegate to
@n = name # The object name to appear in tracing messages
@trace = stream # Where those tracing messages are sent
end
# This is the key method of TracedObject. It is invoked for just
# about any method invocation on a TracedObject.
def method_missing(*args, &block)
m = args.shift # First arg is the name of the method
begin
# Trace the invocation of the method.
arglist = args.map {|a| a.inspect}.join(', ')
@trace << "Invoking: #{@n}.#{m}(#{arglist}) at #{caller[0]}\n"
# Invoke the method on our delegate object and get the return value.
r = @o.send m, *args, &block
# Trace a normal return of the method.
@trace << "Returning: #{r.inspect} from #{@n}.#{m} to #{caller[0]}\n"
# Return whatever value the delegate object returned.
r
rescue Exception => e
# Trace an abnormal return from the method.
@trace << "Raising: #{e.class}:#{e} from #{@n}.#{m}\n"
# And re-raise whatever exception the delegate object raised.
raise
end
end
# Return the object we delegate to.
def __delegate
@o
end
end
class Object
def trace(name="", stream=STDERR)
# Return a TracedObject that traces and delegates everything else to us.
TracedObject.new(self, name, stream)
end
end
a = [1,2,3].trace("a")
a.reverse
puts a[2]
puts a.fetch(3)
This produces the following tracing output:
Invoking: a.reverse() at trace1.rb:66
Returning: [3, 2, 1] from a.reverse to trace1.rb:66
Invoking: a.fetch(3) at trace1.rb:67
Raising: IndexError:index 3 out of array from a.fetch
Eval
eval is a very powerful function, but unless you are actually writing a shell program (like irb) that executes lines
of Ruby code entered by a user you are unlikely to really need it.
x = 1
eval "x + 1" # => 2
A Binding object represents the state of Ruby’s variable bindings at some moment. The Kernel.binding method returns
the bindings in effect at the location of the call. You may pass a Binding object as the second argument to eval, and
the string you specify will be evaluated in the context of those bindings. For example, to peek inside an object:
class Object # Open Object to add a new method
def bindings # Note plural on this method
binding # This is the predefined Kernel method
end
end
class Test # A simple class with an instance variable
def initialize(x); @x = x; end
end
t = Test.new(10) # Create a test object
eval("@x", t.bindings) # => 10: We've peeked inside t
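A binding captures local variables as well as instance variables. In this sketch (the method and variable names are
made up), a method hands out its own bindings, and eval later reads a local variable that is otherwise invisible
outside the method.

```ruby
def make_counter_binding
  count = 41        # A local variable, normally out of reach from outside
  binding           # Capture the bindings in effect right here
end

b = make_counter_binding
value = eval("count + 1", b)   # Evaluated where count is in scope
p value                        # => 42
```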
The Object class defines a method named instance_eval, and the Module class defines a method named class_eval. Both
of these methods evaluate Ruby code, like eval does, but there are two important differences. The first difference is
that they evaluate the code in the context of the specified object or module: the object or module is the value of
self while the code is being evaluated. The second difference is that both methods can also accept a block of code
instead of a string.
o.instance_eval("@x") # Return the value of o's instance variable @x
# Define an instance method len of String to return string length
String.class_eval("def len; size; end")
String.class_eval("alias len size")
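Both methods also accept a block instead of a string, which avoids building code as text. A minimal sketch, using a
hypothetical Point class that is not part of the book's examples:

```ruby
class Point                         # Hypothetical class, for illustration only
  def initialize(x, y); @x, @y = x, y; end
end

pt = Point.new(3, 4)

# Inside the block, self is pt, so its instance variables are visible.
sum = pt.instance_eval { @x + @y }

# Inside the block, self is Point, so def adds an instance method.
Point.class_eval do
  def norm2; @x * @x + @y * @y; end
end

squared = pt.norm2
p [sum, squared]                    # => [7, 25]
```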
Monkey Patching
As we’ve seen, metaprogramming in Ruby often involves the dynamic definition of methods. Just as common is the dynamic
modification of methods. Methods are modified with a technique we’ll call alias chaining. It works like this:
• First, create an alias for the method to be modified. This alias provides a name for
the unmodified version of the method.
• Next, define a new version of the method. This new version should call the
unmodified version through the alias, but it can add whatever functionality is
needed before and after it does that.
class Foo
def bar
'Hello'
end
end
class Foo
alias_method :old_bar, :bar
def bar
old_bar + ' World'
end
end
Foo.new.bar # => 'Hello World'
Foo.new.old_bar # => 'Hello'
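The same two steps are typically used to wrap behavior around an existing method. The sketch below adds call logging;
Greeter, CALL_LOG and the alias name are all invented for the example.

```ruby
CALL_LOG = []                          # Hypothetical log, just for this sketch

class Greeter                          # Hypothetical class
  def greet(name)
    "Hello, #{name}"
  end
end

class Greeter
  # Step 1: keep the unmodified method reachable under another name.
  alias_method :greet_without_logging, :greet

  # Step 2: redefine the method and call the original through the alias.
  def greet(name)
    CALL_LOG << "greet(#{name.inspect})"
    greet_without_logging(name)
  end
end

greeting = Greeter.new.greet("Matz")
p greeting    # => "Hello, Matz"
p CALL_LOG    # => ["greet(\"Matz\")"]
```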
One use of alias chaining is to trace which files a program loads and which classes it defines:
module ClassTrace
# This array holds our list of files loaded and classes defined.
T = [] # Array to hold the files loaded
# Now define the constant OUT to specify where tracing output goes.
# This defaults to STDERR, but can also come from command-line arguments
if x = ARGV.index("--traceout") # If argument exists
OUT = File.open(ARGV[x+1], "w") # Open the specified file
ARGV[x,2] = nil # And remove the arguments
else
OUT = STDERR # Otherwise default to STDERR
end
end
# Alias chaining step 1: define aliases for the original methods
alias original_require require
alias original_load load
# Alias chaining step 2: define new versions of the methods
def require(file)
ClassTrace::T << [file,caller[0]] # Remember what was loaded where
original_require(file) # Invoke the original method
end
def load(*args)
ClassTrace::T << [args[0],caller[0]] # Remember what was loaded where
original_load(*args) # Invoke the original method
end
# This hook method is invoked each time a new class is defined
def Object.inherited(c)
ClassTrace::T << [c,caller[0]] # Remember what was defined where
end
# Kernel.at_exit registers a block to be run when the program exits
# We use it to report the file and class data we collected
at_exit {
o = ClassTrace::OUT
o.puts "="*60
o.puts "Files Loaded and Classes Defined:"
o.puts "="*60
ClassTrace::T.each do |what,where|
if what.is_a? Class # Report class (with hierarchy) defined
o.puts "Defined: #{what.ancestors.join('<-')} at #{where}"
else # Report file loaded
o.puts "Loaded: #{what} at #{where}"
end
end
}
DSL´s
The goal of metaprogramming in Ruby is often the creation of domain-specific languages, or DSLs. A DSL is just an
extension of Ruby’s syntax (with methods that look like keywords) or API that allows you to solve a problem or
represent data more naturally than you could otherwise. For our examples, we’ll take the problem domain to be the
output of XML formatted data, and we’ll define two DSLs—one very simple and one more clever—to tackle this problem.
method_missing variant:
pagetitle = "Test Page for XML.generate"
XML.generate(STDOUT) do
html do
head do
title { pagetitle }
comment "This is a test"
end
body do
h1(:style => "font-family:sans-serif") { pagetitle }
ul :type=>"square" do
li { Time.now }
li { RUBY_VERSION }
end
end
end
end
Output:
<html><head>
<title>Test Page for XML.generate</title>
<!-- This is a test -->
</head><body>
<h1 style='font-family:sans-serif'>Test Page for XML.generate</h1>
<ul type='square'>
<li>2007-08-19 16:19:58 -0700</li>
<li>1.9.0</li>
</ul></body></html>
The implementation:
class XML
# Create an instance of this class, specifying a stream or object to
# hold the output. This can be any object that responds to <<(String).
def initialize(out)
@out = out # Remember where to send our output
end
# Output the specified object as CDATA, return nil.
def content(text)
@out << text.to_s
nil
end
def comment(text)
@out << "<!-- #{text} -->"
nil
end
# Output a tag with the specified name and attributes.
# If there is a block invoke it to output or return content.
# Return nil.
def tag(tagname, attributes={})
@out << "<#{tagname}"
attributes.each {|attr,value| @out << " #{attr}='#{value}'" }
if block_given?
@out << '>'
content = yield
if content
@out << content.to_s
end
@out << "</#{tagname}>"
else
@out << '/>'
end
nil # Tags output themselves, so they don't return any content
end
# The code below is what changes this from an ordinary class into a DSL.
# First: any unknown method is treated as the name of a tag.
alias method_missing tag
# Second: run a block in a new instance of the class.
def self.generate(out, &block)
XML.new(out).instance_eval(&block)
end
end
The XML class is helpful for generating well-formed XML, but it does no error checking to ensure that the output is
valid according to any particular XML grammar. A better way would be to define what elements are appropriate.
class HTMLForm < XMLGrammar
element :form, :action => REQ,
:method => "GET",
:enctype => "application/x-www-form-urlencoded",
:name => OPT
element :input, :type => "text", :name => OPT, :value => OPT,
:maxlength => OPT, :size => OPT, :src => OPT,
:checked => BOOL, :disabled => BOOL, :readonly => BOOL
element :textarea, :rows => REQ, :cols => REQ, :name => OPT,
:disabled => BOOL, :readonly => BOOL
element :button, :name => OPT, :value => OPT,
:type => "submit", :disabled => OPT
end
How to use it:
HTMLForm.generate(STDOUT) do
comment "This is a simple HTML form"
form :name => "registration",
:action => "http://www.example.com/register.cgi" do
content "Name:"
input :name => "name"
content "Address:"
textarea :name => "address", :rows=>6, :cols=>40 do
"Please enter your mailing address here"
end
button { "Submit" }
end
end
The implementation:
class XMLGrammar
# Create an instance of this class, specifying a stream or object to
# hold the output. This can be any object that responds to <<(String).
def initialize(out)
@out = out # Remember where to send our output
end
# Invoke the block in an instance that outputs to the specified stream.
def self.generate(out, &block)
new(out).instance_eval(&block)
end
# Define an allowed element (or tag) in the grammar.
# This class method is the grammar-specification DSL
# and defines the methods that constitute the XML-output DSL.
def self.element(tagname, attributes={})
@allowed_attributes ||= {}
@allowed_attributes[tagname] = attributes
class_eval %Q{
def #{tagname}(attributes={}, &block)
tag(:#{tagname},attributes,&block)
end
}
end
# These are constants used when defining attribute values.
OPT = :opt # for optional attributes
REQ = :req # for required attributes
BOOL = :bool # for attributes whose value is their own name
def self.allowed_attributes
@allowed_attributes
end
# Output the specified object as CDATA, return nil.
def content(text)
@out << text.to_s
nil
end
# Output the specified object as a comment, return nil.
def comment(text)
@out << "<!-- #{text} -->"
nil
end
# Output a tag with the specified name and attribute.
# If there is a block, invoke it to output or return content.
# Return nil.
def tag(tagname, attributes={})
# Output the tag name
@out << "<#{tagname}"
# Get the allowed attributes for this tag.
allowed = self.class.allowed_attributes[tagname]
# First, make sure that each of the attributes is allowed.
# Assuming they are allowed, output all of the specified ones.
attributes.each_pair do |key,value|
raise "unknown attribute: #{key}" unless allowed.include?(key)
@out << " #{key}='#{value}'"
end
# Now look through the allowed attributes, checking for
# required attributes that were omitted and for attributes with
# default values that we can output.
allowed.each_pair do |key,value|
# If this attribute was already output, do nothing.
next if attributes.has_key? key
if (value == REQ)
raise "required attribute '#{key}' missing in <#{tagname}>"
elsif value.is_a? String
@out << " #{key}='#{value}'"
end
end
if block_given?
# This block has content
@out << '>' # End the opening tag
content = yield # Invoke the block to output or return content
if content # If any content returned
@out << content.to_s # Output it as a string
end
@out << "</#{tagname}>" # Close the tag
else
# Otherwise, this is an empty tag, so just close it.
@out << '/>'
end
nil # Tags output themselves, so they don't return any content.
end
end
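The element class method above works by generating method definitions with class_eval and a string. The same
"class-level macro" pattern in isolation might look like this; Shape and property are hypothetical names, not part of
the book's code.

```ruby
class Shape                                   # Hypothetical class
  # Class-level macro: generate a reader with a default value, and a writer.
  def self.property(name, default)
    class_eval %Q{
      def #{name}
        defined?(@#{name}) ? @#{name} : #{default.inspect}
      end
      def #{name}=(value)
        @#{name} = value
      end
    }
  end

  property :color, "black"                    # Reads like a declaration
  property :sides, 4
end

s = Shape.new
defaults = [s.color, s.sides]
s.color = "red"
p defaults     # => ["black", 4]
p s.color      # => "red"
```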
Ruby I/O
To obtain a list of files that match a given pattern, use the Dir.[] operator. The pattern is not a regular
expression, but is like the file-matching patterns used in shells. “?” matches a single character. “*” matches any
number of characters. And “**” matches any number of directory levels. Characters in square brackets are alternatives,
as in regular expressions.
Dir['*.data'] # Files with the "data" extension
Dir['?'] # Any single-character filename
Dir['*.[ch]'] # Any file that ends with .c or .h
Dir['*.{java,rb}'] # Any file that ends with .java or .rb
Dir['*/*.rb'] # Any Ruby program in any direct sub-directory
Dir['**/*.rb'] # Any Ruby program in any descendant directory
puts Dir.getwd # Print current working directory
Dir.chdir("..") # Change CWD to the parent directory
Dir.chdir("../sibling") # Change again to a sibling directory
Dir.chdir("/home") # Change to an absolute directory
# Get the names of all files in the config/ directory
filenames = Dir.entries("config") # Get names as an array
Dir.foreach("config") {|filename| ... } # Iterate names
File.open("log.txt", "a") do |log| # Open for appending
log.puts("INFO: Logging a message") # Output to the file
end
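To try the glob patterns without touching real files, the following sketch builds a throwaway directory tree with
Dir.mktmpdir from the tmpdir standard library. The file names are made up for the example.

```ruby
require 'tmpdir'

top_level = nil
all_ruby  = nil

Dir.mktmpdir do |dir|                       # Temporary directory, removed afterwards
  %w[a.rb b.rb notes.txt].each {|name| File.write(File.join(dir, name), "") }
  Dir.mkdir(File.join(dir, "lib"))
  File.write(File.join(dir, "lib", "c.rb"), "")

  Dir.chdir(dir) do                         # Block form restores the old CWD
    top_level = Dir['*.rb'].sort            # Only the current directory
    all_ruby  = Dir['**/*.rb'].sort         # Current directory and all descendants
  end
end

p top_level   # => ["a.rb", "b.rb"]
p all_ruby    # => ["a.rb", "b.rb", "lib/c.rb"]
```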
The Kernel method open works like File.open but is more flexible. If the filename begins with |, it is treated as an
operating system command, and the returned stream is used for reading from and writing to that command process. This
is platform-dependent, of course:
# How long has the server been up?
uptime = open("|uptime") {|f| f.gets }
If the open-uri library has been loaded, then open can also be used to read from http and ftp URLs as if they were
files:
require "open-uri" # Required library
f = open("http://malinstehn.se/") # Webpage as a file
webpage = f.read # Read it as one big string
f.close
Another way to obtain an IO object is to use the stringio library to read from or write to a string:
require "stringio"
input = StringIO.open("now is the time") # Read from this string
buffer = ""
output = StringIO.open(buffer, "w") # Write into buffer
The StringIO class is not a subclass of IO, but it defines many of the same methods as IO does, and duck typing
usually allows us to use a StringIO object in place of an IO object.
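A short sketch of that duck typing: the calls below are the same ones you would make on a File, but everything happens
in memory. The strings are arbitrary.

```ruby
require 'stringio'

input = StringIO.new("line one\nline two\n")
first = input.gets          # Reads up to and including the first newline
rest  = input.read          # Reads everything that is left

output = StringIO.new       # Starts with an empty string
output.puts "hello"
output << "world"

p first            # => "line one\n"
p rest             # => "line two\n"
p output.string    # => "hello\nworld"
```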
Ruby predefines a number of streams that can be used without being created or opened. The global constants STDIN,
STDOUT, and STDERR are the standard input stream, the standard output stream, and the standard error stream,
respectively. By default, these streams are connected to the user’s console or a terminal window of some sort.
The global variables $stdin, $stdout, and $stderr are initially set to the same values as the stream constants. Global
functions like print and puts write to $stdout by default. If a script alters the value of this global variable, it
will change the behavior of those methods. The true “standard output” will still be available through STDOUT, however.
The following script shows how this works:
#!/usr/bin/ruby
# file: readline.rb
print "Enter your name: "
name = gets # In fact $stdin.gets
puts "Hello #{name}" # In fact $stdout.puts
$ ./readline.rb # running the script
Enter your name: Patrik
Hello Patrik
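Because puts writes to $stdout rather than to STDOUT, reassigning the global variable redirects output. A minimal
sketch that captures output in a StringIO and then restores the original stream:

```ruby
require 'stringio'

captured = StringIO.new
$stdout = captured               # puts and print now write into the buffer
begin
  puts "goes to the buffer"
ensure
  $stdout = STDOUT               # Always restore the real standard output
end

puts "goes to the terminal"
p captured.string                # => "goes to the buffer\n"
```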
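Another predefined stream is ARGF, or $<. This stream has special behavior intended to make it simple to write scripts
that read the files specified on the command line or from standard input. The command-line arguments themselves are
available in the array ARGV, or $*:
#!/usr/bin/ruby
# outputargs.rb
puts ARGV
a = Array.new($*)
puts a.join # Array#to_s joined the elements in Ruby 1.8; join does so in every version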
$ ./outputargs.rb hej hopp
hej
hopp
hejhopp
In Ruby 1.9, every stream can have two encodings associated with it. These are known as the external and internal
encodings, and are returned by the external_encoding and internal_encoding methods of an IO object. The external
encoding is the encoding of the text as stored in the file. The internal encoding is the encoding used to represent
the text within Ruby. Specify the encoding of any IO object (including pipes and network sockets) with the
set_encoding method. With two arguments, it specifies an external encoding and an internal encoding. If the external
encoding is also the desired internal encoding, there is no need to specify an internal encoding. If, on the other
hand, you’d like the internal representation of the text to be different than the external representation, you can
specify an internal encoding and Ruby will transcode from the external to the internal when reading and to the
external when writing.
f.set_encoding("iso-8859-1", "utf-8") # Latin-1 (external), transcoded to UTF-8 (internal)
input = File.open("data.txt", "r:utf-8") # Read UTF-8 text (in is a keyword, so it cannot name a variable)
out = File.open("log", "a:utf-8") # Write UTF-8 text
If you specify no encoding at all, then Ruby defaults to the default external encoding when reading from files, and
defaults to no encoding (i.e., the ASCII-8BIT/ BINARY encoding) when writing to files or when reading or writing from
pipes and sockets.
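The transcoding can be observed with a small file written as raw Latin-1 bytes and read back as UTF-8. This is a sketch
under the assumption of Ruby 1.9 or later; the file name and contents are invented.

```ruby
require 'tmpdir'

encodings = nil
text = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "latin1.txt")
  # Four raw bytes: "caf" followed by 0xE9, which is e-acute in Latin-1.
  File.open(path, "wb") {|f| f.write [0x63, 0x61, 0x66, 0xE9].pack("C*") }

  # External encoding Latin-1, internal encoding UTF-8: transcode on read.
  File.open(path, "r:iso-8859-1:utf-8") do |f|
    encodings = [f.external_encoding, f.internal_encoding]
    text = f.read
  end
end

p encodings        # => [#<Encoding:ISO-8859-1>, #<Encoding:UTF-8>]
p text.bytesize    # => 5 (the accented character takes two bytes in UTF-8)
```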
IO defines a number of ways to read lines from a stream:
lines = ARGF.readlines # Read all input, return an array of lines
line = DATA.readline # Read one line from stream
print l while l = DATA.gets # Read until gets returns nil, at EOF
DATA.each {|line| print line } # Iterate lines from stream until EOF
The readline and the gets method differ only in their handling of EOF (end-of-file: the condition that occurs when
there is no more to read from a stream). gets returns nil if it is invoked on a stream at EOF. readline instead raises
an EOFError. You can check whether a stream is already at EOF with the eof? method. The lines returned by these
methods include the line terminator (although the last line in a file may not have one). Use String.chomp! to strip it
off. The special global variable $/ holds the line terminator. You can set $/ to alter the default behavior of all the
line-reading methods, or you can simply pass an alternate separator to any of the methods (including the each
iterator). You might do this when reading comma-separated fields from a file, for example, or when reading a binary
file that has some kind of “record separator” character. There are two special cases for the line terminator. If you
specify nil, then the line-reading methods keep reading until EOF and return the entire contents of the stream as a
single line. If you specify the empty string “” as the line terminator, then the line-reading methods read a paragraph
at a time, looking for a blank line as the separator.
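The separator rules and the EOF behavior can be tried on in-memory streams. In this sketch the sample strings are made
up, and StringIO stands in for a real file.

```ruby
require 'stringio'

# An alternate separator: read comma-terminated fields.
fields = StringIO.new("a,b,c").each_line(",").map {|s| s.chomp(",") }

# The empty string selects paragraph mode: blank lines separate records.
paragraphs = StringIO.new("p1 line1\np1 line2\n\np2 line1\n").each_line("").map {|s| s.strip }

# nil means "no separator": the whole stream is returned as one line.
whole = StringIO.new("x\ny\n").gets(nil)

# At EOF, gets returns nil while readline raises EOFError.
io = StringIO.new("only line\n")
io.gets
at_eof = io.gets
eof_error = begin
  io.readline
  false
rescue EOFError
  true
end

p fields, paragraphs, whole, at_eof, eof_error
```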
The STDOUT and STDERR streams are writable, as are files opened in any mode other than "r" or "rb".
o = STDOUT
o.putc("B") # Write single byte 66 (capital B)
o.putc("CD") # Write just the first byte of the string (C)
o << x # Output x.to_s
o << x << y # May be chained: output x.to_s + y.to_s
o.print s # Output s.to_s + $\
o.puts x # Output x.to_s.chomp plus newline
o.puts x,y # Output x.to_s.chomp, newline, y.to_s.chomp, newline
If the output record separator $\ has been changed from its default value of nil, then that value is output after all
arguments are printed.
When you are done reading from or writing to a stream, you must close it with the close method. This flushes any
buffered input or output, and also frees up operating system resources. A number of stream-opening methods allow you
to associate a block with them. They pass the open stream to the block, and automatically close the stream when the
block exits. Managing streams in this way ensures that they are properly closed even when exceptions are raised:
File.open("test.txt") do |f|
# Use stream f here
end
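The claim that the block form closes the stream even when an exception is raised can be checked directly. In this
sketch the exception is raised on purpose, and the temporary directory comes from the tmpdir standard library.

```ruby
require 'tmpdir'

handle = nil

Dir.mktmpdir do |dir|
  begin
    File.open(File.join(dir, "test.txt"), "w") do |f|
      handle = f                       # Keep a reference so we can inspect it later
      f.puts "partial output"
      raise "something went wrong"     # Leave the block abnormally
    end
  rescue RuntimeError
    # The error is expected here; we only care about the stream's state.
  end
end

closed_after_error = handle.closed?
p closed_after_error                   # => true
```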
Ruby’s output methods (except syswrite) buffer output for efficiency. The output buffer is flushed at reasonable
times, such as when a newline is output or when data is read from a corresponding input stream. There are times,
however, when you may need to explicitly flush the output buffer to force output to be sent right away:
#!/usr/bin/ruby
out = STDOUT
out.print 'wait>' # Display a prompt
out.flush # Manually flush output buffer to OS
sleep(1) # Prompt appears before we go to sleep
You can choose whether Ruby flushes the buffer automatically after every write or whether you control flushing
yourself:
out.sync = true # Automatically flush buffer after every write
out.sync = false # Don't automatically flush
out.sync # Return the current sync mode
IO defines several predicates for testing the state of a stream:
f.eof? # true if stream is at EOF
f.closed? # true if stream has been closed
f.tty? # true if stream is interactive
The only one of these methods that needs explanation is tty?. This method, and its alias isatty (with no question
mark), returns true if the stream is connected to an interactive device such as a terminal window or a keyboard with
(presumably) a human at it.
Networking
At the lowest level, networking is accomplished with sockets, which are a kind of IO object. Once you have a socket
opened, you can read data from, or write data to, another computer just as if you were reading from or writing to a
file. Internet clients use the TCPSocket class, and Internet servers use the TCPServer class (also a socket). All
socket classes are part of the standard library, so to use them in your Ruby program, you must first write:
require 'socket'
To write Internet client applications, use the TCPSocket class. Obtain a TCPSocket instance with the TCPSocket.open
class method, or with its synonym TCPSocket.new.
#!/usr/bin/ruby
# simpleclient.rb
require 'socket' # Sockets are in standard library
host, port = ARGV # Host and port from command line
s = TCPSocket.open(host, port) # Open a socket to host and port
while line = s.gets # Read lines from the socket
puts line.chop # And print with platform line terminator
end
s.close # Close the socket when done
Like File.open, the TCPSocket.open method can be invoked with a block. In that form, it passes the open socket to the
block and automatically closes the socket when the block returns.
To write Internet servers, we use the TCPServer class. In essence, a TCPServer object is a factory for TCPSocket
objects. Call TCPServer.open to specify a port for your service and create a TCPServer object.
#!/usr/bin/ruby
# simpleserver.rb
require 'socket' # Get sockets from stdlib
server = TCPServer.open(2000) # Socket to listen on port 2000
loop { # Infinite loop: servers run forever
client = server.accept # Wait for a client to connect
client.puts(Time.now.ctime) # Send the time to the client
client.close # Disconnect from the client
}
Now you can test your server and client by opening up two terminals:
$ ./simpleserver.rb # in first terminal
$ ./simpleclient.rb localhost 2000 # in second terminal
Output:
Sun Oct 13 16:46:40 2013
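The same exchange can also be run inside a single process, which is handy for experimenting. This sketch makes a few
assumptions of its own: it binds to 127.0.0.1, asks the operating system for a free port by passing port 0, and runs
the server side in a thread.

```ruby
require 'socket'

server = TCPServer.new("127.0.0.1", 0)   # Port 0: let the OS pick a free port
port = server.addr[1]                    # Find out which port we were given

server_thread = Thread.new do
  client = server.accept                 # Wait for the one client to connect
  client.puts "hello from server"        # Send a line
  client.close                           # And disconnect
end

reply = nil
TCPSocket.open("127.0.0.1", port) do |s| # Block form closes the socket for us
  reply = s.gets
end

server_thread.join
server.close
puts reply                               # Prints "hello from server"
```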
A lower-overhead alternative is to use UDP datagrams, with the UDPSocket class. UDP allows computers to send
individual packets of data to other computers, without the overhead of establishing a persistent connection.
require 'socket'
host, port, request = ARGV # Get args from command line
ds = UDPSocket.new # Create datagram socket
ds.connect(host, port) # Connect to the port on the host
ds.send(request, 0) # Send the request text
response,address = ds.recvfrom(1024) # Wait for a response (1kb max)
puts response # Print the response
The second argument to the send method specifies flags. It is required, even though we are not setting any flags.
The argument to recvfrom specifies the maximum amount of data we are interested in receiving. In this case, we limit
our client and server to transferring 1 kilobyte.
The server code uses the UDPSocket class just as the client code does:
require 'socket' # Standard library
port = ARGV[0] # The port to listen on
ds = UDPSocket.new # Create new socket
ds.bind(nil, port) # Make it listen on the port
loop do # Loop forever
request,address=ds.recvfrom(1024) # Wait to receive something
response = request.upcase # Convert request text to uppercase
clientaddr = address[3] # What ip address sent the request?
clientname = address[2] # What is the host name?
clientport = address[1] # What port was it sent from
ds.send(response, 0, # Send the response back...
clientaddr, clientport) # ...where it came from
# Log the client connection
puts "Connection from: #{clientname} #{clientaddr} #{clientport}"
end
Instead of calling connect to connect the socket, our server calls bind to tell the socket what port to listen on.
The server then uses send and recvfrom, just as the client does, but in the opposite order. It calls recvfrom to
wait until it receives a datagram on the specified port.
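For a quick experiment, the UDP client and server can share one process. The sketch below assumes the loopback address
127.0.0.1 and an OS-assigned port (port 0); the server socket is bound before the client sends, so the datagram is
queued until the server thread reads it.

```ruby
require 'socket'

server = UDPSocket.new
server.bind("127.0.0.1", 0)              # Port 0: let the OS choose a free port
port = server.addr[1]

server_thread = Thread.new do
  request, address = server.recvfrom(1024)        # Wait for one datagram
  server.send(request.upcase, 0,                   # Reply in uppercase...
              address[3], address[1])              # ...to the sender's IP and port
end

client = UDPSocket.new
client.connect("127.0.0.1", port)
client.send("hello", 0)                  # Flags argument is required
response, _sender = client.recvfrom(1024)

server_thread.join
client.close
server.close
puts response                            # Prints "HELLO"
```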
The following code is a more fully developed Internet client in the style of telnet. It connects to the specified
host and port and then loops, reading a line of input from the console, sending it to the server, and then reading
and printing the server’s response.
require 'socket'
host, port = ARGV # Network host and port on command line
begin # Begin for exception handling
# Give the user some feedback while connecting.
STDOUT.print "Connecting..." # Say what we're doing
STDOUT.flush # Make it visible right away
s = TCPSocket.open(host, port) # Connect
STDOUT.puts "done"
# Now display information about the connection.
local, peer = s.addr, s.peeraddr
STDOUT.print "Connected to #{peer[2]}:#{peer[1]}"
STDOUT.puts " using local port #{local[1]}"
# Wait just a bit, to see if the server sends any initial message.
begin
sleep(0.5)
msg = s.read_nonblock(4096) # Non blocking way of reading up to 4096 bytes
STDOUT.puts msg.chop
rescue SystemCallError
# If nothing was ready to read, just ignore the exception.
end
# Now begin a loop of client/server interaction.
loop do
STDOUT.print '> ' # Display prompt for local input
STDOUT.flush
local = STDIN.gets # Read line from the console
break if !local # Quit if no input from console
s.puts(local) # Send the line to the server
s.flush
# Read the server's response and print out.
# The server may send more than one line, so use readpartial
# to read whatever it sends (as long as it all arrives in one chunk).
response = s.readpartial(4096)
puts(response.chop)
end
rescue
puts $!
ensure
s.close if s # Don't forget to close the socket
end
The simple time server shown earlier in this section never maintained a connection with any client; it would simply
tell the client the time and disconnect. Many more sophisticated servers maintain a connection, and in order to be
useful, they must allow multiple clients to connect and interact at the same time. One way to do this is with
threads, where each client runs in its own thread. The alternative is to write a multiplexing server using the
Kernel.select method. The return value of select is an array of arrays of IO objects. The first element of the array
is the array of streams (sockets, in this case) that have data to be read (or a connection to be accepted). The
example server is trivial: it simply reverses each line of client input and sends it back. If a client sends quit,
the server closes the connection to that client.
require 'socket'
server = TCPServer.open(2000)
sockets = [server] # An array of sockets we'll monitor
log = STDOUT # Send log messages to standard out
while true # Servers loop forever
ready = select(sockets) # Wait for a socket to be ready
readable = ready[0] # These sockets are readable
readable.each do |socket|
if socket == server # If the server socket is ready
client = server.accept # Accept a new client
sockets << client # Add it to the set of sockets
# Tell the client what and where it has connected.
client.puts "Reversal service v0.01 running on #{Socket.gethostname}"
# And log the fact that the client connected
log.puts "Accepted connection from #{client.peeraddr[2]}"
else # Otherwise, a client is ready
input = socket.gets # Read input from the client
# If no input, the client has disconnected
if !input
log.puts "Client on #{socket.peeraddr[2]} disconnected."
sockets.delete(socket) # Stop monitoring this socket
socket.close # Close it
next # And go on to the next
end
input.chop!
if (input == "quit") # If the client asks to quit
socket.puts("Bye!")
log.puts "Closing connection to #{socket.peeraddr[2]}"
sockets.delete(socket) # Stop monitoring the socket
socket.close # Terminate the session
else # Otherwise, client is not quitting
socket.puts(input.reverse) # So reverse input and send it back
end
end
end
end
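The return value of select is easier to see in isolation. The sketch below is an illustration, not code from the book: the helper name select_demo and the results hash are made up, and a loopback server on an OS-assigned port stands in for real clients. It checks three things: a pending connection makes the listening socket readable, select returns nil when its timeout expires, and a client socket becomes readable once data arrives.

```ruby
require 'socket'

# Hypothetical helper (not from the book): records what select returns
# in three different situations.
def select_demo
  server = TCPServer.new('127.0.0.1', 0)     # Port 0: OS picks a free port
  client = TCPSocket.new('127.0.0.1', server.addr[1])
  results = {}

  # A pending connection makes the listening socket "readable".
  ready = select([server], nil, nil, 1)      # Fourth argument: timeout in seconds
  results[:shape] = ready.length             # [readable, writable, errored]
  results[:server_ready] = ready[0].include?(server)

  conn = server.accept
  # Nothing has been sent yet, so select times out and returns nil.
  results[:idle] = select([conn], nil, nil, 0.1)

  client.puts "hello"
  readable, = select([server, conn], nil, nil, 1)
  results[:conn_ready] = (readable == [conn])
  results[:line] = conn.gets.chomp
  results
ensure
  [client, conn, server].each { |s| s.close if s }
end

p select_demo
```

The reversal server above calls select without a timeout, so it blocks until at least one socket is ready and never sees the nil case.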
Here is an example of the same reversal server written with threads instead of select:
require 'socket'
# This method expects a socket connected to a client.
# It reads lines from the client, reverses them and sends them back.
# Multiple threads may run this method at the same time.
def handle_client(c)
while true
input = c.gets # Read a line of input from the client
break if !input # Exit if no more input (gets returns nil at EOF)
input.chop! # Remove the line terminator
break if input=="quit" # or if the client asks to.
c.puts(input.reverse) # Otherwise, respond to client.
c.flush # Force our output out
end
c.close # Close the client socket
end
server = TCPServer.open(2000) # Listen on port 2000
while true # Servers loop forever
client = server.accept # Wait for a client to connect
Thread.start(client) do |c| # Start a new thread
handle_client(c) # And handle the client on that thread
end
end
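The threaded server can be exercised from the same process, which is handy for a quick test. This is a sketch under assumptions: reverse_via_server is a made-up helper, the server listens on the loopback address with an OS-assigned port instead of port 2000, and it accepts exactly one client before shutting down. The handle_client method is written so that it also copes with gets returning nil when the client disconnects.

```ruby
require 'socket'

# Reads lines from the client, reverses them and sends them back.
def handle_client(c)
  while (line = c.gets)          # gets returns nil when the client disconnects
    line = line.chomp
    break if line == "quit"      # The client asked to end the session
    c.puts(line.reverse)
  end
ensure
  c.close                        # Always close the client socket
end

# Hypothetical helper (not from the book): starts a one-client server,
# sends each line to it and collects the replies.
def reverse_via_server(lines)
  server = TCPServer.new('127.0.0.1', 0)     # Port 0: OS picks a free port
  port = server.addr[1]
  acceptor = Thread.new do
    client = server.accept
    # Handle the client on its own thread and wait for it to finish.
    Thread.start(client) { |c| handle_client(c) }.join
  end

  s = TCPSocket.new('127.0.0.1', port)
  replies = lines.map do |l|
    s.puts(l)
    s.gets.chomp
  end
  s.puts("quit")
  acceptor.join
  replies
ensure
  s.close if s
  server.close if server
end

p reverse_via_server(["hello", "ruby"])   # prints ["olleh", "ybur"]
```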
We can use the socket library to implement any Internet protocol. Here, for example, is code to fetch the content
of a web page:
require 'socket'
host = 'www.example.com' # The web server
port = 80 # Default HTTP port
path = "/index.html" # The file we want
# This is the HTTP request we send to fetch a file
request = "GET #{path} HTTP/1.0\r\n\r\n"
socket = TCPSocket.open(host,port) # Connect to server
socket.print(request) # Send request
response = socket.read # Read complete response
# Split response at first blank line into headers and body
headers,body = response.split("\r\n\r\n", 2)
print body
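The split at the first blank line can be taken one step further to pull out the status code and the individual headers. The sketch below works on a canned response string, so it needs no network access. The helper name parse_http_response and the sample response are made up for illustration, and the parsing is deliberately naive (no folded headers, no repeated header names).

```ruby
# Hypothetical helper (not from the book): splits a raw HTTP response
# into status code, status message, headers and body.
def parse_http_response(raw)
  head, body = raw.split("\r\n\r\n", 2)            # Headers end at first blank line
  status_line, *header_lines = head.split("\r\n")
  _version, code, message = status_line.split(" ", 3)
  headers = header_lines.each_with_object({}) do |line, h|
    name, value = line.split(":", 2)
    h[name.strip.downcase] = value.to_s.strip      # Header names are case-insensitive
  end
  { code: code, message: message, headers: headers, body: body.to_s }
end

# A made-up response of the kind a server might send.
raw = "HTTP/1.0 200 OK\r\n" \
      "Content-Type: text/html\r\n" \
      "Content-Length: 9\r\n" \
      "\r\n" \
      "<p>Hi</p>"

response = parse_http_response(raw)
puts response[:code]                      # The code is a string, not a number
puts response[:headers]["content-type"]
puts response[:body]
```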
You might prefer to use a prebuilt library like Net::HTTP for working with HTTP.
require 'net/http' # The library we need
host = 'www.example.com'
path = '/index.html'
http = Net::HTTP.new(host)
response = http.get(path) # Request the file; returns a single response object
if response.code == "200" # Check the status code
# NOTE: code is a string, not a number!
print response.body # Print body if we got it
else
puts "#{response.code} #{response.message}" # Display error message
end
Finally, recall that the open-uri library described earlier in the chapter makes fetching a web page even easier:
require 'open-uri'
# Use URI.open: since Ruby 3.0, Kernel#open no longer accepts URLs
URI.open("http://www.example.com/index.html") {|f|
puts f.read
}