Skip to content

Instantly share code, notes, and snippets.

@dbrady
Created May 26, 2011 06:05
Show Gist options
  • Star 2 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save dbrady/992630 to your computer and use it in GitHub Desktop.
Save dbrady/992630 to your computer and use it in GitHub Desktop.
Unit Tests for CSV demonstrating that Enumerable is supported only at the whim of the underlying StringIO seek position
#!/usr/bin/env ruby
unless RUBY_VERSION.start_with? "1.9"
puts "This test file is for Ruby 1.9.x"
exit -1
end
require 'csv'
require 'test/unit'
# Hi James!
# This is written with love and respect. I'm a little cranky with this
# code at the moment, but I think once you see what I'm seeing you'll
# probably agree.
# 0. This is maddening. I just went to pull up the docco that I was
# working off of today, and I can't find it. I just found THIS page,
# which DOES have the examples I was hoping to find:
#
# http://www.ruby-doc.org/ruby-1.9/index.html
#
# Hang on, lemme go crack open the work laptop. Oh, wild. I was
# working entirely off of the CSV::Table documentation, because that's
# where Google led me.
# 1. I didn't get that CSV was a file I/O based metaphor from its
# documentation at all. That's because it doesn't HAVE much
# documentation. This thing NEEDS some examples desperately. Stack
# Overflow has a few examples of people saying "is there a simple
# example of how to use CSV?" It would be REALLY helpful if those
# simple examples were in the rdoc.
# 2. Note that my use case is probably simple and edge-casey. I'm not
# streaming data. I'm grabbing a CSV file wholesale, reading it in one
# go, and then pushing it out to an HTML table--that isn't allowed to
# get friendly with the report's native data format, so NO, it can't
# just use CSV's iteration helpers. :-)
# 3. I TOTALLY expected that CSV would uphold the Enumerable contract,
# because it says "Modules included: Enumerable" right on the tin.
# dbrady's not so very humble opinion: this cost me an hour of
# debugging time late in the afternoon of a deploy day. If you don't
# want to change it because I'm just using it wrong, PLEASE document
# this. I read your existing documentation in good faith and with a
# careful eye and was led down the garden path. (See point 0 above,
# though--I didn't know there was a CSV page apart from CSV::Table. I
# blame the crappy 4-iframe bullshit indexing of RDoc, that forces me
# to use Google to search the damn documentation.)
# Then again, if you're not shocked and appalled by the tests that
# follow, what I *really* want to know is who are you and what have
# you done with James Edward Gray II?
# The tests that follow are in two groups. test_workaround_* tests DO
# pass, but demonstrate the need to completely reach up CSV's butt and
# violate its encapsulation rather vigorously. All other tests fail,
# demonstrating the problem that CSV presents an Enumerable and/or
# random-access interface that is completely dependent on the StringIO
# seek position.
class TestCsvSeekDependencies < Test::Unit::TestCase
def setup
@csv = CSV.new(<<PETS, :headers => true)
"name","type","age"
Noni,"Border Collie/Golden Retriever Mix",8
BamBam,Cat,2
Pebbles,Cat,2
PETS
end
def test_sanity
assert true, "This really should have been true."
end
def test_headers_do_not_depend_on_first_fetch
assert_equal ["name", "type", "age"], @csv.headers
end
def test_workaround_headers_do_not_depend_on_first_fetch
@csv.count
assert_equal ["name", "type", "age"], @csv.headers
end
def test_headers_does_not_get_erased_by_rewind
@csv.count
assert_equal ["name", "type", "age"], @csv.headers
@csv.rewind
assert_equal ["name", "type", "age"], @csv.headers
end
def test_count_does_not_depend_on_seek_position
assert_equal 3, @csv.count
assert_equal 3, @csv.count
end
def test_workaround_count_does_not_depend_on_seek_position
assert_equal 3, @csv.count
@csv.rewind
assert_equal 3, @csv.count
@csv.rewind
assert_equal 3, @csv.count
end
def test_first_does_not_depend_on_seek_position
assert_equal "Noni", @csv.first["name"]
assert_equal "Noni", @csv.first["name"]
end
def test_workaround_first_does_not_depend_on_seek_position
assert_equal "Noni", @csv.first["name"]
@csv.rewind
assert_equal "Noni", @csv.first["name"]
end
def test_first_does_not_depend_on_count_not_having_been_called_previously
assert_equal "Noni", @csv.first["name"]
assert_equal 3, @csv.count
end
def test_workaround_first_does_not_depend_on_count_not_having_been_called_previously
assert_equal "Noni", @csv.first["name"]
@csv.rewind
assert_equal 3, @csv.count
end
def test_each_should_not_depend_on_each_not_having_been_called
names = []
@csv.each do |row|
names << row["name"]
end
assert_equal ["Noni", "BamBam", "Pebbles"], names
names = []
@csv.each do |row|
names << row["name"]
end
assert_equal ["Noni", "BamBam", "Pebbles"], names
end
def test_each_should_not_depend_on_first_not_having_been_called
names = []
@csv.first
@csv.each do |row|
names << row["name"]
end
assert_equal ["Noni", "BamBam", "Pebbles"], names
end
def test_map_should_not_depend_on_geez_we_get_it_already
names = @csv.map {|row| row["name"] }
assert_equal ["Noni", "BamBam", "Pebbles"], names
names = @csv.map {|row| row["name"] }
assert_equal ["Noni", "BamBam", "Pebbles"], names
end
def test_anyp_should_not_depend_on_well_you_see_the_problem_now
assert_equal true, @csv.any? {|row| row["name"] == "BamBam"}
# next line passes, because seek pos is left at line 2
assert_equal true, @csv.any? {|row| row["name"] == "Pebbles"}
# fails because seek pos is now at EOF
assert_equal true, @csv.any? {|row| row["name"] == "Noni"}
end
def test_etc_and_so_forth_for_all_of_enumerable
@csv.count
assert_equal [["name", "BamBam"], ["type", "Cat"], ["age", "2"]], @csv.find {|row| row["name"] == "BamBam"}.to_a
assert_equal 12, @csv.inject(0) {|a,b| a + b["age"].to_i}
assert_equal [["name", "Pebbles"], ["type", "Cat"], ["age", "2"]], @csv.drop(2)[0].to_a
# ...etc...
end
def test_if_youre_gonna_play_in_texas_you_gotta_have_grep_in_the_band
# this would ALSO be dependent on seek position if it wasn't broken
assert_not_equal [], @csv.grep(/BamBam/)
end
end
# So... there you go. I hope you find this useful, and I *REALLY* hope
# you'll continue to support Enumerable in CSV. By which I mean start
# supporting. *cough* ;-)
# It may not be possible to leave the seek position alone while
# guaranteeing that Enumerable functions like map() and first() work
# correctly. You may well need to split Enumerable out of CSV
# altogether. If you do, like I said above, PLEASE provide a way to
# permit it, even if it means casting the whole file to_a or similar.
# Cf. ruby's IO#read_lines() method that just gives you back an array.
# Thanks for all your hard work! I'm grumpy because of time lost and
# the mega schedule stress it induced. I am NOT grumpy that you
# created this library and put it out there. Which is why I'm giving
# you unit tests rather than a complaint form. :-)
@JEG2
Copy link

JEG2 commented May 26, 2011

Thank you for the super awesome bug report. :)

I'm not sure I agree it's a secret that CSV has an IO like interface. Here are multiple examples from the documentation:

# === Interface
# ...
# * CSV::open() is now more like Ruby's open().
# * CSV objects now support most standard IO methods.

# :call-seq:
#   open( filename, mode = "rb", options = Hash.new ) { |faster_csv| ... }
#   open( filename, options = Hash.new ) { |faster_csv| ... }
#   open( filename, mode = "rb", options = Hash.new )
#   open( filename, options = Hash.new )
#
# This method opens an IO object, and wraps that with CSV.  This is intended
# as the primary interface for writing a CSV file.

# This constructor will wrap either a String or IO object passed in +data+ for
# reading and/or writing.  In addition to the CSV instance methods, several IO
# methods are delegated.  (See CSV::open() for a complete list.)  If you pass
# a String for +data+, you can later retrieve it (after writing to it, for
# example) with CSV.string().

The documentation does also include examples of usage, including a non-iterator example:

# == Reading
#
# === From a File
#
# ==== A Line at a Time
#
#   CSV.foreach("path/to/file.csv") do |row|
#     # use row here...
#   end
#
# ==== All at Once
#
#   arr_of_arrs = CSV.read("path/to/file.csv")
#
# === From a String
#
# ==== A Line at a Time
#
#   CSV.parse("CSV,data,String") do |row|
#     # use row here...
#   end
#
# ==== All at Once
#
#   arr_of_arrs = CSV.parse("CSV,data,String")
#
# == Writing
#
# === To a File
#
#   CSV.open("path/to/file.csv", "wb") do |csv|
#     csv << ["row", "of", "CSV", "data"]
#     csv << ["another", "row"]
#     # ...
#   end
#
# === To a String
#
#   csv_string = CSV.generate do |csv|
#     csv << ["row", "of", "CSV", "data"]
#     csv << ["another", "row"]
#     # ...
#   end
#
# == Convert a Single Line
#
#   csv_string = ["CSV", "data"].to_csv   # to CSV
#   csv_array  = "CSV,String".parse_csv   # from CSV
#
# == Shortcut Interface
#
#   CSV             { |csv_out| csv_out << %w{my data here} }  # to $stdout
#   CSV(csv = "")   { |csv_str| csv_str << %w{my data here} }  # to a String
#   CSV($stderr)    { |csv_err| csv_err << %w{my data here} }  # to $stderr
#   CSV($stdin)     { |csv_in|  csv_in.each { |row| p row } }  # from $stdin
#
# == Advanced Usage
#
# === Wrap an IO Object
#
#   csv = CSV.new(io, options)
#   # ... read (with gets() or each()) from and write (with <<) to csv here ...

I'm totally willing to add more examples that are needed. Please send patches.

For the final stop on our documentation tour, the surprising behavior of headers() is documented:

# Returns +nil+ if headers will not be used, +true+ if they will but have not
# yet been read, or the actual headers after they have been read.  See
# CSV::new for details.

Also, you asked for support of CSV#to_a() and/or the IO interface CSV::read()/CSV::readlines(). All three of those methods are available. There's also a very nice CSV::table() shortcut wrapper for these slurp interfaces.

Finally, I don't agree that grep() is broken. Remember that you are grep()ping over rows, not fields.

Summary of the above: I'm trying very hard to meet your needs, Dude! :)

All of that said, you did locate a problem that I think needs fixing. each() needs updates. At the minimum, CSV#each() needs to support returning 1.9's Enumerators.

The other question is, what is the correct stream-like behavior for the IO iterators? I see the two options as continuing where the stream left off or calling rewind() to reset after iteration. Let's see what File does:

open("data.txt", "w+") do |f|
  f.puts %w[one two three four five]

  f.rewind
  puts "Testing first()..."
  3.times do
    puts f.first
  end
  puts

  f.rewind
  puts "Testing find()..."
  puts f.find { |l| l =~ /\At/ }
  puts "Show where we are now..."
  puts f.to_a
end

# Testing first()...
# one
# two
# three
# 
# Testing find()...
# two
# Show where we are now...
# three
# four
# five

It looks to me like it holds the seek position, which feels like the correct short-curcuiting behavior to me. Given that, I think CSV handles this correctly, but I'm open to a debate if people disagree. Please let me know.

As an aside, the switch to officially supporting the IO interface was one of the primary changes from the old CSV code to FasterCSV and later the new CSV. It really bothered me that the old CSV code used the IO methods, but didn't really honor their contract. In my opinion, it's a great interface for this domain. We need to remember that CSV data, like any stream, can be very large. We need to default to handling the data as it comes in.

@JEG2
Copy link

JEG2 commented May 26, 2011

I've pushed support for Enumerator. Thanks!

@dbrady
Copy link
Author

dbrady commented May 27, 2011 via email

@JEG2
Copy link

JEG2 commented May 27, 2011 via email

@dbrady
Copy link
Author

dbrady commented May 27, 2011 via email

@JEG2
Copy link

JEG2 commented May 27, 2011 via email

@dbrady
Copy link
Author

dbrady commented May 27, 2011 via email

@JEG2
Copy link

JEG2 commented May 27, 2011 via email

@dbrady
Copy link
Author

dbrady commented May 27, 2011 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment