@josh-works
Last active March 17, 2017 19:48
If you're here from www.josh.works, here's the gist Jason sent back to me the next day, with his answers:

Jason's answers. He said:

From then on, you duplicate that class variable to an instance variable. The .dup should "protect" you from one test modifying data for another test, which is a concern with class variables.
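A minimal sketch of what Jason describes, under stated assumptions: FakeEngine and its merchant names are hypothetical stand-ins so the example runs without the project's code. One caveat: Ruby's dup is a shallow copy, so the sketch also defines initialize_copy to copy the inner array. Whether SalesEngine needs something similar depends on how its repositories hold their data.

```ruby
require 'minitest/autorun'

# FakeEngine is a hypothetical stand-in for SalesEngine.
class FakeEngine
  attr_reader :merchants

  def initialize
    @merchants = ["merchant_1", "merchant_2"]
  end

  # dup is shallow by default, so copy the array as well;
  # otherwise the copy and the original would share it.
  def initialize_copy(source)
    super
    @merchants = source.merchants.dup
  end
end

class FakeEngineTest < Minitest::Test
  # Built once, when the class body is evaluated.
  @@engine = FakeEngine.new

  def setup
    # Each test works on its own copy, never on the shared object.
    @engine = @@engine.dup
  end

  def test_mutating_the_copy
    @engine.merchants << "merchant_3"
    assert_equal 3, @engine.merchants.size
  end

  def test_shared_engine_is_untouched
    assert_equal 2, @engine.merchants.size
  end
end
```

Both tests pass in either order, because the mutation in the first one only touches that test's copy.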

Hey Jason!

I've got a sticky question for you. The answer might be a simple "that's not possible", but I'm not positive.

I'll give you the context, then the code, then outline all of the things I've tried.

The Context

tl;dr: Every test file I run re-initializes our sales_engine with all of its many thousands of lines of data. I want to figure out how to use a setup method or module to spin up a SINGLE instance of the sales engine, and run the rest of our tests against that.

Brett Schwartz and I are building out our Black Thursday project. He and I are about done with iteration 3.

It's actually been quite smooth sailing so far. We've got all our tests passing and the spec harness is quite happy. github repo

Before we started, the instructors talked a lot about making fixtures, or sample data to save us time on running our tests.

I made a bunch, but then ran into problems, as taking pieces of data from every file doesn't guarantee it's the RIGHT data. We had lots of method calls coming back empty, because they were doing math or logic on a file that originally had 5000 items, and now had just ten.

So, we canned using fixtures, and decided to run with the full data sets.

Our tests were running slow, originally, because for every test in every file, we were initializing a new sales engine repo.

The Code

Here's what each test looked like when they were super slow:

class MerchantRepositoryTest < Minitest::Test

  def setup
    @se = SalesEngine.from_csv({
      :items     => "./data/items.csv",
      :merchants => "./data/merchants.csv",
    })
  end

  def test_merchant_repository_exists
    assert_instance_of MerchantRepository, @se.merchants
  end
  .
  .
  .

I thought initializing the repo before every test was a bad idea, so we went to this:

class InvoiceRepositoryTest < Minitest::Test


  @@se = SalesEngine.from_csv({
    :invoices  => "./data/invoices.csv",
    :items     => "./data/items.csv",
    :merchants => "./data/merchants.csv",
  })
  @@ir = @@se.invoices


  def setup
    @se = @@se
    @ir = @@ir
  end

  def test_it_exists
    assert_instance_of InvoiceRepository, @ir
  end
  .
  .
  .

Ruby complains that I've got class variables scattered about, so I don't think this is the right way to do it, but it saves us considerable time on testing. Tests went from ~20 seconds per file to 2 seconds per file. (The tests themselves have always completed in fractions of a second.)

Then, we loaded up more data. Now it takes ~4 seconds for the engine to initialize. Not the end of the world, except when I run rake unit_test: we've got 14 test files, so it's 14 * 4 seconds, and takes a while. The spec harness still runs quite quickly, so it's not that our code is slow (though I know it has lots of room for improvement).

In my digging around on the internet, it sounds like RSpec can do this "before any test runs, set up the following..." approach. I don't know RSpec, though, and don't want to switch over to it this late in the project just for this small gain.

I eventually made one last small improvement to the tests, and pulled out the engine initialization to a module:

module TestSetup

  @@se = SalesEngine.from_csv({
    :invoices      => "./data/invoices.csv",
    :items         => "./data/items.csv",
    :merchants     => "./data/merchants.csv",
    :transactions  => "./data/transactions.csv",
    :invoice_items => "./data/invoice_items.csv"
  })

end

# test helper, included in every test:
require 'simplecov'
SimpleCov.start

gem 'minitest'
require 'minitest/autorun'
require 'pry'
require './lib/test_module' # <= loads the TestSetup module

# test file:

class MerchantRepositoryTest < Minitest::Test
  include TestSetup

  def setup
    @se = @@se
  end

  def test_merchant_repository_exists
    assert_instance_of MerchantRepository, @se.merchants
  end
  .
  .
  .

So, each test file loads the test_helper.rb file, which allows the test file to include TestSetup and access all the test set-up in a single place.

So, this seems like a win from DRY principles, but I still don't like that I'm starting this engine a dozen times when I run my tests.

Do you know how to get around this? I've tried... many things. But I don't know enough Ruby to make educated guesses.

Ideas I've tried

I've tried:

  1. Memoization, along the lines of @@se ||= [expensive_operation], in the test helper file or in the module, with or without making it a class variable. No luck.
  2. Adding/removing the setup methods from the tests, and working around them.
  3. What seems like a dozen other things.
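For reference, here is a hedged sketch of the memoization idea from item 1, written as a module-level accessor instead of a class variable. ExpensiveEngine and its load_count counter are hypothetical stand-ins for SalesEngine.from_csv, added only so the example can show how many times the load runs. Memoization like this can only help within a single Ruby process; whether rake unit_test runs all 14 files in one process depends on how that task is defined, which isn't shown here.

```ruby
# ExpensiveEngine is a hypothetical stand-in for SalesEngine.
class ExpensiveEngine
  @load_count = 0

  class << self
    attr_reader :load_count

    def from_csv(paths)
      @load_count += 1 # counts how often the "expensive" load actually runs
      new(paths)
    end
  end

  attr_reader :paths

  def initialize(paths)
    @paths = paths
  end
end

module TestSetup
  # Memoized on the module itself: the first call builds the engine,
  # every later call in the same process returns the same object.
  def self.engine
    @engine ||= ExpensiveEngine.from_csv(
      items: "./data/items.csv",
      merchants: "./data/merchants.csv"
    )
  end
end

first = TestSetup.engine
second = TestSetup.engine
puts first.equal?(second)       # prints true
puts ExpensiveEngine.load_count # prints 1
```

A test's setup method could then read @se = TestSetup.engine (or a dup of it) without triggering another load in that process.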

So, how would you handle this? Each test finishes in ~0.01 seconds, but takes ~4 seconds to finish the setup. It takes 54 seconds to run all the tests.

According to $ time rake unit_test:

real	1m1.484s
user	0m56.605s
sys	0m1.520s

I think that means of that 1m1.4 seconds, all but 1.5 of them were spent... loading data?
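A note on reading that output: real is wall-clock time, user is CPU time spent in the process's own code, and sys is CPU time spent in the kernel on its behalf. So time alone can't say how much went to loading data versus everything else Ruby did. One way to check is to time the load directly with Benchmark.realtime from the standard library. A minimal sketch, where slow_load is a hypothetical stand-in for SalesEngine.from_csv:

```ruby
require 'benchmark'

# slow_load is a hypothetical stand-in for the expensive CSV load.
def slow_load
  sleep 0.05
  :engine
end

engine = nil

# Benchmark.realtime returns the elapsed wall-clock seconds for the block.
load_seconds = Benchmark.realtime { engine = slow_load }
test_seconds = Benchmark.realtime { engine == :engine }

puts format("load: %.3fs, test: %.6fs", load_seconds, test_seconds)
```

Wrapping the real from_csv call the same way inside one test file would show how much of each ~4 seconds is the load itself.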

Anyway, I'd love to know if you think there's a reasonable solution to this problem!

Unless you say otherwise, I'm going to let it go. I've learned a lot about metrics and benchmarking, as I've gone down this rabbit hole, but I'm reduced to just flailing around in the dark.

Things I've read in pursuit of resolving this:
