when people say "integration testing", the feeling I get is that the definition that most people are using is "unit tests that happen to perform i/o". is this the definition that most people are using?
there's another definition, which is the definition I learned when I first learned about unit testing, which I have never seen anyone actually use: a unit test is an individual unit of testing, and "integration testing" is when you sequence the unit tests to create an integrated suite of tests. that is ... integration testing is when you integrate your unit tests, not when you test how your system integrates with another system. Those are distinct concepts! My suggestion here is not that the latter concept isn't valuable, it is valuable, it's just distinct, and rarely do I see the first concept being executed well.
For example, let's say you were testing some CRUD API and you wanted to test two things: the create and the update. The strategy that I most commonly witness is as follows:
- create a unit test for your `create` action. Start with a fixed, known state (let's call it `c0`), then run the `create`. The system is in some new state `c1`. Check the response to the `create` routine, as well as check that `c1` is the value of the state that you expect.
- independently, create a unit test for your `update` action. Start with some known-good state (let's call it `u0`), a state of an existing object in a database. Creating this state is itself work: it's new work to create this platonic starting state. Run your `update` against this platonic state (`u0`), producing some new state (`u1`). Check the response to your `update` action and check that `u1` is the new state value that you expect.
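To make the strategy concrete, here's a minimal sketch in Go, against a hypothetical in-memory store (the `Store` type and its methods are invented for illustration). Note how `testUpdate` hand-builds its platonic `u0` instead of deriving it from `Create`:

```go
package main

import "fmt"

// Hypothetical CRUD target: an in-memory book store.
type Book struct {
	ID    int
	Title string
}

type Store struct {
	books  map[int]Book
	nextID int
}

func NewStore() *Store { return &Store{books: map[int]Book{}, nextID: 1} }

func (s *Store) Create(title string) Book {
	b := Book{ID: s.nextID, Title: title}
	s.books[b.ID] = b
	s.nextID++
	return b
}

func (s *Store) Update(id int, title string) (Book, bool) {
	b, ok := s.books[id]
	if !ok {
		return Book{}, false
	}
	b.Title = title
	s.books[id] = b
	return b, true
}

// testCreate starts from the fixed known state c0 (an empty store) and
// verifies c1 directly, without going through a read action.
func testCreate() bool {
	s := NewStore() // c0
	b := s.Create("Dune")
	return b.ID == 1 && s.books[1].Title == "Dune" // p(c1)
}

// testUpdate starts from a hand-built "platonic" state u0, which we merely
// hope matches what Create would actually have produced.
func testUpdate() bool {
	s := NewStore()
	s.books[1] = Book{ID: 1, Title: "Dune"} // u0, built by hand: new work
	s.nextID = 2
	b, ok := s.Update(1, "Dune Messiah")
	return ok && b.Title == "Dune Messiah" && s.books[1].Title == "Dune Messiah"
}

func main() {
	fmt.Println("create:", testCreate())
	fmt.Println("update:", testUpdate())
}
```

The hand-assembled `u0` in `testUpdate` is exactly the hazard discussed below: nothing forces it to stay equal to the `c1` that `Create` really produces.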
That's all well and good, but you've now created a handful of new problems:
- how do you define the success criteria of the `create` action (that is, the verification `p` such that `p(c1)` indicates that the test for create passes) that is not in terms of the `read` action, in order to guarantee isolation of the things under test? What value is provided by testing the `create` action alone? Does this not create a new hazard where the verification logic of the `create` test can diverge from the actual logic of the `read` action?
- how do you define the initial state for the `update` test? (in this example, `u0`.) Is that not simply the result of the `create` action? The `update` action is now being tested off of a platonic starting state. How do you know that this platonic starting state is reachable by your system? Is it not the case that `c1`, the output of the `create` test, and `u0`, the input of the `update` test, should always be equal or your tests are invalid? If that state is reachable now, how do you ensure that it continues to be reachable as your system changes?
- if your `update` is tested off of a platonic starting state that is not the exact output of the `create` action, you now have two problems: your `update` is not testing the state reached by the `create` routine, and you've created a new, false requirement that the `update` action be usable against a state that is not reachable by your system. You had to go through all of the trouble of creating this state, which is new work, when the `create` action ... literally does that work. The value provided by the isolation has to be significantly greater than the cost of having created that state, otherwise you're just creating busywork.
anyway, this comes up a lot for me since my primary project is a stateful multiplayer server whose only job is to contain and communicate the state of a game. integration testing this thing is ... hard. curious what people do for integration testing from a conceptual level, not from like a tools/language/library level. do other people also face the problem I'm facing, or are people finding testing against platonic states relatively unproblematic and it sounds more like I'm doing it wrong?
oh man am I glad to hear from you.
Ah! yeah, this is a good example. You test that you got what you wanted, but forget to test that nothing else happened, so maybe you had unintended side-effects. I've definitely had this problem before.
If I'm understanding correctly, you're basically creating a sort of proxy value of the system state and testing against that: a neutral frame of reference that is more simply described than the entire state of the system. So long as things look correct from that frame of reference, we're OK; we just test the validity of the frame of reference separately. That makes sense to me. I think that's a pretty good and general solution to this category of problem. It's also probably data you already want anyway, since for a lot of projects that aggregate state data is exactly what you expose as observability metrics.
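In Go, that frame-of-reference idea might look something like this (a sketch; the `GameState` and `Summary` types are invented for illustration). Tests assert against the small aggregate view, and the state-to-summary mapping is validated separately:

```go
package main

import "fmt"

// Hypothetical game state with more detail than we want to assert on directly.
type GameState struct {
	Players    map[string]int // player name -> score
	Spectators int
}

// Summary is the "frame of reference": a small aggregate view of the state.
type Summary struct {
	PlayerCount int
	TotalScore  int
}

// summarize projects the full state down to the frame of reference.
func summarize(g GameState) Summary {
	s := Summary{PlayerCount: len(g.Players)}
	for _, score := range g.Players {
		s.TotalScore += score
	}
	return s
}

func main() {
	g := GameState{Players: map[string]int{"ana": 3, "bo": 5}, Spectators: 12}
	got := summarize(g)
	want := Summary{PlayerCount: 2, TotalScore: 8}
	// A test asserts on the summary, not on the entire GameState.
	fmt.Println(got == want)
}
```

The payoff is that a test doesn't have to describe (or maintain) the entire state, only the projection it cares about.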
For my project, one of the big challenges is that we have domain situations where we might have a sequence of many stateful steps. So a challenging situation that I had to test for a multiplayer game was:
This was really hard to test because each step relies on the state created by the steps before it, so going in full-isolation mode, the process of creating the starting state for the last step wound up being a lot of work, especially since you have to model out whether a client is receiving notification of another client's activity. Eventually these initial state values became so complex to create and maintain over time that people weren't writing tests. Also, everything is in-memory and it's Go, so I can run hundreds (thousands?) of tests in under a second, which made any perceived performance benefit of test isolation irrelevant.
yeahhh. We definitely had that problem in the past: well-isolated tests that weren't catching bugs.
I wound up writing a testing library for writing dependent tests: tests that depend on the "output" of other tests: https://github.com/jordanorelli/tea/ (the incr example is probably the most straightforward to understand)
I know it looks abandoned but ... I'm actually using it in production now; I have a graph of ~600 tests that block CI for deploying our multiplayer server, and that suite of tests creates http servers, establishes websockets, and tests that when one client does something, other clients do (or do not) see those things. So I'm starting and stopping hundreds of http servers and creating hundreds of websocket connections. I'm hitting ulimit problems now because many tests involve at least 3 connections (the game and two players), so I'm opening over a thousand websockets when I run `go test`.

It works by creating a tree of tests, where each test is a value defined by the following interface:
So, like, if you had three test types named `startAPIServer` that tested that the API server starts and is reachable, `createBook` that used some create endpoint to create a book record, and `getBook` that used the API endpoint to get the record back, you'd compose them into a chain.

The tests use Go sub-tests, so you get tree output using just `go test` with no additional tooling. If a test fails, all of its subtests are still printed in the tree but marked as skipped, so that when a test fails you know how many dependent tests were skipped. It uses struct tags to pass state data from one test to the next in a given sequence. Running a given test means re-running every test back to the root of the tree, so you write tests that ergonomically look like dependent tests, but they actually execute as isolated tests.

Writing this library dramatically increased our test coverage and test productivity. I write the test types, and any developer can create new test cases and new test sequences very easily by just composing the types that I've defined. But now I have a new problem: maintaining the test library is work, and its output is very bad and very confusing. I'm at a point where I have to work on the test library itself, but I'm not sure if this is just a very silly concept and I should throw it out or if I should double-down on it.
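To make the mechanism concrete, here's a miniature model of the idea in plain Go. This is hypothetical and not tea's actual API: a tree of named nodes where state flows from parent to child, and a failure skips the whole subtree while reporting how many subtests were skipped:

```go
package main

import "fmt"

// node is a hypothetical stand-in for a test-tree node: a name, a test func
// that reads and extends the accumulated state, and dependent children.
type node struct {
	name     string
	run      func(state map[string]string) error
	children []*node
}

func (n *node) child(c *node) *node { n.children = append(n.children, c); return c }

// count returns the number of nodes in the subtree rooted at n.
func count(n *node) int {
	total := 1
	for _, c := range n.children {
		total += count(c)
	}
	return total
}

// runTree walks the tree, handing each subtree its own copy of the
// accumulated state so that sibling branches can't see each other's
// mutations. A failing node skips all of its descendants.
func runTree(n *node, state map[string]string, depth int) {
	s := map[string]string{}
	for k, v := range state {
		s[k] = v
	}
	if err := n.run(s); err != nil {
		fmt.Printf("%*s%s: FAIL (%v), skipping %d subtest(s)\n", depth*2, "", n.name, err, count(n)-1)
		return
	}
	fmt.Printf("%*s%s: ok\n", depth*2, "", n.name)
	for _, c := range n.children {
		runTree(c, s, depth+1)
	}
}

func main() {
	root := &node{name: "startAPIServer", run: func(s map[string]string) error {
		s["addr"] = "127.0.0.1:0" // pretend we started a server
		return nil
	}}
	create := root.child(&node{name: "createBook", run: func(s map[string]string) error {
		s["bookID"] = "1" // state flows forward to dependent tests
		return nil
	}})
	create.child(&node{name: "getBook", run: func(s map[string]string) error {
		if s["bookID"] != "1" {
			return fmt.Errorf("missing book")
		}
		return nil
	}})
	runTree(root, map[string]string{}, 0)
}
```

In the real library the state passing is done through struct tags and the nodes run as `go test` sub-tests; this sketch only shows the dependency-tree shape.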
But it's interesting that you're not just like, throwing down a testing library that you think solves this problem elegantly and instead describing how to hand-roll this because the entire category is a bit dicey. Everyone I've talked to about this describes a different ad-hoc way of dealing with it.