when people say "integration testing", the feeling I get is that most people mean "unit tests that happen to perform I/O". is that actually the definition most people are using?
there's another definition, the one I learned when I first learned about unit testing, and which I have never seen anyone actually use: a unit test is an individual unit of testing, and "integration testing" is when you sequence those unit tests into an integrated suite of tests. that is ... integration testing is when you integrate your unit tests, not when you test how your system integrates with another system. Those are distinct concepts! My suggestion here is not that the latter concept isn't valuable (it is), it's just distinct, and rarely do I see the first concept executed well.
For example, let's say you were testing some CRUD API and you wanted to test two things: the create and the update. The strategy that I most commonly witness is as follows:
- create a unit test for your `create` action. Start with a fixed, known state (let's call it `c0`), then run the `create`. The system is in some new state, `c1`. Check the response to the `create` routine, as well as checking that `c1` is the state value you expect.
- independently, create a unit test for your `update` action. Start with some known-good state (let's call it `u0`), the state of an existing object in a database. Creating this state is itself work: it's new work to create this platonic starting state. Run your `update` against this platonic state (`u0`), producing some new state (`u1`). Check the response to your `update` action and check that `u1` is the new state value you expect. (both tests are sketched in code after this list.)
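Here's a minimal sketch of that strategy in pytest-style Python, with a toy in-memory `Store` standing in for the real system (all names here are illustrative, not from any real codebase):

```python
# Illustrative only: a tiny in-memory store stands in for the real system.
class Store:
    def __init__(self, records=None):
        self.records = dict(records or {})

    def create(self, key, value):
        self.records[key] = value
        return {"ok": True}

    def update(self, key, value):
        self.records[key] = value
        return {"ok": True}


def test_create():
    store = Store()  # c0: a fixed, known starting state
    resp = store.create("a", {"name": "widget"})
    assert resp["ok"]
    # c1 is checked against a hand-written expectation, not via `read`:
    assert store.records == {"a": {"name": "widget"}}


def test_update():
    # u0: a hand-built "platonic" existing-object state; constructing
    # this fixture is itself new work, done independently of `create`.
    store = Store(records={"a": {"name": "widget"}})
    resp = store.update("a", {"name": "renamed"})
    assert resp["ok"]
    # u1 is checked against another hand-written expectation:
    assert store.records == {"a": {"name": "renamed"}}
```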
That's all well and good, but you've now created a handful of new problems:
- how do you define the success criteria of the `create` action (that is, the verification `p` such that `p(c1)` indicates that the test for create passes) without defining it in terms of the `read` action, in order to guarantee isolation of the things under test? What value is provided by testing the `create` action alone? Does this not create a new hazard where the verification logic of the `create` test can diverge from the actual logic of the `read` action?
- how do you define the initial state for the `update` test (in this example, `u0`)? Is that not simply the result of the `create` action? The `update` action is now being tested off of a platonic starting state. How do you know that this platonic starting state is reachable by your system? Is it not the case that `c1`, the output of the `create` test, and `u0`, the input of the `update` test, should always be equal, or else your tests are invalid? If that state is reachable now, how do you ensure that it continues to be reachable as your system changes?
- if your `update` is tested off of a platonic starting state that is not the exact output of the `create` action, you now have two problems: your `update` is not testing the state reached by the `create` routine, and you've created a new, false requirement that the `update` action be usable against a state that is not reachable by your system. You had to go through all of the trouble of creating this state, which is new work, when the `create` action ... literally does that work. The value provided by the isolation has to be significantly greater than the cost of having created that state, otherwise you're just creating busywork. (see the sequenced sketch after this list.)
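For contrast, here's a sketch of the "integrate your unit tests" reading, using the same toy `Store` as above: the `update` step is sequenced after `create` and runs against `c1`, the state `create` actually produced, so no platonic fixture is needed:

```python
def test_create_then_update():
    store = Store()  # c0
    resp = store.create("a", {"name": "widget"})
    assert resp["ok"]
    # u0 is c1 by construction: update runs against the state the
    # system actually reached, not a hand-built fixture.
    resp = store.update("a", {"name": "renamed"})
    assert resp["ok"]
    assert store.records == {"a": {"name": "renamed"}}
```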
anyway, this comes up a lot for me, since my primary project is a stateful multiplayer server whose only job is to contain and communicate the state of a game. integration testing this thing is ... hard. curious what people do for integration testing at a conceptual level, not a tools/language/library level. do other people also face the problem I'm facing, or are people finding testing against platonic states relatively unproblematic, in which case it sounds more like I'm doing it wrong?
My POV on this topic comes from working on data storage systems.
Of your three problems, the third is definitely the worst, and I do not find tests of that nature to be useful. If the system changes and those states are no longer possible, what is the test telling you? Tests like these are not durable to change, and detecting failures due to change is the entire point of having them.
I look at a data storage system as a set of axiomatic actions, usually with some higher-level actions built on top of those axioms. Since I can't just declare that `create` works because it is axiomatic, I typically try to build high confidence in it in order to "axiomize" it, at which point I'm happy to define the rest of the system behaviour naturally in terms of those actions.

Isolation is useful because it tells you what went wrong, but I see it more as a desired thing than a required thing. It's more important to catch bugs than to write tests that are well isolated but do not catch them.
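As a concrete (hypothetical) illustration of that: once `create` and `read` are trusted as axioms, an `update` test can lean on them for both its setup and its verification, rather than on a hand-built fixture or a parallel verification path:

```python
# Illustrative stand-in store; `create` and `read` are treated as
# already-trusted axioms here.
class AxiomStore:
    def __init__(self):
        self.records = {}

    def create(self, key, value):
        self.records[key] = value

    def read(self, key):
        return self.records.get(key)

    def update(self, key, value):
        self.records[key] = value


def test_update_in_terms_of_axioms():
    s = AxiomStore()
    s.create("a", {"name": "widget"})   # setup via the trusted create
    s.update("a", {"name": "renamed"})
    # verification via the trusted read, not hand-written state checks
    assert s.read("a") == {"name": "renamed"}
```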
Despite that, you can usually find some isolation in things like `create` by expanding your coverage of its side effects. What is observable about the null state? How much of that changes?

Typically, I will want a `list` verb that is not defined in terms of `read`, which returns all of the state in a trivial way. Systems will also typically expose a `count` verb that efficiently counts the number of records, either by tracking it as a sequence or just by not having to return all of the data. I also usually expose some internal counters for telemetry purposes, and these can be useful signals for establishing the correctness of core operations: e.g. a `create` from the null state might increase the number of pages allocated or the count of queries received.

With those signals, there's a lot more you can test about `create`, and some of those tests should fail in isolation. If `create` fails to increment the number of records, for example, your problem is probably not in `list` or `read`. You can define your null state (`count(null) == 0`, `list(null) == []`, `read(any) == null`), then run a `create` and ensure all of those signals changed appropriately. The strongest signal is the `create -> read` round trip, but you can use `list` as a backup check on data integrity. If `list` worked but `read` failed, the problem is probably in `read`.
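A sketch of those null-state checks, with hypothetical `list`/`count` verbs and a query counter on a toy store:

```python
# Toy store with the extra observability verbs; all names illustrative.
class ObservableStore:
    def __init__(self):
        self.records = {}
        self.queries = 0  # internal telemetry counter

    def create(self, key, value):
        self.records[key] = value

    def read(self, key):
        self.queries += 1
        return self.records.get(key)

    def list(self):
        # deliberately not defined in terms of read: dumps state trivially
        return sorted(self.records)

    def count(self):
        return len(self.records)


def test_create_from_null_state():
    s = ObservableStore()
    # the null state: count(null) == 0, list(null) == [], read(any) == null
    assert s.count() == 0
    assert s.list() == []
    assert s.read("a") is None

    s.create("a", {"name": "widget"})

    # each signal implicates a different suspect on failure: count points
    # at create, list at data integrity, and the create -> read round
    # trip at read itself.
    assert s.count() == 1
    assert s.list() == ["a"]
    assert s.read("a") == {"name": "widget"}
```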
Without `list` or `count`, it's potentially difficult to verify that `create` did not modify additional state in error, on top of the expected state modification, so these verbs are actually really good for checking all of the bounds around your core operations.
Now `create`'s behaviour from the null state has a bunch of tests of varying strength. Some may be weak signals, and some are worryingly coupled to the implementation, but the goal is to define `create` so thoroughly that subsequent tests can take its correctness for granted. Later, if those details about the initial null-state transition change, so long as the behaviour still works, the rest of your test suite will succeed and the problem should be pretty obvious.
If `list` and `count` end up being too expensive for your system, or you just don't want to expose them, you don't have to, but you can still use them internally to verify the exported API.
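One way to do that, sketched against the toy store above: keep `list` and `count` as test-only helpers that inspect internal state directly, so neither verb has to ship in the exported API (again, all names are illustrative):

```python
# Test-only helpers that peek at internal state directly.
def debug_list(store):
    return sorted(store.records)

def debug_count(store):
    return len(store.records)


def test_create_bounds_without_exported_verbs():
    s = ObservableStore()  # from the sketch above
    s.create("a", {"name": "widget"})
    # verify create touched exactly one record and nothing else
    assert debug_count(s) == 1
    assert debug_list(s) == ["a"]
```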