when people say "integration testing", the impression I get is that most people mean "unit tests that happen to perform I/O". is that actually the definition most people are using?
there's another definition, which is the one I learned when I first learned about unit testing, and which I have never seen anyone actually use: a unit test is an individual unit of testing, and "integration testing" is when you sequence the unit tests to create an integrated suite of tests. that is... integration testing is when you integrate your unit tests, not when you test how your system integrates with another system. Those are distinct concepts! My suggestion here is not that the latter concept isn't valuable (it is), it's just distinct, and I rarely see the first concept being executed well.
For example, let's say you were testing some CRUD API and you wanted to test two things: the create and the update. The strategy that I most commonly witness is as follows:
- create a unit test for your `create` action. Start with a fixed, known state (let's call it `c0`), then run the `create`. The system is in some new state `c1`. Check the response to the `create` routine, as well as check that `c1` is the value of the state that you expect.
- independently, create a unit test for your `update` action. Start with some known-good state (let's call it `u0`), a state of an existing object in a database. Creating this state is itself work: it's new work to create this platonic starting state. Run your `update` against this platonic state (`u0`), producing some new state (`u1`). Check the response to your `update` action and check that `u1` is the new state value that you expect.
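To make the strategy concrete, here's a minimal sketch of those two isolated tests. The `create` and `update` functions and the dict-as-store are hypothetical stand-ins for a real CRUD layer; in practice these would hit a database or an API.

```python
# Hypothetical CRUD actions; a plain dict stands in for the real store.
def create(store, key, value):
    """Create a record; returns a response and mutates the store."""
    if key in store:
        return {"ok": False, "error": "exists"}
    store[key] = value
    return {"ok": True, "key": key}

def update(store, key, value):
    """Update an existing record."""
    if key not in store:
        return {"ok": False, "error": "missing"}
    store[key] = value
    return {"ok": True, "key": key}

def test_create():
    c0 = {}                          # fixed, known starting state
    resp = create(c0, "a", 1)        # run the create; c0 becomes c1
    assert resp["ok"]                # check the response...
    assert c0 == {"a": 1}            # ...and that c1 is the expected state

def test_update():
    u0 = {"a": 1}                    # hand-built "platonic" starting state
    resp = update(u0, "a", 2)        # run the update; u0 becomes u1
    assert resp["ok"]
    assert u0 == {"a": 2}            # u1 is the expected state

test_create()
test_update()
```

Note that `u0` in `test_update` is written out by hand rather than produced by `create`; that's exactly the "platonic state" the problems below are about.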
That's all well and good, but you've now created a handful of new problems:
- how do you define the success criteria of the `create` action (that is, the verification `p` such that `p(c1)` indicates that the test for create passes) that is not in terms of the `read` action, in order to guarantee isolation of the things under test? What value is provided by testing the `create` action alone? Does this not create a new hazard where the verification logic of the `create` test can diverge from the actual logic of the `read` action?
- how do you define the initial state for the `update` test? (in this example, `u0`.) Is that not simply the result of the `create` action? The `update` action is now being tested off of a platonic starting state. How do you know that this platonic starting state is reachable by your system? Is it not the case that `c1`, the output of the `create` test, and `u0`, the input of the `update` test, should always be equal or your tests are invalid? If that state is reachable now, how do you ensure that it continues to be reachable as your system changes?
- if your `update` is tested off of a platonic starting state that is not the exact output of the create action, you now have two problems: your `update` is not testing the state reached by the `create` routine, and you've created a new, false requirement that the `update` action be usable against a state that is not reachable by your system. You had to go through all of the trouble of creating this state, which is new work, when the `create` action... literally does that work. The value provided by the isolation has to be significantly greater than the cost of having created that state, otherwise you're just creating busywork.
anyway, this comes up a lot for me since my primary project is a stateful multiplayer server whose only job is to contain and communicate the state of a game. integration testing this thing is ... hard. curious what people do for integration testing from a conceptual level, not from like a tools/language/library level. do other people also face the problem I'm facing, or are people finding testing against platonic states relatively unproblematic and it sounds more like I'm doing it wrong?
I think it's totally reasonable to write out a test with all of your bullet points happening in sequence.
These kind of remind me of doing transaction isolation tests and rollback tests, where some state transformations get rejected and they shouldn't be seen by other users. For these, I write tests more like stories or scenarios; the bullet points you listed would probably be comments in the test, and it would be long, but I think it's useful to test these behaviours as a sequence of user actions rather than as a collection of isolated behaviours under different state conditions. These tests are very easy to read, but they only test one very specific permutation of actions, so they miss a lot of bugs.
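A minimal sketch of that story style, using a toy snapshot-based `Store` as a hypothetical stand-in for the real system; the comments narrate the scenario the way the bullet points would.

```python
class Store:
    """Toy snapshot store: transactions see committed state only."""
    def __init__(self):
        self.committed = {}

    def begin(self):
        return dict(self.committed)   # snapshot for an in-flight txn

    def commit(self, txn):
        self.committed = dict(txn)

def test_rejected_write_is_invisible():
    store = Store()

    # Alice creates a record and commits it.
    txn = store.begin()
    txn["doc"] = "v1"
    store.commit(txn)

    # Bob starts a transaction and makes a change, but his write is
    # rejected: it never gets committed...
    bobs_txn = store.begin()
    bobs_txn["doc"] = "v2"

    # ...so Carol, reading committed state, never sees Bob's write.
    carols_view = store.begin()
    assert carols_view["doc"] == "v1"

test_rejected_write_is_invisible()
```

Easy to read top to bottom, but it covers exactly one interleaving of Alice, Bob, and Carol.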
For this reason I split tests into ones that cover the "intent" of a feature and ones that cover its "implications" in the wider system. Intent tests tell the story of what a feature is supposed to do; they're mostly there to get the feature from zero to working, and to help future developers understand the intentions behind it. Implication tests are there to test how that feature interacts with the wider system. Randomized testing, especially property testing (though I usually do this without a framework), is really good at the second, but it makes for bad reading, and those tests are harder to write.
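A framework-free property test might look something like this sketch: random sequences of actions run against the system, with an invariant checked after every step. Here the "system" is a plain dict standing in for the real thing, and the invariant is agreement with a simple model; all names are hypothetical.

```python
import random

def run_property_test(seed, steps=200):
    rng = random.Random(seed)        # seeded, so failures are reproducible
    store = {}                       # stand-in for the system under test
    model = {}                       # trivially-correct reference model
    for _ in range(steps):
        action = rng.choice(["create", "update", "delete"])
        key = rng.choice("abc")
        if action == "create" and key not in store:
            store[key] = model[key] = rng.randint(0, 9)
        elif action == "update" and key in store:
            store[key] = model[key] = rng.randint(0, 9)
        elif action == "delete" and key in store:
            del store[key]
            del model[key]
        # the property: system state always matches the model
        assert store == model, f"diverged on seed={seed}"

# many random orderings -- great coverage, bad reading
for seed in range(50):
    run_property_test(seed)
```

In a real suite `store` would be the actual server, which is exactly why these tests catch interleavings you didn't anticipate but tell no story a reader can follow.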
Cool! So you basically write tests as an n-ary tree, and each path to a leaf gets executed separately, but you've only had to define each node once for the tree instead of once per path. That could be really good, especially if you have big branching factors and a lot of depth. Decoupling the tree structure from the lexical code structure lets you support a much larger branching factor and more depth. I also love that failure at a node can short-circuit the rest of the path, so I don't see an enormous amount of failure output when I break something fundamental.
I can also see such a tree getting unwieldy to manage and hard to follow. It reads like "small functions" code: each step is broken up spatially, and following its descendants is much less natural.
I've never been a cucumber/convey fan, but one benefit of its approach is that the paths it builds through the call tree read a lot like the linear stories I like to tell in my tests through comments.
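A tiny sketch of the tree idea, with all node/step names made up for illustration: each node applies a step to a copy of its parent's state, each root-to-leaf path becomes one test result, and a failing node short-circuits its whole subtree.

```python
def run_tree(node, state=None, path=()):
    """Depth-first walk; returns (status, path) for every reachable leaf."""
    name, step, children = node
    try:
        state = step(dict(state or {}))   # copy so sibling paths stay isolated
    except AssertionError:
        # failure at a node short-circuits the rest of this subtree
        return [("FAIL", path + (name,))]
    if not children:
        return [("PASS", path + (name,))]
    results = []
    for child in children:
        results += run_tree(child, state, path + (name,))
    return results

# hypothetical steps for a CRUD-ish system
def created(s):
    s["doc"] = "v1"
    return s

def updated(s):
    assert "doc" in s
    s["doc"] = "v2"
    return s

def deleted(s):
    assert "doc" in s
    del s["doc"]
    return s

tree = ("create", created, [
    ("update", updated, []),              # path: create > update
    ("delete", deleted, []),              # path: create > delete
])

for status, path in run_tree(tree):
    print(status, " > ".join(path))
```

Each node is defined once but participates in every path beneath it, which is the win; the cost is that no single path reads linearly in the source.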
It looks good for enumerating a lot of permutations at each level, and while that's good for coverage, I've found it kinda mixed in terms of ROI for "catching bugs". This might be the problem you're running up against: having loads of paths gives you great coverage overall, but the coverage of the average individual test is very low.
For a randomized test for transaction isolation, to continue my example: if you have a huge class of writes going through the same latch internally, then you can settle on the write that is the easiest to verify for your property test, and then verify that the other writes hit the latch to ensure their "coverage" doesn't get invalidated. This saves a lot of time and a lot of code, and while it is technically a sacrifice in coverage, the quality of the test suite is still really high, and the class of bugs that could remain is really small.
A lot of stateful systems have things like this in them, because their verbs are built on top of each other, and I'd say that the single property test + latch check gives me a higher degree of confidence in the system than a hand-built tree of all possible writes through whatever sequence of events that I imagined when writing the test, because the property test will run sequences that I didn't anticipate.
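A rough sketch of that "one property test + latch check" pattern; the `Server`, its verbs, and the latch counter are all hypothetical.

```python
import random

class Server:
    def __init__(self):
        self.state = {}
        self.latch_hits = 0

    def _latched_write(self, key, value):
        """The shared internal latch every write goes through."""
        self.latch_hits += 1
        self.state[key] = value

    def put(self, key, value):            # the easiest write to verify
        self._latched_write(key, value)

    def increment(self, key):             # a different verb, same latch
        self._latched_write(key, self.state.get(key, 0) + 1)

# 1. latch check: every verb funnels through _latched_write, so testing
#    one verb's behaviour covers the shared machinery for all of them
s = Server()
s.put("a", 1)
s.increment("a")
assert s.latch_hits == 2

# 2. property test on `put` alone: random sequences, state matches a model
rng = random.Random(0)
s, model = Server(), {}
for _ in range(500):
    k, v = rng.choice("ab"), rng.randint(0, 9)
    s.put(k, v)
    model[k] = v
    assert s.state == model
```

The latch check is what licenses skipping the property test for the other verbs: if `increment` ever stopped going through the latch, that assertion (not the property test) is what would catch the lost coverage.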
Maybe there is one out there, but I don't know about it. I've not seen many testing libraries or frameworks that provide a tailored approach to testing a specific kind of system. Even things like QuickCheck and Hypothesis, which are probably the closest I know of, don't really tell you how to approach this: they give you a generalized approach to producing inputs and checking outputs, and are therefore still kind of "pure".
There's also this consistent push from people from various angles (infra, FP advocates, etc.) for everything to be "stateless", which... yeah, fine, stateless and immutable things are easier to reason about and preferable where possible, but the world is stateful and everything in computing is built on top of state, so we need to be honest and admit that we're offloading those problems, and that they're still important to solve rather than just avoid.
There's probably a whole lot of cognitive bias that is going on.
And finally: