when people say "integration testing", the sense I get is that most people mean "unit tests that happen to perform I/O". Is that the definition most people are actually using?
there's another definition, the one I learned when I first learned about unit testing, which I have never seen anyone actually use: a unit test is an individual unit of testing, and "integration testing" is when you sequence those unit tests to create an integrated suite of tests. That is ... integration testing is when you integrate your unit tests, not when you test how your system integrates with another system. Those are distinct concepts! My suggestion here is not that the latter concept isn't valuable; it is valuable, it's just distinct, and I rarely see the former concept executed well.
For example, let's say you were testing some CRUD API and you wanted to test two things: the create and the update. The strategy that I most commonly witness is as follows:
- create a unit test for your `create` action. Start with a fixed, known state (let's call it `c0`), then run the `create`. The system is in some new state `c1`. Check the response to the `create` routine, as well as check that `c1` is the value of the state that you expect.
- independently, create a unit test for your `update` action. Start with some known-good state (let's call it `u0`), a state of an existing object in a database. Creating this state is itself work: it's new work to create this platonic starting state. Run your `update` against this platonic state (`u0`), producing some new state (`u1`). Check the response to your `update` action and check that `u1` is the new state value that you expect.
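That isolated-test strategy can be sketched in Go against a hypothetical in-memory store (the `Store` type and every name here are illustrative, not from any real codebase):

```go
package main

import "fmt"

// Store is a hypothetical in-memory CRUD store, for illustration only.
type Store struct {
	items map[string]string
}

func NewStore() *Store { return &Store{items: make(map[string]string)} }

// Create adds an item and returns the stored value.
func (s *Store) Create(id, v string) string {
	s.items[id] = v
	return s.items[id]
}

// Update overwrites an existing item and returns the new value.
func (s *Store) Update(id, v string) string {
	s.items[id] = v
	return s.items[id]
}

// testCreate starts from a fixed known state c0 (an empty store)
// and verifies the resulting state c1.
func testCreate() bool {
	s := NewStore() // c0: empty store
	got := s.Create("a", "hello")
	return got == "hello" && s.items["a"] == "hello" // verify c1
}

// testUpdate starts from a hand-built "platonic" state u0 that is
// assumed, not produced by Create: constructing it is extra work.
func testUpdate() bool {
	s := NewStore()
	s.items["a"] = "hello" // u0: constructed by hand
	got := s.Update("a", "world")
	return got == "world" && s.items["a"] == "world" // verify u1
}

func main() {
	fmt.Println(testCreate(), testUpdate())
}
```

Note that `testUpdate` reaches into `s.items` directly to build `u0`, bypassing `Create` entirely; that bypass is exactly where the problems below come from.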
That's all well and good, but you've now created a handful of new problems:
- how do you define the success criteria of the `create` action (that is, the verification `p` such that `p(c1)` indicates that the test for create passes) in a way that is not in terms of the `read` action, in order to guarantee isolation of the things under test? What value is provided by testing the `create` action alone? Does this not create a new hazard where the verification logic of the `create` test can diverge from the actual logic of the `read` action?
- how do you define the initial state for the `update` test? (in this example, `u0`.) Is that not simply the result of the `create` action? The `update` action is now being tested off of a platonic starting state. How do you know that this platonic starting state is reachable by your system? Is it not the case that `c1`, the output of the `create` test, and `u0`, the input of the `update` test, should always be equal, or your tests are invalid? If that state is reachable now, how do you ensure that it continues to be reachable as your system changes?
- if your `update` is tested off of a platonic starting state that is not the exact output of the `create` action, you now have two problems: your `update` is not testing the state reached by the `create` routine, and you've created a new, false requirement that the `update` action be usable against a state that is not reachable by your system. You had to go through all of the trouble of creating this state, which is new work, when the `create` action ... literally does that work. The value provided by the isolation has to be significantly greater than the cost of having created that state, otherwise you're just creating busywork.
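For contrast, here is a hedged sketch of the sequenced alternative, using the same kind of hypothetical in-memory store: the update test consumes the create test's actual output, so `u0` equals `c1` by construction rather than by hope:

```go
package main

import "fmt"

// Store is a hypothetical in-memory CRUD store, for illustration only.
type Store struct{ items map[string]string }

func NewStore() *Store { return &Store{items: make(map[string]string)} }

func (s *Store) Create(id, v string) { s.items[id] = v }
func (s *Store) Update(id, v string) { s.items[id] = v }

func main() {
	s := NewStore() // c0: fixed known state

	// Step 1: the create test runs first, producing c1.
	s.Create("a", "hello")
	if s.items["a"] != "hello" {
		panic("create test failed")
	}

	// Step 2: the update test runs directly against c1, so u0 == c1
	// by construction: no hand-built platonic state, and the update
	// test can never drift toward a state the system cannot reach.
	s.Update("a", "world")
	if s.items["a"] != "world" {
		panic("update test failed")
	}
	fmt.Println("ok")
}
```

The trade-off is the usual one: a failure in the create step now hides whatever the update step would have reported, which is precisely the pass/fail/skipped accounting problem discussed further down.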
anyway, this comes up a lot for me since my primary project is a stateful multiplayer server whose only job is to contain and communicate the state of a game. integration testing this thing is ... hard. curious what people do for integration testing from a conceptual level, not from like a tools/language/library level. do other people also face the problem I'm facing, or are people finding testing against platonic states relatively unproblematic and it sounds more like I'm doing it wrong?
yes, exactly. It's called "tea" because you're "reading the tea leaves" to tell your fortune.
yeah, one of the shortcomings of using Go's sub-tests natively is that if your test fails and exits early, the remaining sub-tests are never created. So a pass might read "300 tests passed", while a failure reads "150 tests passed, 1 test failed", when what actually happened was "150 tests passed, 1 test failed, and 149 tests were skipped". Go's sub-tests do allow you to mark tests as skipped explicitly, but the ergonomics of doing so make it very easy to mess up. tea handles that for you automatically.
this is a massive problem we have now with tea. Writing tests is super easy, but coming back to a large tree that already exists and adding tests to it is nightmarishly confusing. I have to do some work to improve the ergonomics of larger test graphs.
yeah so one of the problems that convey has is that because a test accesses the side-effects of its ancestors via closures, and stack frames always have a single parent, a given test can only appear along a single path. Since tea uses structs and struct fields to persist the runtime environment from test to test instead of stack frames and closures, tea has no such constraint. For example:
You can do that now and it works. The entire tree is a data structure that can be manipulated arbitrarily, so you can adopt tea easily into projects that use table-driven tests.
Also any two equal test values are equivalent. I think this makes them "referentially transparent" but I've never done FP so I dunno. This:
Is exactly the same thing as this:
It doesn't matter that they're pointers to the same struct because tea doesn't actually use that struct: that value is only treated as a template, it's copied before it's ever used. That value is never actually mutated; we create a new value to mutate to ensure isolation. This works presently and I rely on it.
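That template-copy behavior is really just Go's struct value semantics; here is a minimal generic sketch of the idea (the `node` type and `run` function are hypothetical, not tea's actual implementation):

```go
package main

import "fmt"

// node stands in for a test node; illustrative only.
type node struct {
	Name  string
	State int
}

// run treats tmpl purely as a template: it dereferences and copies
// the struct value before mutating, so the original is never touched.
func run(tmpl *node) node {
	instance := *tmpl // copy the struct value
	instance.State++  // mutate only the copy
	return instance
}

func main() {
	tmpl := &node{Name: "create", State: 0}

	// Two runs from pointers to the same struct behave identically,
	// because each run works on its own private copy.
	a := run(tmpl)
	b := run(tmpl)

	fmt.Println(a.State, b.State, tmpl.State)
}
```

Because the template is never mutated, passing the same pointer into the plan twice is safe: both appearances produce independent copies with identical starting values.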
I've been working on making it so that you can combine nodes in the plan to treat it as a DAG instead of a tree. So long as there are no cycles you can always break the DAG apart into its component paths. E.g., that prior example would hypothetically turn into this:
(that's not implemented yet though.)
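The underlying idea, that any acyclic graph can be split into its root-to-leaf paths, can be sketched generically (this is not tea's API, just a hypothetical illustration using an adjacency-list graph):

```go
package main

import "fmt"

// paths returns every root-to-leaf path in an acyclic graph given as
// an adjacency list. Each returned path could then run as one
// independent tree path. Illustrative only.
func paths(g map[string][]string, root string) [][]string {
	children := g[root]
	if len(children) == 0 {
		return [][]string{{root}} // leaf: a path of one node
	}
	var out [][]string
	for _, c := range children {
		for _, tail := range paths(g, c) {
			out = append(out, append([]string{root}, tail...))
		}
	}
	return out
}

func main() {
	// A small DAG: both "create" and "import" feed the same "update"
	// node, which a plain tree cannot express directly.
	g := map[string][]string{
		"root":   {"create", "import"},
		"create": {"update"},
		"import": {"update"},
	}
	fmt.Println(paths(g, "root"))
}
```

Here the shared `update` node is duplicated across the two resulting paths, which is exactly why the decomposition only works when there are no cycles.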
oh cool these weren't on my radar, thanks for mentioning them, they'll be good prior art to look at.
yeahhhh I encounter this -a lot-. My project serves only a single purpose: to handle the state management so that other systems don't have to. So much conventional wisdom is poorly suited to projects of this nature, because most projects are just using a database, whereas what I'm making has many similarities to a database itself.
anyway thanks for the feedback, it sounds like making a library like this isn't raising all sorts of red flags to you.