\documentclass[9pt, a4paper]{extarticle}
\usepackage{fontspec}
\usepackage{extsizes}
\usepackage{hyperref}
\usepackage{marginnote}
\reversemarginpar
\usepackage{csquotes}
\usepackage{geometry}
\geometry{a4paper, left=3.8cm, top=1cm, right=1.8cm, bottom=1cm, footskip=.5cm, marginparwidth=70pt}
\date{\today}
\title{Engineering Critical Network Applications}
\author{ J Wylie Cullick}
\begin{document}
\maketitle
Abstract: \emph{While there is no intrinsic demarcation between the casual web of social media and the essential web of critical systems (life-critical and mission-critical operations in corporate, medical, logistical, and military settings), since both consist of essentially the same components, the difference is well defined by the {\bf cost of failure}. Reflecting on experience witnessing real-world failures of critical software systems, we assess the conception and overall structure of their design, architecture, project management, and code implementation. In particular, we prescribe a new approach and operating concept to guide software project development on critical systems: a scaffolding structure built around a single comprehensive test, implemented as a suite of network simulations that run bot-net-driven facsimiles of real-world operating conditions, as event-timeline assessments, against the target build system.}
\tableofcontents
\section{Assessing Current 'Best-Practices'}
\marginnote{
{
\footnotesize
\emph{
Naive Development: 1. Specification (non-specific) 2. `Common-sense' architecture 3. Decomposition into structures of more primitive pieces, by level, generating a population of atomic development \textquote{units}. 4. `Test-Driven-Development' at the unit-level.
}
}
}[2.5cm]
{\bf Naive Development Process Architecture Lifecycle Timeline}: 1. A loose \textquote{spec}. 2. A naive, \textquote{common-sense} high-level schema/architecture is sketched out and delegated. 3. The complex responsibilities are made tractable by decomposition, level by level, until atomic \textquote{units} of development are defined. 4. The canonical \textquote{Test-Driven Development} (\textquote{TDD}) process begins here: unit-test suites and unit implementations.
1. The process as practiced in the industry today typically begins with a poorly formulated specification. Historically this has not been treated as a hindrance, because formal specifications are too difficult for most non-technical, business-oriented people, and often too time-consuming anyway in a fast-paced, business-imperative-driven world. So 'Agile' calls for an iterative approach with development-evaluation cycles, at a purely functional level but also at the level of personal evaluation with the final customer. Features may be added, rolled back, or modified according to testing and user feedback on any of a variety of factors, often subjective ones. No real problems thus far.
2. The next phase in development is the high-level design, the 'architecture' of the system. Typically this will be discussed, everyone will evaluate it by running it logically through their minds, and if it seems consistent, it will pass. In the full implementation, however, high-level designs typically harbor edge points at which particular sequences of events can cause data locks and races that crash the system. A high-level design may be mostly correct and yet still suffer edge-case flaws that will bring it down in production, given sufficient usage traffic and elapsed time: a probabilistic event.
3. This high-level design is then componentized into 'micro-services' through some number of abstraction levels until the granularity of the 'units' is arrived at; exactly where that is, and exactly how the monolith is partitioned into components, is an arbitrary decision of the designers. Once the units are identified, the granular atoms at the end of the chain of abstractions spanning the system, firms will typically show their fealty to the TDD slogan by perhaps writing their 'failing' unit tests, and then the unit implementations.
4. When enough of these units are completed, there will be some process of integrating various components into larger testable complexes, over which various tests of 'happy path' procedures may be run, along with some canned edge cases. While firms typically repeat the slogans about CI/CD, fewer progress to building their own custom CI/CD tooling based around containerization tactics with Docker and Kubernetes. Those that do enjoy much improved speed when integrating complexes of testable software, and the most adept among them can continuously assemble, from recent code additions, a new production system that can be tested in isolation. Some firms do this well, some do not. Some firms cannot even stand up a production system in a testing environment (one identical to their real-world production system in every way except that real client traffic is not directed to it); the staging system they use for integration testing is an approximation of their production system, but not identical to it. In any case, using scripts, quality-assurance engineers will assemble various suites of end-to-end tests: users are signed up, logged in, run through various transactions, and so on. These are the 'happy paths', and there will be integration tests over all of those that constitute the logical specification of the system. There will also be some 'sad path' edge-case tests, but only for those pitfalls which have been foreseen, and most pitfalls are not foreseen; it is the real world that triggers the edge-case data races, locks, and hangups.
5. So the system passes the integration test suite and is put into production. Typically at this point the system encounters a qualitatively different and substantially richer block of inputs, which elicits a substantially richer set of behaviors and dynamics on the part of the system itself. These typically include bugs and crashes: unforeseen things make themselves manifest, and the system fails partly or fully. This is the norm, at all levels of the industry.
What went wrong is simple to see, and is best expressed in the fashionable complexity/chaos-theory maxim that the behavior of a complex system is greater than the sum of the behaviors of its parts. People know this intuitively: if I say I have tested every component of an automobile and they all passed, but the design itself contains inherent dangers, the car will still fail. People grasp this immediately, and yet in the software industry the notion seems to escape us. Otherwise, why the reliance on huge suites of unit tests, and why the bafflement when a few isolated integration tests fail to uncover critical flaws in a system before production?
\section{Diagnosis and Indications}
Conceptual mistake: To imagine that unit-tests have any validation value against the integrated system.
Reality: As expressed formally in complexity/chaos theory, the behavior/validity of a system is greater than the sum of the validities of the parts. People can understand intuitively that a collection of individually validated components does not amount to a validated whole; the validation must be performed at the level of the integrated system.
\section{Details on Process}
Network simulation is the only way to truly validate software. In the server/client paradigm, the client-environment simulation can be implemented as a bot-net, each bot a client. A bot is a simple stateful program which opens socket connections and listens for orders from an orchestrator process. Software systems may interact with data streams which are not network traffic; in such cases a network simulation may not be sufficient. If testing avionics integration, for example, we may need to simulate radar returns and other kinds of raw telemetry. Ultimately, the most difficult challenge is physical simulation, but I foresee that this lies outside the requirements of strictly software systems.
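As a minimal sketch of what such a bot might look like (the names, the port number, and the line-delimited JSON order format are illustrative assumptions, not a reference to any existing tool), consider a small Node/TypeScript client that holds local state, connects to the orchestrator, and reacts to orders:

\begin{verbatim}
// bot.ts -- minimal sketch of a stateful simulation bot (illustrative only).
// It connects to the orchestrator over TCP and executes line-delimited JSON orders.
import * as net from "net";

type Order = { action: "signup" | "login" | "chat"; payload?: unknown };

// Local state the bot accumulates across orders (session token, counters, ...).
const state: { sessionToken?: string; ordersSeen: number } = { ordersSeen: 0 };

const socket = net.connect(
  { host: process.env.ORCH_HOST ?? "localhost", port: 9000 },
  () => socket.write(JSON.stringify({ bot: process.pid, kind: "hello" }) + "\n")
);

let buffer = "";
socket.on("data", (chunk) => {
  buffer += chunk.toString();
  let idx: number;
  while ((idx = buffer.indexOf("\n")) >= 0) {
    const order: Order = JSON.parse(buffer.slice(0, idx));
    buffer = buffer.slice(idx + 1);
    state.ordersSeen += 1;
    // A real bot would call into the target system here (HTTP, WebSocket, ...)
    // and report the outcome back to the orchestrator as a diagnostic event.
    socket.write(JSON.stringify({ bot: process.pid, kind: order.action }) + "\n");
  }
});
\end{verbatim}

The essential property is statefulness: the bot can carry a session, a wallet balance, or a conversation history across orders, which is what allows a swarm of them to reproduce realistic usage timelines rather than isolated requests.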
\section{Test Driven Development Revisited :: Towards Future Best Practices}
So the proposal is, instead of the normal development process lifecycle, which looks like:
1. A high-level design is conceptualized and architected; then people look at it and assume it is good. Often these assumptions are unwarranted: there may be unforeseen complications in the real production manifestation of the design. The first mistakes are made here, and they are epistemological errors.
2. Components (e.g. 'microservices') and interfaces are defined, in layers, down to the 'unit' level. At every layer, once the basic logic is recognized as sound, people generally assume that the design is fundamentally sound, i.e. perfect. Typically it is not. There are two ways to find out whether it really is a sound design: (1) Put it into production and wait through several lifetimes of use; if you have multiple worlds to test it on, make use of that opportunity for the extra data diversity. If it doesn't fail, that is a fairly good indication that the design is probably sound. (2) Make the candidate production system operational but not online, and subject it to a bot-net 'attack' which simulates as closely as possible, and precisely, the real-world conditions that will be encountered in production. In this case we can parallelize and run multiple virtual-world simulations simultaneously, to collect data on a variety of lifecycle scenarios.
3. Unit-Tests (aka 'the failing test') are defined to specification, then implementations are coded against those unit tests.
4. Integration-Tests. Once all the units are working, the team will try to integrate them with some custom scripts that approximate specified operating procedures, for example the signing-up of a new user. These tests are technically difficult to engineer and implement; consequently there typically aren't many of them before a system goes into true production. Typically even the highest-level integration tests will not be conducted on a true production system. There are companies which utterly lack the ability to stand up a production system in isolation: they can approximate it in staging, but their true production system is only ever seen in production. This means that their real testing, the testing that matters, is conducted in production; if it's a social-media failure then no big deal, but on life-critical systems there can be real problems.
5. Iterate...
That is the standard industry practice; some would consider it 'best practice', and it is considered the standard way to apply TDD in software development.
I propose we reverse the flow somewhat as far as the testing is concerned. To reiterate, I believe TDD is conceptually the right approach, but the unit-test emphasis is \textquote{looking through the wrong end of the telescope}, so to speak. I feel that I present the 'true interpretation' (quasi-religious terminology for humorous effect) of TDD, in which the primal 'failing test' is the test at the highest level: the test of the whole system, as a whole. Contrast this with the opposite approach, which starts at the other end of the abstraction hierarchy, with the unit test.
So we start with the failing test of the whole software system.
1. The first step is to specify, in a somewhat formal language, exactly what the expected characteristics of the system are.
2. From this the primal 'failing test' can be generated. For the most part, software systems can be tested realistically with just a network simulation, since most applications interact with the world only via network traffic. If we are testing radar-processing software, we might need to simulate realistic telemetry feeds, which is another level of difficulty. A network simulation is straightforward to implement as a bot-net, with diagnostic data hooks feeding message queues, which feed compute nodes, which distill the massive data into a form humans can assimilate (a sketch of such a diagnostic pipeline follows this list).
3. Simultaneously, we perform design-candidate generation and explore componentization possibilities, broken into units, which can be implemented to start satisfying some of the routines run through the network simulation. As in TDD, the 'failing test' provides the epistemological scaffolding against which we can develop, giving us feedback on the efficacy of our code additions in near-real time.
In our case the 'failing test' is a network simulation capable of the broadest variety and sophistication of tests, rather than a unit test, though unit tests may be employed at developer discretion when they benefit the implementation process. For example, if I implement a user sign-up unit, I can immediately integrate it (with custom CI/CD containerization routines using e.g. Docker/Kubernetes) into a pseudo-production environment and run it against the sim-net routines, giving me immediate fine-grained feedback (as good as unit tests at the low level, but obviously much better at the high/integration levels).
4. Iterate...
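To make the diagnostic side of this concrete, here is a minimal sketch of the pipeline mentioned in step 2, under the same illustrative line-delimited JSON convention as the bot sketch above; the in-memory queue and the counting loop stand in for a real message queue and compute node, and none of the names refer to an existing library:

\begin{verbatim}
// orchestrator.ts -- illustrative sketch of the diagnostic half of the sim-net:
// bots connect and stream line-delimited JSON events; an aggregation loop
// distills them into a human-assimilable summary. Dispatching orders out to
// the bots is omitted here for brevity.
import * as net from "net";

type DiagEvent = { bot: number; kind: string; latencyMs?: number };

const queue: DiagEvent[] = [];               // stand-in for a message queue
const counts = new Map<string, number>();    // stand-in for a compute node

const server = net.createServer((socket) => {
  let buf = "";
  socket.on("data", (chunk) => {
    buf += chunk.toString();
    let idx: number;
    while ((idx = buf.indexOf("\n")) >= 0) {
      queue.push(JSON.parse(buf.slice(0, idx)) as DiagEvent);
      buf = buf.slice(idx + 1);
    }
  });
});
server.listen(9000);

// Aggregation pass: drain the queue periodically and summarise event kinds.
setInterval(() => {
  while (queue.length > 0) {
    const ev = queue.shift()!;
    counts.set(ev.kind, (counts.get(ev.kind) ?? 0) + 1);
  }
  console.log("event totals:", Object.fromEntries(counts));
}, 5000);
\end{verbatim}

In a production-scale setup the same shape holds and only the stand-ins change: the queue becomes Kafka, RabbitMQ, or similar, and the counting loop becomes whatever compute nodes feed the data-visualization layer.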
\section{Some Considerations on Implications:}
If the cost of failure is less than the cost of proper validation pre-production, then there is no economic incentive for it. Clearly the development of the network-simulation infrastructure carries overhead costs; whether they are justified for a given software program depends on the particulars of the use case. There are some mitigating factors to consider.
Mitigating Factor A: The Bots Share Code With the Clients.
As an illustrative example, let's consider a social-media app with virtual currency, chats, and transactions of different kinds. It has to scale horizontally for a huge user base. Essentially it will be a server cluster with state distributed over caches, databases, message queues, etc., and the clients will connect through, say, React, React Native, and native iOS and Android apps. When we go to generate a network simulation against this system, we will have a bot-net swarm listening to an orchestrator and sending data to message queues, which feed compute nodes, which feed data-visualization engines for human analysis. Each bot is a simulation of a client app being operated, possibly autonomously, more likely by a human. The simulation of the client app might as well be the client app, minus the graphics rendering and other device effects. The bot as a whole will also contain logical elements simulating the human (or autonomous) controller; these are additional elements of the bot which are not part of the client app, but the client app itself can and should share an identical codebase with the simulation of the app inside the bot. This offers tremendous savings in development time compared with the situation where the network-sim implementation shares no commonality with the client-app implementation. I predict great efficiency improvements in the development process as a result of the closer coupling of client and server development via a shared, comprehensive testing infrastructure manifested in the network simulation. With a more streamlined testing infrastructure, we reduce the time needed to evaluate code additions, which is the iterative-cycle bottleneck in most development environments. If that evaluation time is cut in half, development speed can roughly double, so it is not a stretch to predict substantial development gains from a vastly improved diagnostic setup.
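As a minimal sketch of this code-sharing idea (module names, endpoints, and credentials are hypothetical, and a runtime with a global fetch, such as Node 18+ or a browser, is assumed), the transport and state logic lives in a shared core, and the bot adds only a crude stand-in for the human operator:

\begin{verbatim}
// client-core.ts -- transport/state logic shared verbatim between the real
// client app (React / React Native shells) and the simulation bot.
// Endpoints and field names are hypothetical.
export class ClientCore {
  private token?: string;
  constructor(private baseUrl: string) {}

  async signUp(user: string, password: string): Promise<void> {
    const res = await fetch(`${this.baseUrl}/signup`, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ user, password }),
    });
    this.token = (await res.json()).token;
  }

  async sendChat(text: string): Promise<void> {
    await fetch(`${this.baseUrl}/chat`, {
      method: "POST",
      headers: { authorization: `Bearer ${this.token}` },
      body: JSON.stringify({ text }),
    });
  }
}

// bot-driver.ts -- the only bot-specific part: a simulated operator driving
// the shared core with randomised think times.
export async function driveUser(core: ClientCore, id: number): Promise<void> {
  await core.signUp(`bot-${id}`, "not-a-real-password");
  for (let i = 0; i < 20; i++) {
    await core.sendChat(`hello from bot ${id}, message ${i}`);
    await new Promise((r) => setTimeout(r, 500 + Math.random() * 2000)); // "think"
  }
}
\end{verbatim}

The React or React Native shell imports the same ClientCore and wires it to UI events instead of driveUser, so the server sees traffic produced by exactly the code path that real clients exercise.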
Therefore:
Mitigating Factor B: The Improvements to Diagnostic Feedback Cycles Provided by the Network-Simulation Testing Infrastructure Improve Development Efficiency, Offsetting the Development Overhead of the Network-Sim Itself.
Also:
Mitigating Factor C: The Knowledge Gained About the Real-World Production Characteristics of the System Will Save Time by Avoiding Production Failures, Hotfixes, etc.
This offsets the network-simulation development overhead.
\section{Fragments:}
1. The software industry suffers from an epistemological crisis with every project. The end material result is a bunch of text files. The grandest of the tech titans' operations are materially represented by no more than a mass of text files. They can be put into production, and then there are transient informational effects, which are certainly tangible economically and otherwise, but nothing with the physical certainty of a skyscraper, a car, an aircraft carrier, or a bridge.
It is true that bridges need to prove their worthiness over a lifetime of decades, and in this way structural engineering too faces epistemological challenges: to wit, how do we know the bridge will survive over a lifetime of many years? How do we know the bridge can survive seismic forces of a given strength and wave distribution? Hard science, and ultimately simulations, whether on paper or computerized, must be run against the highest-fidelity models that can be generated.
What a good testing infrastructure gives us is \textquote{provability}, proof of work. Not necessarily provability in the mathematical sense of rigorously provable software, which is possible under purely functional conditions, but a softer provability that depends on our dedication to fidelity. A legal proof is in a similar epistemological situation: a hard scientific proof may be elusive, but the weight of evidence supports a given conclusion. If a customer requests a particular kind of social-media application and I build it, how can I prove that I have done my work, short of actually putting it into production and waiting some years for no bugs to appear? Before that it is just a bunch of text files, of no demonstrated use. The sophistication of our testing infrastructure not only gives us operational efficiency in the speed and accuracy of our development; it also gives us evidence and documentation of the efficacy of our text files when executed under operating conditions. This is not an original point on my part; it is rather canonical TDD rhetoric. And yet TDD as practiced does not actually meet these lofty expectations, i.e. does not actually provide proof of soundness of a body of work and design, because it scarcely ever rises out of the unit-test layer, with weak integration testing and no true total simulation under operating conditions. Half-jokingly, I propose that the 'true interpretation' of 'pure' TDD doctrine is that we should start every project, at the highest level, with a comprehensive failing test, which must be a simulation of the world as our candidate system will encounter it. This will bring many benefits, and I predict this approach to software engineering will become the 'best practice' of the future.
2. Regarding the specification of a software system as the first step towards its design and implementation: when the specification is complete, the outline and high-level requirements of the network-simulation 'failing test' can be generated, over which the network-sim bot-net behavior will be implemented. This test, which is really more than a suite of tests, is a system for generating an effectively unlimited number of test permutations, which in this context we call scenarios (a sketch of such a scenario generator follows). This test, which we may call the network-sim, sim-net, bot-net, or testing infrastructure, serves several purposes. From a business standpoint, it serves as a proof-of-work, proof-of-operability mechanism. Without it, we only have text files. Putting a candidate system into production as a demonstration and test is not good practice, even if it is common practice. A suite of integration tests running a collection of 'happy path' procedures against core logical functionality may prove basic operation, but it will not uncover flaws which only emerge under a qualitatively richer set of network activity; this is typical. The network-simulation approach is conceived precisely to ameliorate this flaw in the current testing regime. This is the second purpose of the test, strictly operational: it is what ensures healthy operations in production, free of hotfixes and various crises. The third purpose of the test is to improve the speed and accuracy of development. If we can triple development speed, by shortening iteration cycles through faster integration of components and faster feedback on system-behavior changes due to code additions, we may drop a six-month development cycle to two months. The idea is that higher-quality and timelier diagnostics can significantly improve development quality and speed. The fourth purpose of the test is also business- rather than engineering-related: the implementation of the test against the specification will help to solidify and formalize the latter. A high-resolution, internally consistent specification helps the whole spectrum of the enterprise; engineer, market analyst, and product designer alike benefit from higher-quality formal specifications, as the formal ontological constructs generated are applicable to enterprise activities generally, across the board. Nothing is disconnected from the whole, in this sense. Just remember that it is the network-simulation implementation against the specification which is going to force the latter to consistency and clarity. It is typical in software projects that the implementation drives the vagaries out of the specification; it is to our advantage to start this process at the beginning, rather than running into it at deployment!
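A minimal sketch of what such a scenario generator might look like (the action kinds and the numbers are illustrative assumptions): a seeded pseudo-random generator permutes user actions into an event timeline, so any particular scenario, including a failing one, can be regenerated exactly from its seed.

\begin{verbatim}
// scenarios.ts -- sketch of the sim-net as a generator of scenarios rather
// than a fixed list of tests: a seeded PRNG permutes user actions into an
// event timeline that the orchestrator can replay against the bot swarm.
type Action = { at: number; bot: number; do: "signup" | "chat" | "transfer" };

// Small deterministic PRNG (mulberry32), so every scenario is reproducible.
function mulberry32(seed: number): () => number {
  return () => {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

export function generateScenario(seed: number, bots: number, steps: number): Action[] {
  const rand = mulberry32(seed);
  const kinds: Action["do"][] = ["signup", "chat", "transfer"];
  const timeline: Action[] = [];
  let clock = 0;
  for (let i = 0; i < steps; i++) {
    clock += Math.floor(rand() * 1000);            // irregular inter-event gaps
    timeline.push({
      at: clock,
      bot: Math.floor(rand() * bots),
      do: kinds[Math.floor(rand() * kinds.length)],
    });
  }
  return timeline;
}

// Same seed, same timeline: a run that exposed a race can be replayed exactly.
console.log(generateScenario(42, 100, 5));
\end{verbatim}

Varying the seed gives the unlimited permutations described above; holding the seed fixed gives the exact reproducibility that production traffic never offers.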
\section{Content ToDo}
This is not simply a method of testing and validating software, critical software; it is also a way to build the software. The method constitutes not merely a way of assessing the quality of a finished product; it is central to structuring the entire development process, from the pre-technical conception of the problem through the architecture of development and the build process, in addition to forming the testing infrastructure.
Explore further the concept of scaffolding as metaphor. This validation and testing infrastructure is like scaffolding that not only facilitates access but contributes to the structural (logical) cohesion of the system while it is being built.
\end{document}