@jameswiseman76
Last active November 10, 2020 09:24
Facets and Challenges of Developing Mutation Testing System in .NET

Intro

I'd like to focus on the following:

  • Choosing Method of Mutation
  • Equivalent Mutants
  • Test Selection
  • Infinite Loop Detection
  • Unit Test Framework Support
  • Visual Studio Integration
  • .NET / .NET Core Support
  • Performance
  • Online Submissions

Choosing Method of Mutation

So there are two methods of mutation:

  • ByteCode / IL Mutation
  • Source Code / AST Mutation via 'Mutant Schemata'

ByteCode / IL Mutation

With IL mutation we take a copy of the raw assembly and seed one small mutation in that assembly. Each new mutation precipitates a new assembly. We then run the tests.

Only IL-specific mutations are possible: C# code with no direct IL mapping cannot be mutated. This can be deemed a disadvantage, as it limits the range of mutation types that can be performed.

However, there is a standard suite of mutation operators which have been researched, understood and thoroughly used. They are especially good at generating a low yield of 'Equivalent Mutants' (see below).

Research has been undertaken into mutants that are specific to C# code. One example is removing 'this.' from the front of member variables. This specific mutation has a high probability of generating an equivalent mutant.

A mutation testing framework that uses IL mutation can be used by any .NET language. Whilst C# covers most of the ecosystem, there is still plenty of use for VB, although a recent statement by Mads Torgersen suggests that Microsoft are starting to distance themselves from it.

https://www.theregister.co.uk/2017/02/02/our_strategy_for_visual_basic_has_shifted_microsoft_to_focus_on_core_scenarios/

That said, there's plenty of C++.NET out there, and F# seems to be gaining a lot more traction. It's in the F# space in particular that I think we're seeing a lot of innovation. Its heavy use in Fintech and actuarial modelling almost prescribes the need for robust unit tests.

Consider also the philosophical point of what this tool is trying to promote: high quality code on the .NET platform. And, of course, be prepared for the endless requests from other language users for this feature.

Henry commented on the potential for junk mutants. I get a sense that this might be an even bigger problem with .NET, though I'm not suitably placed in my knowledge of Java to provide an authoritative synopsis of this. Some areas that spring to mind in this respect are:

  • Generics - Java uses 'Type Erasure' to realise this. C# uses 'Reification'. Might this be a source of more junk mutants?
  • 'dynamic' - Under the hood, dynamic types get converted into IL that uses reflection. Having compared the IL for dynamic types with that for static ones, I've observed at times 30 extra lines of IL to achieve the same result.
  • async/await - I believe under the hood this is realised using a 'continuation-passing style' pattern implementation (a compiler-generated state machine). This must generate a lot of extra IL.
  • 'yield' - Under the hood, I think this precipitates an iterator-style GoF pattern.

Source Code / AST Mutation via 'Mutant Schemata'

With this method we mutate the abstract syntax tree. It could simply be an alternative to IL mutation, but it also opens the door to using mutant schemata to realise the solution: every mutant is compiled into a single assembly, and a runtime switch selects which one is active.

As has already been mooted, Roslyn would appear to be tailor made for this.
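To make the schemata idea concrete, here is a minimal, language-neutral sketch in Python (a real .NET implementation would presumably generate the equivalent C# via Roslyn). Every mutant lives in the same program and a runtime switch decides which one is live. The `ACTIVE_MUTANT` variable, the mutant ids and the `is_adult` function are all hypothetical, invented purely for illustration.

```python
import os

# Hypothetical switch: the id of the mutant to activate, or "" for the original code.
def active_mutant() -> str:
    return os.environ.get("ACTIVE_MUTANT", "")

def is_adult(age: int) -> bool:
    # Mutation point: the original condition is `age >= 18`.
    mutant = active_mutant()
    if mutant == "m1":          # relational operator replaced: >= becomes >
        return age > 18
    if mutant == "m2":          # relational operator replaced: >= becomes <
        return age < 18
    return age >= 18            # original behaviour

def run_tests() -> bool:
    """A tiny stand-in test suite; returns True when every check passes."""
    return is_adult(18) and is_adult(30) and not is_adult(17)

def mutation_run(mutant_ids):
    """Run the same program once per mutant, flipping only the switch."""
    results = {}
    for mutant_id in mutant_ids:
        os.environ["ACTIVE_MUTANT"] = mutant_id
        try:
            results[mutant_id] = "survived" if run_tests() else "killed"
        finally:
            os.environ.pop("ACTIVE_MUTANT", None)
    return results

if __name__ == "__main__":
    print(mutation_run(["m1", "m2"]))
```

The point of the design is that the compile happens once; each mutant costs only a switch flip and a test run. The trade-off is the extra conditionals seeded at every mutation point.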

Contrasting

In contrasting the two methods, I came up with the following. I'm happy to be contradicted or to have different contrasts highlighted.

| | IL | AST |
|---|---|---|
| Language support/Reach | All - Language Agnostic | Limited - Targeted Language(s) only |
| Relative performance | Slower (maybe) | Faster (maybe) |
| Different skillsets | CIL, Reflection Emit/Mono Cecil | Roslyn & AST |
| Specific challenges | Grokking IL and libraries (Reflection Emit/Mono Cecil) | Seeding of conditionals in mutant schemata |
| Data space | Larger (maybe) | Smaller (maybe) |
| Junk mutants | Maybe | No |

Common skillsets are:

  • C# .NET
  • Unit test frameworks - at a minimum NUnit, XUnit, MSTest
  • Assembly manipulation libraries (Reflection Emit, Mono Cecil)
  • Visual Studio plugin libraries

WHICH ONE?

From the discussions, folk seem to have settled upon the AST mutation method without (in my opinion) giving enough consideration to the IL mutation method.

If due diligence determines it's the best method, then, fine, but I'd like it not to be the preferred method for any of the following considerations:

  • Fear of IL
  • Shiny new technology in Roslyn.
  • Blinkered commitment to C#

Equivalent Mutants

These are mutants which have identical functionality to the original core code, and therefore can never generate a failing unit test.

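// Original
int i = 2;
if ( i >= 1 ) {
    return "foo";
}

//...
// Mutant: '>=' replaced by '>'. As i is always 2, both conditions hold, so behaviour is identical.
int i = 2;
if ( i > 1 ) {
    return "foo";
}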

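There is no reliable way of detecting equivalent mutants in general: deciding whether two programs behave identically is undecidable, so a tool can at best apply heuristics.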

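Infinite Loop Detection

As Henry has already noted, this is also needed. A mutation can easily turn a terminating loop into one that never exits (for example by altering its exit condition), which would hang the test run. I have undertaken no research in this area, but I would be surprised if this were not already a 'solved problem'.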
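One common way to approach this, offered here as an assumption rather than anything the discussion has settled on, is not to detect the loop at all. Instead, each mutant run gets a time budget derived from the unmutated run, and an overrun is treated as a timeout. A minimal Python sketch follows; the `timeout_factor` and `slack_seconds` defaults are arbitrary illustrative values, and the function names are hypothetical.

```python
import subprocess
import sys

def time_budget(baseline_seconds: float,
                timeout_factor: float = 1.5,
                slack_seconds: float = 1.0) -> float:
    """Allow a mutant run somewhat longer than the unmutated run took."""
    return baseline_seconds * timeout_factor + slack_seconds

def run_mutant(command, budget_seconds: float) -> str:
    """Run one mutant's tests in a child process and classify the outcome."""
    try:
        completed = subprocess.run(command, timeout=budget_seconds,
                                   stdout=subprocess.DEVNULL,
                                   stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so nothing is left running.
        return "timed_out"
    # Convention assumed here: exit code 0 means every test passed.
    return "survived" if completed.returncode == 0 else "killed"

if __name__ == "__main__":
    looping = [sys.executable, "-c", "while True: pass"]
    failing = [sys.executable, "-c", "raise SystemExit(1)"]
    print(run_mutant(looping, budget_seconds=1.0))
    print(run_mutant(failing, budget_seconds=10.0))
```

Running each mutant in its own process is what makes the timeout enforceable: a runaway thread cannot be killed safely, but a process can.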

Test Selection

As Henry also noted, test selection is critical, and he suggested using code coverage to achieve it.

Modern versions of Visual Studio have IntelliSense to determine which functions map to which test. I wonder if this has an API that can be called.

However, my gut feel is that this is something that is going to have to be hand-rolled as well, as it dovetails with the code mutation a little. We already know the point at which we are mutating. So a good approach would be to identify all the mutation points and then seed code coverage recording at those points.

Note, that this implies that the mutations we have to perform are potentially threefold:

  • Function mutation for mutation test.
  • Insertion of code coverage recording
  • Insertion of schemata mutation switch (if AST/schemata method is adopted)

Unit Test Framework Support

When I first looked at doing this, it became apparent that there would be a need to directly call unit tests via a publicly exposed API. There are many frameworks out there, but I think the following encompass a good share of what's used:

  • NUnit
  • XUnit
  • MSTest

Bear in mind that in developing a mutation testing system we have to explicitly develop to support a unit test framework.

Say what you like about MSTest, but it's still well used, and I think a necessity to support. And there lies the problem - last time I checked it didn't have a public API. No public API, no way to access test results easily programmatically.

Why's this important? Well, when we run a cycle of mutation tests we want to do the following programmatically:

  • Select and run the tests that cover the mutant
  • Read the results to check we have a failing test

It's not an insurmountable problem, but it will complicate things. Two years ago I blogged about this for this very reason:

www.jameswiseman.com/blog/2015/10/13/microsoft-pleeease-expose-your-mstest-api/

It might have changed, of course; I haven't checked.

Visual Studio Integration

Whilst early implementations need not consider it, we should be mindful of the ability to integrate into Visual Studio. It'll probably have to be exposed as a bespoke 'TestRunner', and should have a public API that can be used seamlessly by anyone wanting to develop a plugin.

.NET / .NET Core Support

This is somewhat of a moot point for the AST method and, as it turns out, for IL mutation also: .NET Core has the same IL as the full .NET Framework.

https://stackoverflow.com/questions/34906969/does-net-core-generate-the-same-il-as-standard-net

Performance

Mutation testing is by nature an expensive business from a resource perspective, so we'll have to use every trick in the book to mitigate this when writing the code that implements the mutation system. This includes things like:

  • Minimal use of reflection. This means that code architecture approaches like DI should probably be avoided, and any use of 'dynamic' types limited/omitted.
  • Minimal use of dynamic - this is converted to much reflection under the hood.
  • Minimal use of unmanaged resources.
  • Careful use of 'syntactic sugar' features. E.g. auto-properties, async/await, generics, etc.

Online Submissions

This is something down the line, but it might be nice to have. Consider a cloud-based paid service where people can submit their code and have it analysed for free. For those with large code bases, leveraging the scalability of the cloud might be quite nice.

@sebrose

sebrose commented Jun 14, 2017

Contrasting IL and AST:

  • I'd have assumed that IL mutations would be faster than AST (because no compile stage).
  • benefit of AST is definitive link back to source code statement mutated, and hence understandability for user.

I imagined:

  • minimal source code + test(s)
  • loop 10,000 times around:
    • apply single (hard coded) mutation
    • execution of test(s) against resulting mutant
      Just to get a feel for overheads.

@sebrose

sebrose commented Jun 14, 2017

Performance - DI

  • DI doesn't need to use reflection, so maybe we don't need to throw the baby out

@jameswiseman76
Author

Thanks, Seb. The devil is in the detail.

With the IL mutation, I'm not sure it's possible to do it in-memory, so you may have to create a physical copy on disk for every mutation. That's how others have achieved this to date, with a vast amount of disk space taken up as a result. We then also have to consider the extra cost of pointing the unit tests at each assembly consecutively. An approach I have seen is to dynamically generate a test assembly along with each mutant. If there's not a better way to do this, then this again is expensive.

From what I take from Henry's comments, PIT does a lot of this in-memory with some cunning JVM trickery.

With the AST/Schemata method, you have one single assembly with all your mutants within it. Yes, you have the overhead of the compile, but that's a one-off at the start of the process. One-off also is the cost of pointing your unit tests at that assembly (as long as you are able to selectively choose the tests to run).

If this is adopted it may well require some experimentation on the limits of assembly size.

It looks like I'm starting to argue the AST method without realising!

Yes, and as for DI, absolutely. I think I was wary of pitching in with a heavyweight DI framework (Castle, I'm looking at you).

@liam-m

liam-m commented Jun 19, 2017

In terms of method of mutation, why is the mutant schemata being bundled with AST-based mutation? Couldn't it be used with IL mutation?

IL-based does sound very appealing due to compatibility with all .NET languages, but the difficulty of translating this back into sensible source code sounds like it could be a deal breaker (and this seems to be @AlexDenisov's experience) or add a lot more complexity than writing an AST mutator for each language. @hcoles says that byte code translates quite nicely back into Java but produces lots of junk mutations for Scala. I think it's pretty likely we'd see something similar for .NET languages.

I agree more diligence is due; we shouldn't just write it off, as we don't know how different this could be for .NET. Does anyone have experience with decompiling .NET IL? Is there any prior art? Would anyone be willing to come up with a test project to give it a go?

@oscarlvp

The core differences between mutation detection methods have been nicely pointed out, but after the mutant is created lots of things remain the same, regardless of the way the mutation point was detected and the mutant conceived. Test selection, ordering and execution could be the same in both scenarios: IL mutation or AST mutation. So, keeping the two main stages, mutant creation and mutant execution, agnostic of each other can lead to a more pluggable and modular system. The user can decide which mutation method to use.
Summarising, the tool can have both or at least one and leave the door open to the other. The rest could be the same.
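A minimal sketch of that separation, in Python for brevity and with every name (`Mutant`, `MutantSource`, `run_mutation_tests`, `FakeSource`) hypothetical: mutant creation sits behind a small interface, and the execution stage only sees mutants, not how they were made.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Protocol

@dataclass(frozen=True)
class Mutant:
    mutant_id: str
    description: str
    activate: Callable[[], None]     # makes this mutant the code under test
    deactivate: Callable[[], None]   # restores the original code

class MutantSource(Protocol):
    """Creation stage: an IL-based or AST-based implementation would plug in here."""
    def create_mutants(self) -> Iterable[Mutant]: ...

def run_mutation_tests(source: MutantSource,
                       tests: List[Callable[[], bool]]) -> Dict[str, str]:
    """Execution stage: identical whichever way the mutants were produced."""
    results = {}
    for mutant in source.create_mutants():
        mutant.activate()
        try:
            passed = all(test() for test in tests)
        finally:
            mutant.deactivate()
        results[mutant.mutant_id] = "survived" if passed else "killed"
    return results

# Toy stand-in for a real creation stage, for demonstration only.
class FakeSource:
    def __init__(self, state: dict):
        self.state = state

    def create_mutants(self) -> Iterable[Mutant]:
        def set_offset(value):
            return lambda: self.state.update(offset=value)
        yield Mutant("m1", "offset 1 -> 2", set_offset(2), set_offset(1))
        yield Mutant("m2", "offset 1 -> 1 (equivalent)", set_offset(1), set_offset(1))

if __name__ == "__main__":
    state = {"offset": 1}
    tests = [lambda: 1 + state["offset"] == 2]
    print(run_mutation_tests(FakeSource(state), tests))
```

Test selection and ordering would slot into `run_mutation_tests` without the creation stage needing to know about them.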
