
SOEN 345 notes

Some definitions

Legacy code: code without tests. Without tests, we don't know if the code is getting better or worse

Unit test: tests that run fast and help localize problems. They test a specific part of the code

Integration test: tests that span multiple modules. Not fast. They ensure that a complete feature is working

Continuous integration: automatically build, test and analyze the software on every change to the source repo. New commit = building, testing and analyzing again. (Ex: Travis CI)

Continuous delivery: ensures a software change can be delivered, by testing in production-like environments (Ex: testing with Ruby 2.4, Ruby 2.5, ...)

Software is constrained by the people who wrote it before you. You want to code things as simply as possible and skip things that you are not going to need.

TDD (Test driven development)

  1. Write a failing test case
  2. Get it to compile
  3. Make it pass
  4. Remove duplication*
  5. Repeat

*At step 3, you might copy old code to use in your new code. After the test passes, it is a good idea to refactor to remove this duplication. There isn't always duplication.
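To make the loop concrete, here is a minimal sketch of one cycle with JUnit 4 (the PriceCalculator class and its discounted() method are made-up names for illustration):

import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class PriceCalculatorTest {
	// Step 1: this test was written before PriceCalculator existed, so it
	// didn't even compile. Step 2: add an empty class/method so it compiles
	// (and fails). Step 3: implement discounted() so the assertion passes.
	@Test
	public void appliesTenPercentDiscount() {
		PriceCalculator calc = new PriceCalculator();
		assertEquals(90.0, calc.discounted(100.0, 0.10), 0.001);
	}
}

class PriceCalculator {
	double discounted(double price, double rate) {
		return price * (1 - rate);
	}
}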

TDD cons

  • A lot of discipline is needed to write tests before the code
  • Lots of small, seemingly useless tests
  • Those tests need to be maintained

The solution: you compromise!

  • On every commit, make sure there are tests covering the feature or code you added/modified
  • If you don't do it now, you will never do it. (Just be like Shia LaBeouf and do it)

TDD pros

It lets us focus on one thing at a time: you are either writing new code (code/tests) or refactoring, not both at the same time.

Characterization tests

WHY: to protect the existing behavior of legacy code against unintended changes.

WHAT: it characterizes the actual behavior of a piece of code.

HOW:

  1. Put the code in a test harness (automated testing)
  2. Write an assertion that you know will fail
  3. Let the failure tell you what the behavior is
  4. Update the test so that it tests that behavior
  5. Repeat

It's important to characterize important parts or code you think will change in a future update. This gives you a heads-up in the event that the behavior changes.

Example: if you use a Stack in Java, you will notice that iterating over it returns the elements in the wrong order (bottom to top) and does not pop them off the stack. This is because java.util.Stack extends Vector and inherits its iterator. It is a great idea to add characterization tests to your stack so that, if the stack ever gets updated to have proper iterator behavior, they notify you of the change.
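A characterization test for this quirk might look like the following sketch (JUnit 4 assumed):

import java.util.Iterator;
import java.util.Stack;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class StackCharacterizationTest {
	// Characterizes the surprising behavior: the iterator inherited from
	// Vector walks the stack bottom-to-top instead of in LIFO order.
	@Test
	public void iteratorReturnsElementsInInsertionOrder() {
		Stack<Integer> stack = new Stack<>();
		stack.push(1);
		stack.push(2);
		stack.push(3);

		Iterator<Integer> it = stack.iterator();
		assertEquals(Integer.valueOf(1), it.next()); // bottom first, not 3!
		assertEquals(Integer.valueOf(2), it.next());
		assertEquals(Integer.valueOf(3), it.next());
		assertEquals(3, stack.size()); // iterating popped nothing
	}
}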

Breaking dependencies (the fun part)

Dependencies among classes make it difficult to get a cluster of objects under test. Sometimes you'll end up with the whole system in the test harness (you don't want that)

Two reasons why we might want to break dependencies:

  1. for sensing: when we can't access the values our code computes
  2. for separation: when we can't even get a piece of code into the test harness. We break the dependency so we can test the code.

Fake collaborator or fake object

An object that impersonates a collaborator of the class being tested.

(Before and after class diagrams omitted: screenshots in the original gist.)

We made a Display interface. The Sale object now holds a Display instead of an ArtR56Display. We can then instantiate it with a fake display to test the behavior of scan() and verify that showLines() is called with the proper value.
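A rough sketch of what that fake might look like (FakeDisplay and the exact signatures are assumptions; the notes only name Display, Sale, scan() and showLines()):

interface Display {
	void showLines(String line);
}

class FakeDisplay implements Display {
	private String lastLine = "";

	public void showLines(String line) {
		lastLine = line; // record the call instead of driving real hardware
	}

	String getLastLine() {
		return lastLine;
	}
}

class Sale {
	private final Display display;

	Sale(Display display) {
		this.display = display; // real ArtR56Display in prod, fake in tests
	}

	void scan(String barcode) {
		// ... item lookup elided; the point is that the display call now
		// goes through the interface and can be sensed in a test
		display.showLines("Milk $3.99");
	}
}

A test can then construct Sale with a FakeDisplay, call scan(), and assert on getLastLine().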

Mock

A mock is a dummy implementation of an interface or a class, with which we can:

  • Define the output of certain method calls
  • Configure to perform a certain behavior
  • Validate the interaction with the system

Mockito: a library enabling mock creation, verification and stubbing. To do so, it uses reflection and the proxy pattern.

Creating a mock

ObjectYouMade mockObject = mock(ObjectYouMade.class);

The mock will remember all the interactions, but will not run the real methods when you call it.

Verifying calls to a mock

CustomObject obj = mock(CustomObject.class);
obj.methodYouMade("Some param");
verify(obj).methodYouMade("Some param"); // passes


// other examples
obj.methodYouMade(3);
verify(obj).methodYouMade(3); // passes
verify(obj).methodYouMade(4); // fails: methodYouMade was never called with 4

// N.B. by default, verify does not check the order of the method calls

Verify the order of calls to a mock

To do so, you need to initialize an InOrder.

InOrder inOrder = inOrder(mock1, mock2, ...);
inOrder.verify(mock1).methodYouMade(ParamUsed);
inOrder.verify(mock2).methodYouMade(ParamUsed);
// to pass, the method must have been called on mock1 before mock2

Stubbing

Stubbing makes a mock return the value you define when a given method is called.

when(mockedObject.get(3)).thenReturn("ok");
mockedObject.get(3); // ok
when(mockedObject.get(3)).thenReturn("It's order dependent");
mockedObject.get(3); // It's order dependent (the most recent stub wins)
// you can also stub for any param value
when(mockedObject.get(anyInt())).thenReturn("Cool");

Stubbing consecutive calls

when(mockedStack.pop()).thenReturn(3,2,1);
mockedStack.pop(); // 3
mockedStack.pop(); // 2
mockedStack.pop(); // 1

ArgumentCaptor

Captures the value that was passed to a method. You can then use that value to verify that other objects or methods were called with it.

ArgumentCaptor<String> argCaptor = ArgumentCaptor.forClass(String.class);

You can set the type you want to capture. Then you can use the following 2 methods:

  1. argCaptor.capture(); to save the arg value
  2. argCaptor.getValue(); to return the arg value

N.B. you can capture multiple times; getValue() returns the most recent value (getAllValues() returns all of them)

Here's a better example:

// Object.java
public void setXY(int xy) {
	setZ(xy);
	this.xy = xy;
}

public void setZ(int z) {...}

// Main.java
ArgumentCaptor<Integer> argCaptor = ArgumentCaptor.forClass(Integer.class);
verify(spyObject).setXY(argCaptor.capture());
verify(spyObject).setZ(argCaptor.getValue());

Using Spy

A spy wraps a real object and lets you use verify + stubs on it if need be.

  1. Wrap the object in a spy:

List list = new LinkedList();
List spy = spy(list);

  2. Enjoy!

If you call the methods, they still work as intended. If you stub a method, its functionality is overridden by the stub.

Bisection

Bisection is used to find the commit where something broke when it wasn't covered by a test.

  1. Write a test
  2. Do a bisection (binary) search through the commit history to find the last good commit. The commit right after it is the one that broke it.

Flaky tests (oh he tweakin')

A flaky test is a non-deterministic test (it can exhibit different outcomes across runs of the same code)

2 types of non-determinism:

  1. Inherent non-determinism: noisy or complex tests, race conditions
  2. Accidental non-determinism: an old / out-of-date test introduces flakiness

How to handle flaky tests

  • Google's way: only report a failure if the test fails 3 times in a row.
  • Microsoft: run the test 1000 times; if its failure rate is below the flaky ratio, quarantine the test.
  • Ericsson: binomial fail ratio: the flaky ratio gives you the number of runs required (at worst, you run a test 384 times). If the observed failure ratio is above the baseline, it is considered a real failure.
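As an illustration of Google's policy, here is a minimal sketch of a JUnit 4 rule that only reports a failure after the test fails on every one of N attempts (RetryRule is a made-up name, not a library API):

import org.junit.rules.TestRule;
import org.junit.runner.Description;
import org.junit.runners.model.Statement;

public class RetryRule implements TestRule {
	private final int attempts;

	public RetryRule(int attempts) {
		this.attempts = attempts;
	}

	@Override
	public Statement apply(Statement base, Description description) {
		return new Statement() {
			@Override
			public void evaluate() throws Throwable {
				Throwable last = null;
				for (int i = 0; i < attempts; i++) {
					try {
						base.evaluate(); // run the test body
						return;          // one pass is enough
					} catch (Throwable t) {
						last = t;        // remember and retry
					}
				}
				throw last; // failed all attempts: report a real failure
			}
		};
	}
}

Use it with @Rule public RetryRule retry = new RetryRule(3); in the test class.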

Testing smells

1. Hidden dependencies (dependencies created inside the constructor)

Solution: parameterize the constructor. Make a constructor that accepts the dependency as a parameter and rewrite the original constructor to call the new one.

Before

public Sale(Display display, Storage storage) {
	this.display = display;
	this.storage = storage;
	this.interac = new Interac(42); // we can't mock that right now
}

After

public Sale(Display display, Storage storage, Interac interac) { // new parameterized constructor
	this.display = display;
	this.storage = storage;
	this.interac = interac;
}

public Sale(Display display, Storage storage) {
	this(display, storage, new Interac(42));
}

We parameterized the constructor so we can insert a mock for testing purposes.

2. Blob (object with so much shit inside)

If you have an object with a large number of instance variables, parameterizing the constructor might not be the best way. A quick fix is to supersede the instance variable of interest (create a setter for it so that we can dynamically insert mocks).

N.B. These methods should only be used for testing
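A minimal sketch of superseding an instance variable (Inventory and MailSender are invented names for illustration):

class MailSender {
	void send(String to, String body) { /* ... */ }
}

public class Inventory {
	private MailSender mailSender = new MailSender(); // hidden dependency
	// ... many more instance variables ...

	// FOR TESTING ONLY: supersede the instance variable so a test can
	// dynamically insert a mock without touching the huge constructor.
	void supersedeMailSender(MailSender sender) {
		this.mailSender = sender;
	}
}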

3. Globals (singleton pattern)

Globals and singletons are evil. Since the test depends on something that is globally modifiable, it can cause a lot of problems.

First, a good way to implement a global object is to use the singleton pattern.

Example

public class SingletonDemo {
	private static SingletonDemo singleton;
	private int x;
	private int y;

	private SingletonDemo() {
		x = 8;
		y = 10;
	}

	public synchronized static SingletonDemo getInstance() {
		if (singleton == null) singleton = new SingletonDemo();
		return singleton;
	}

	public int getX() {
		return x;
	}

	public int getY() {
		return y;
	}

	public void setX(int x) {
		this.x = x;
	}

	public void setY(int y) {
		this.y = y;
	}
}

With a singleton/global, the tests become order-dependent because they are coupled to the value of the global. A solution is to reset the singleton/global to its default value after every test. Doing so fixes the ordering problem but still doesn't allow the tests to run in parallel.
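One way to do that reset, sketched below: add a test-only method to SingletonDemo (resetForTesting() is an assumed addition, not part of the class above) and call it from a JUnit @After hook.

// added inside SingletonDemo.java
static void resetForTesting() {
	singleton = null; // the next getInstance() rebuilds the default state
}

// in the test class (org.junit.After)
@After
public void tearDown() {
	SingletonDemo.resetForTesting();
}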

Dependency injection

Let's say you have a Store object:

public class Store {
	private Interac terminal;
	private Register register;

	public Store(){
		terminal = new Interac(12);
		register = new CashRegister();
	}
}

There is currently no way to mock the hard dependencies on Interac and Register. With dependency injection, you won't need to explicitly set the dependencies with new in the constructor. They will be set from the outside and will be modifiable. A small refactoring is necessary.

1. Parameterize the constructor to include every parameter

public Store(Interac terminal, Register register) { ... }

done.

2. Add the @Inject annotation above every instance variable and constructor of the class you want to inject stuff in

@Inject
private Interac terminal;

@Inject
private Register register;

@Inject
public Store(Interac terminal, Register register) { ... }

done.

3. Create a new module extending AbstractModule to declare configurations

// StoreModule.java
public class StoreModule extends AbstractModule {
	@Override
	protected void configure(){
		bind(Interac.class).toInstance(new Interac(12));
		bind(Register.class).to(CashRegister.class);
	}
}

This sets the same configuration as the original code.

  • .toInstance(...) is used when you want to specify an instance built with a non-default constructor.
  • .to(Something.class) is used when the LHS is different from the RHS and the default constructor is used.

N.B. if the LHS and RHS are the same and use the default constructor, you do not need to bind it. Magic will happen automatically.

4. Add the injector in the class that creates the object

Injector injector = Guice.createInjector(new StoreModule());
Store store = injector.getInstance(Store.class);

Note: dependency injection uses inversion of control to let the framework specify the dependencies

Testing with dependency injection

1. Change the runner

@RunWith(MockitoJUnitRunner.class)

2. Set the instance variables in your test class with the @Mock annotation

@Mock
Interac terminal;

@Mock
Register register;

Store store; // we'll need it for step 3

3. Add a before step using the @Before annotation

@Before
public void anyMethodName() {
	store = new Store(terminal, register);
}

This will be run before every test. A before action is also a great place to reset singletons. After step 3, you are done: you can test like you used to.

Dependency inversion vs dependency injection

Dependency inversion principle: the code should depend only on abstractions (abstract classes and interfaces), not concrete implementations

Bad example

class SomeClass{
	private CandyStore store; // CandyStore is a class
}

Good example

class SomeClass {
	private Store store; // Store is an abstract class
}

Dependency injection is a dependency inversion enabler (it helps us achieve dependency inversion)

Consistency checking

  • There are things that you cannot test (or it would be way too expensive to test).
  • Ex: race conditions, hardware problems
  • Never assume your DB has the right data

You write a consistency checker to verify that the data is still valid (not outdated or corrupted). For example, you can quickly verify the integrity of a file by comparing its expected checksum with its actual checksum.
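A minimal sketch of such a file check, assuming SHA-256 and that the expected checksum was recorded when the file was written (all names are made up):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;

public class FileConsistencyCheck {
	static boolean isConsistent(String path, String expectedHex) throws Exception {
		byte[] bytes = Files.readAllBytes(Paths.get(path));
		byte[] digest = MessageDigest.getInstance("SHA-256").digest(bytes);
		StringBuilder actual = new StringBuilder();
		for (byte b : digest) {
			actual.append(String.format("%02x", b)); // digest bytes to hex
		}
		return actual.toString().equals(expectedHex); // mismatch = violation
	}
}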

How do you check your whole data set?

  1. Run a script on every partition
  2. Every partition has a set of units (more on that later)
  3. Run sanity checks on every unit (same hash, same size, ...)

What is a good unit?

Unit size varies: you can have really small ones and not-so-small ones.

  • Small: a value in a DB
  • Single larger unit: a file (use a hash)
  • Small composite unit: the size of a file, the total number of records
  • Large composite unit: a directory structure (use a hash)

Schedule the checker

  1. Assign a partition per check runner
  2. Set a cursor and record its position

If a violation is found, store it in a DB (we'll use it later)

Fix an inconsistency

When an inconsistency is found and stored in the DB, a script will run over all the violations and fix them. Run the checker again afterward.

Recap

  1. Start a checking job
  2. Compare the actual value against the recorded values
  3. Check for violations. If one is found, report it and persist it.
  4. If valid changes are made to the data, you need to update the checker.

Data migration

1. Start with a forklift

Take a snapshot of the data and start feeding it into the new datastore. Do this during off-peak hours.

2. Incremental Replication

Since we don't want to run the forklift over and over again, use incremental replication instead to move new/updated data to the new datastore. Store a flag on modified rows. This runs continuously so that the new datastore mirrors the old one.

3. Consistency checker

Since incremental replication might not always work, use a consistency checker to find violations between the two datastores. Keep track of any violations and update the outdated records in the new datastore.

4. Shadow writes

With shadow writes, we still read from the old datastore as the source of truth, but we write to both datastores. When a write happens on the old datastore, an asynchronous write is also done on the new one. We need to track the status of those writes; any failed writes will eventually be fixed by the consistency checker. You can also evaluate write performance on the new datastore.
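A hedged sketch of the idea (the KeyValueStore interface and all names are assumptions for illustration): reads hit the old store only; writes go to both, the new store asynchronously.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

interface KeyValueStore {
	String read(String key);
	void write(String key, String value);
}

class ShadowWritingStore implements KeyValueStore {
	private final KeyValueStore oldStore; // source of truth
	private final KeyValueStore newStore; // being migrated to
	private final ExecutorService pool = Executors.newSingleThreadExecutor();

	ShadowWritingStore(KeyValueStore oldStore, KeyValueStore newStore) {
		this.oldStore = oldStore;
		this.newStore = newStore;
	}

	public String read(String key) {
		return oldStore.read(key); // old datastore stays the source of truth
	}

	public void write(String key, String value) {
		oldStore.write(key, value);
		pool.submit(() -> {
			try {
				newStore.write(key, value); // asynchronous shadow write
			} catch (Exception e) {
				// record the failure; the consistency checker fixes it later
			}
		});
	}
}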

5. Shadow reads

With shadow reads, you read from both datastores. When a request is made, both datastores are queried: the data from the old datastore is served to the user, while the data from the new datastore is used to verify consistency. Keep track of the results over a period of time and roll it out gradually to evaluate performance.

6. Perform full migration

When shadow reads show minimal data mismatch, the datastores are ready to be switched. Flip a flag, and the new datastore becomes the source of truth; the old one is not used anymore.

Feature toggles

If you work with release branches or feature branches and you ship broken code that breaks functionality, you'll have to do a revert or an emergency patch. This takes time and requires a recompile + redeploy of the application globally.

Solution: Feature toggles

Instead of shipping code that is instantly used in master, hide the feature behind a feature toggle. A feature toggle is just a fancy way of saying a flag: you ship the new code hidden behind the flag.

Example of a toggle

public class StoreToggles {
	public static boolean newSalesModule = true;
}

Example of the toggle being used

...
if (StoreToggles.newSalesModule) {
	// run the super cool new code
} else {
	// run the old boring code
}

If something does not behave properly with the new code, simply turn the toggle off. After the feature has been running for a while without any issue, you can delete the toggle and the old code.

Advantages of feature toggles

  1. Support for A/B testing (turning the flag on for a percentage of users); see the sketch after this list
  2. Canary releases (gradual rollout of a feature)
  3. Feature management (quickly turn a feature on/off)
  4. No need to redeploy
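A hedged sketch of how a percentage-based toggle for A/B tests or canary releases might look (all names assumed):

public class PercentageToggle {
	private final int rolloutPercent; // 0..100

	public PercentageToggle(int rolloutPercent) {
		this.rolloutPercent = rolloutPercent;
	}

	// Hashing the user id keeps the answer stable per user, so the same
	// users stay in the rollout as the percentage grows.
	public boolean isEnabledFor(long userId) {
		return Math.floorMod(Long.hashCode(userId), 100) < rolloutPercent;
	}
}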

Disadvantages of feature toggles

  1. Toggle debt (at some point you have so many toggles that you don't know what they do)
  2. Combination hell (some features might only work with a specific combination of toggles; removing one toggle might break everything)
  3. Dormant code (you are keeping code around that could potentially still be called; you need to be careful)

Loggers

Intro

// get root logger
Logger logger = LogManager.getLogger();

// get any other logger
Logger analytics = LogManager.getLogger("analytics");

// Example of logger usage
logger.debug("Debug log message");
logger.info("Info log message");
logger.error("Error log message");

Log level

  1. ALL
  2. FATAL
  3. ERROR
  4. WARN
  5. INFO
  6. DEBUG
  7. TRACE
  8. OFF

N.B. ALL shows everything and OFF shows nothing. FATAL messages are visible at every level except OFF, and in general a message is visible to every log level below its own in this list; for instance, INFO messages are visible when the level is INFO, DEBUG or TRACE.

Log hierarchy

The hierarchy of loggers uses a dot structure

root
root.parent // its parent is root
root.parent.child // its parent is "root.parent"

Small examples with log levels

In this example, only the root logger is defined:

| Logger Name | Assigned LoggerConfig | LoggerConfig Level |
| ----------- | --------------------- | ------------------ |
| root        | root                  | DEBUG              |
| x           | root                  | DEBUG              |
| x.y         | root                  | DEBUG              |

x and x.y end up with the same config as root because they are not defined and inherit from their closest defined ancestor. Since x is not defined, it cannot serve as the parent config of x.y; both fall back to root.

Another example, with x and x.y.z defined

| Logger Name | Assigned LoggerConfig | LoggerConfig Level |
| ----------- | --------------------- | ------------------ |
| root        | root                  | DEBUG              |
| x           | x                     | ERROR              |
| x.y         | x                     | ERROR              |
| x.y.z       | x.y.z                 | WARN               |

x.y has the same values as x, since x.y is not defined and x is its closest defined ancestor. That is not the case for x.y.z, which was explicitly defined.

Another example, with x and x.y defined

| Logger Name | Assigned LoggerConfig | LoggerConfig Level |
| ----------- | --------------------- | ------------------ |
| root        | root                  | DEBUG              |
| x           | x                     | ERROR              |
| x.y         | x.y                   | INFO               |
| x.yz        | x                     | ERROR              |

x.yz is not a child of x.y, as it is missing a period. Since x.yz is not defined, it takes the values of its parent, x.

Appenders and additivity

Appenders let a logger print to multiple destinations (sysout, file, DB, ...). They are inherited through the hierarchy: by default a logger gets the root appenders, its child gets that logger's appenders + the root appenders, and so on. To stop the forwarding of appenders, simply use additivity="false"; the logger will then not get the appenders of its parents.

N.B. Log requests are forwarded down the hierarchy

| Logger Name     | Specific Appenders | Additivity Flag | Active Appenders       |
| --------------- | ------------------ | --------------- | ---------------------- |
| root            | A1                 | n/a             | A1                     |
| x               | A-x1, A-x2         | true            | A1, A-x1, A-x2         |
| x.y             | none               | true            | A1, A-x1, A-x2         |
| x.y.z           | A-xyz1             | true            | A1, A-x1, A-x2, A-xyz1 |
| security        | S1                 | false           | S1                     |
| security.access | S1-access          | true            | S1, S1-access          |

We can see additivity at work with x, x.y and x.y.z. We can also see that turning additivity off on security stops it from inheriting A1 from root: only the appenders specified for security (and its children) are kept.

Pattern for logging

<Pattern>%d %p %c{1.} [%t] %m%n</Pattern>

  • %d = date
  • %p = level (e.g. ERROR)
  • %c = name of the logger
  • %t = name of the thread
  • %m = message
  • %n = newline

Modern Logging

  • Log as much as possible. Don't worry about performance issues when logging
  • Dump all of it in a DB
  • Log structured data (ex: JSON)
  • Log timestamp, method name, request, payloads, data, message, latency, ...
  • Aggregate those logs in a search engine (ex: Splunk)
  • Use a tool like Logstash to structure the log data and make it searchable
  • It is normally cheaper to pay for a log search provider than to build one