The patterns from the last chapter (Copy-Replace, Stream-Split, and Join-Stream) are quite intense. Getting them to work on a running system is non-trivial. There will be times when they need to be used; most of the time, however, they can be avoided.
The patterns will work in any scenario. It does not matter whether the system is doing 5000 transactions per second with terabytes of data or taking seconds per transaction with 1 GB of data. They work in the same way.
There exist, however, "cheats" that can be used to avoid the more complicated patterns. Depending on the scenario they may or may not apply, and the trade-offs associated with them may be better or worse than the complexity associated with the other patterns.
In this chapter we will look at some of these "cheats" and where and when they might be applicable. Some add code complexity to avoid the need for versioning; others look at different deployment options. If possible you should generally prefer a pattern from this chapter unless you are in a situation where you cannot use it for some reason.
##Two Aggregates One Stream
One of the most underrated patterns for avoiding a Stream-Split is to have two aggregates (states) derived off of a single stream in the write model. It is unusual that such a simple concept is so often overlooked by teams who instead decide to go for the far more complex method of a Stream-Split. To be fair, the reason it gets overlooked is that many consider it hacky: it breaks isolation, it is improper. At the end of the day the job of a developer is to build working systems while mitigating risk. Code and architectural aesthetics come a distant second.
There is *nothing* wrong with having two aggregates built off of the same stream.
Consider the previous example of the logistics company that has Engine currently modeled inside of Truck, but then realizes that engines are actually taken out of trucks and placed into other trucks; sometimes the truck is then discarded. This is a clear case of two different life-cycles, where a developer would really want to Stream-Split the Truck stream into a Truck stream and an Engine stream.
Another option is to not touch storage at all and instead change only the application model above it. In the application model, make Truck and Engine two separate things but allow them to be backed by the same stream. When an aggregate is loaded, events it does not understand are normally just ignored. So when Truck is replayed, it will handle only the events it is interested in, and when Engine is replayed, it will handle only the events it cares about.
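A minimal sketch of two aggregates replayed from one stream might look like the following. The event names, the dictionary event shape, and the `load` helper are all assumptions made for illustration; they do not come from any particular framework.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical events, all stored in the single Truck stream.
SHARED_STREAM = [
    {"type": "TruckPurchased", "truck_id": "T1"},
    {"type": "EngineInstalled", "engine_id": "E1", "truck_id": "T1"},
    {"type": "TruckDrove", "truck_id": "T1", "km": 120},
    {"type": "EngineServiced", "engine_id": "E1"},
]


@dataclass
class Truck:
    purchased: bool = False
    km: int = 0

    def apply(self, event):
        # Truck reacts only to the events it understands.
        if event["type"] == "TruckPurchased":
            self.purchased = True
        elif event["type"] == "TruckDrove":
            self.km += event["km"]
        # Every other event type is silently ignored.


@dataclass
class Engine:
    installed_in: Optional[str] = None
    services: int = 0

    def apply(self, event):
        # Engine reacts only to the events it understands.
        if event["type"] == "EngineInstalled":
            self.installed_in = event["truck_id"]
        elif event["type"] == "EngineServiced":
            self.services += 1


def load(aggregate, events):
    # Replay the whole stream; each aggregate filters for itself.
    for event in events:
        aggregate.apply(event)
    return aggregate


truck = load(Truck(), SHARED_STREAM)
engine = load(Engine(), SHARED_STREAM)
```

In this sketch the two aggregates share no events, which is the arrangement the text recommends.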
Both instances can be rebuilt from storage, and both instances can write to the stream. Ideally they do not share any events in the stream, but there are cases where they do. Try to keep events unshared, as it will make a later Stream-Split operation easier if one is wanted.
When writing, it is imperative that both use ExpectedVersion set to the last event they read. If they do not set ExpectedVersion, the other instance could write an event that would disallow the event they are currently writing. This is a classic example of operating on stale data. If they both use ExpectedVersion, however, they are assured that the other instance has not updated the stream while a decision was being made.
Setting ExpectedVersion with two aggregates on one stream can lead to more optimistic concurrency conflicts at runtime. If there is a lot of contention between the two aggregates, it may be best to Stream-Split. Under most circumstances, though, there is little contention, either due to their separate life-cycles or due to a low overall system load.
Essentially this pattern is doing the exact same thing as a Stream-Split; it is just doing it dynamically at the application level as opposed to at the storage level.
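The ExpectedVersion check can be illustrated with a toy in-memory stream. `InMemoryStream`, `WrongExpectedVersion`, and the convention that an empty stream has version -1 are assumptions of this sketch, not the API of any real Event Store client.

```python
class WrongExpectedVersion(Exception):
    """Raised when the stream moved on after the writer last read it."""


class InMemoryStream:
    def __init__(self):
        self.events = []

    def read(self):
        # Version is the index of the last event, or -1 for an empty stream.
        return list(self.events), len(self.events) - 1

    def append(self, event, expected_version):
        current = len(self.events) - 1
        if expected_version != current:
            raise WrongExpectedVersion(
                f"expected {expected_version}, stream is at {current}"
            )
        self.events.append(event)
        return current + 1


stream = InMemoryStream()
stream.append({"type": "TruckPurchased"}, expected_version=-1)

# Truck and Engine both load from the same stream at the same version.
_, truck_version = stream.read()
_, engine_version = stream.read()

# Engine writes first and succeeds.
stream.append({"type": "EngineRemoved"}, expected_version=engine_version)

# Truck now holds stale data, so its write is rejected.
try:
    stream.append({"type": "TruckDrove"}, expected_version=truck_version)
    conflict = False
except WrongExpectedVersion:
    conflict = True
```

On a conflict the usual response is to reload the aggregate, make the decision again, and retry the write.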
##One Aggregate Two Streams
A similar process can be applied to the problem of Join-Stream. When loading the aggregate, a join operation is used dynamically to combine two streams' worth of events into a single stream that is then used for loading.
The join operation reads from both streams and orders events between them.
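    // Keep going until BOTH streams are drained.
    while(!stream1.IsEmpty() || !stream2.IsEmpty()) {
        if(stream1.IsEmpty()) yield return stream2.Take();
        else if(stream2.IsEmpty()) yield return stream1.Take();
        // Min takes whichever stream's next event comes first in the ordering.
        else yield return Min(stream1, stream2);
    }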
This pseudocode will read from the two streams and provide a single stream with perfect ordering. It handles two streams but extrapolating this code to support N streams is not a difficult task. The aggregate or state is then built by consuming the single ordered stream generated.
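One way to extend the join to N streams is sketched below using the standard library's `heapq.merge`. It assumes every event carries a `position` field that is comparable across streams (for example a global log position) and that each stream is already ordered by it; the field name is hypothetical.

```python
import heapq


def join_streams(*streams):
    # Lazily merge any number of already-ordered streams into one
    # ordered stream, comparing events by their (assumed) position.
    return heapq.merge(*streams, key=lambda event: event["position"])


truck_stream = [
    {"position": 1, "type": "TruckPurchased"},
    {"position": 4, "type": "TruckDrove"},
]
engine_stream = [
    {"position": 2, "type": "EngineInstalled"},
    {"position": 3, "type": "EngineServiced"},
]

ordered_types = [e["type"] for e in join_streams(truck_stream, engine_stream)]
ordered_positions = [e["position"] for e in join_streams(truck_stream, engine_stream)]
```

Because the merge is lazy, the aggregate can consume events as they are read without holding every stream fully in memory.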
The other side of the issue is handling the writing of an event. Which stream should it be written to? Most frameworks do not support the ability to specify a stream to write to; they are centered around stream per aggregate. If you are working without an existing framework, or writing your own, it is trivial to handle. Instead of tracking Event, track something that wraps Event and also contains the stream that the Event applies to. Under most circumstances it will still be stream per aggregate, but this supports the ability to have multiple streams backing an aggregate.
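The wrapper idea could be sketched roughly as follows. `PendingEvent`, `TruckWithEngine`, the stream naming scheme, and the dictionary-backed store are all hypothetical names invented for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingEvent:
    stream: str   # the stream this event should be appended to
    event: dict   # the event itself


class TruckWithEngine:
    """One aggregate backed by two streams (hypothetical example)."""

    def __init__(self, truck_id, engine_id):
        self.truck_stream = f"truck-{truck_id}"
        self.engine_stream = f"engine-{engine_id}"
        self.uncommitted = []

    def drive(self, km):
        self.uncommitted.append(
            PendingEvent(self.truck_stream, {"type": "TruckDrove", "km": km})
        )

    def service_engine(self):
        self.uncommitted.append(
            PendingEvent(self.engine_stream, {"type": "EngineServiced"})
        )


def save(store, aggregate):
    # Route each wrapped event to the stream it names.
    for pending in aggregate.uncommitted:
        store[pending.stream].append(pending.event)
    aggregate.uncommitted.clear()


store = defaultdict(list)
aggregate = TruckWithEngine("T1", "E1")
aggregate.drive(120)
aggregate.service_engine()
save(store, aggregate)
```

This sketch leaves out concurrency control. A real store would still need an ExpectedVersion per stream, and appends to two streams are generally not atomic.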
//TODO mention usage with private data!
Having multiple streams backing an aggregate is used much less often than multiple aggregates on the same stream. It feels more hacky, and no frameworks that I know of support it. That said, it can be a useful pattern when you consider that the alternative would be to do a Join-Stream at the storage level.
##Copy-Transform
Every pattern that has been looked at up until this point has focused on making a versioned change in the same environment. What if, instead of working in the same environment, there were a new environment every time a change was made? This pattern has been heavily recommended in the great paper [The Dark Side of Event Sourcing: Data Management](
One nice aspect of Event Sourced systems is that they are centered around the concept of a log. An Event Store is a distributed log. Because all operations are appended to this log many things become simple such as replication. To get all of the data out of an Event Store all that is needed is to start at the beginning of the log and read to the end of the log. To get near real-time data after this point just continue to follow the log as writes are appended to it.
//TODO insert diagram
Figure 1 certainly seems much more complicated than the other patterns thus far. It is, however, in many ways simpler than them. Instead of migrating data in the Event Store for the new version of the application, the old system will be migrated to the new system. This migration process is the same process you would use if you were migrating a legacy system to a new system. Instead of versioning storage on each release, treat each release as an entirely new piece of software and migrate the old system to it.
A> A nice side benefit is if you treat every release as a migration, you should be pretty good at migrations when it is time to actually migrate off the system!
Most developers have done a migration at some point in their career. One of the great things about doing a migration is that you don't have to keep things the way those awful incompetent rascals before you did things; *even if they were you*. When doing a migration you can change however the old system worked into your new thing which is of course unicorns and rainbows, well until next year when you migrate off it again.
Instead of migrating system to system, in Copy-Transform you migrate version to version. When doing this, as in a Copy-Replace, there is an opportunity to put a transformation in the process. Another way of thinking about Copy-Transform is as a Copy-Replace on the entire Event Store, not just on a stream.
In this transformation step, just like in Copy-Replace, anything can be done. Want to rename an event? Split one event into two? Join streams? The world is your oyster, as you are writing a migration. The migration starts on the first event of the old system and goes forward until it is caught up. Only the old system is accepting writes, so the migrated system can continue to follow it.
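A transformation step of this kind might be sketched as a generator that sits between the old log and the new one. The event types being renamed, split, and dropped here are invented purely for illustration.

```python
def transform(old_events):
    # Read the old system's events in order and emit the new system's events.
    for event in old_events:
        if event["type"] == "CustomerMoved":
            # Rename an event.
            yield {**event, "type": "CustomerRelocated"}
        elif event["type"] == "OrderPlacedAndPaid":
            # Split one event into two.
            yield {"type": "OrderPlaced", "order_id": event["order_id"]}
            yield {
                "type": "OrderPaid",
                "order_id": event["order_id"],
                "amount": event["amount"],
            }
        elif event["type"] == "DebugPinged":
            # Drop an event the new version no longer wants.
            continue
        else:
            # Everything else is copied unchanged.
            yield event


old_log = [
    {"type": "CustomerMoved", "customer_id": "C1"},
    {"type": "DebugPinged"},
    {"type": "OrderPlacedAndPaid", "order_id": "O1", "amount": 50},
    {"type": "OrderShipped", "order_id": "O1"},
]
new_log = list(transform(old_log))
new_types = [e["type"] for e in new_log]
```

Because the transformation is a pure function over the old log, it can be rerun from the first event as many times as needed while the migration is being developed.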
The new system not only has its own Event Store, it also has all of its own projections to read models hooked to it. It is an entirely new system. As the data enters the Event Store, it is pushed out to the projections, which update their read models. The Event Store may catch up before the projections do. It is very important to monitor the projections so you know when they are caught up. Once they are caught up, the system as a whole can do a [BigFlip](
To do a BigFlip, the new system tells the Event Store of the old system that it should stop accepting writes, waits a short period of time, and then directs all traffic to the new system. This process generally takes only a few hundred milliseconds. Once the new system has all load pointed to it, the old system is discardable.
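The ordering of the BigFlip steps can be sketched with a toy model. `EventStore`, `Migration`, and `Router` are hypothetical in-memory stand-ins; a real implementation would also wait for the projections to catch up, which is omitted here.

```python
class EventStore:
    def __init__(self, name):
        self.name = name
        self.events = []
        self.accepting_writes = True

    def append(self, event):
        if not self.accepting_writes:
            raise RuntimeError(f"{self.name} no longer accepts writes")
        self.events.append(event)


class Migration:
    """Follows the old log and copies it into the new store."""

    def __init__(self, old, new):
        self.old, self.new = old, new
        self.checkpoint = 0  # number of old events already copied

    def catch_up(self):
        for event in self.old.events[self.checkpoint:]:
            self.new.append(event)  # a transformation step would go here
            self.checkpoint += 1

    def is_caught_up(self):
        return self.checkpoint == len(self.old.events)


class Router:
    def __init__(self, target):
        self.target = target


def big_flip(migration, router):
    migration.old.accepting_writes = False  # 1. old system stops taking writes
    migration.catch_up()                    # 2. drain whatever is left
    if not migration.is_caught_up():
        raise RuntimeError("new system is not caught up")
    router.target = migration.new           # 3. point all traffic at the new system


old, new = EventStore("v1"), EventStore("v2")
router = Router(old)
migration = Migration(old, new)

router.target.append({"type": "A"})
migration.catch_up()
router.target.append({"type": "B"})  # arrives while the migration is following
big_flip(migration, router)
router.target.append({"type": "C"})  # now lands in the new system
```

Stopping writes before the final drain is what guarantees that no event is accepted by the old system after the new system has stopped following it.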
//TODO insert diagram
Overall this is a nice way of handling an upgrade. It does, however, have a few limitations. If the dataset size is 50 GB, the full Copy-Transform is not an issue. It might take an hour, but this is reasonable. I personally know of an EventStore instance in production that is 10 TB: "We released this morning, it should be ready in a week or two." Another place it may not work well is when you have a scaled read side. If I have 100 geographically distributed read models, it might be quite difficult to make this type of migration work seamlessly.
Another disadvantage
If you are not facing these limitations, a Copy-Transform is a reasonable strategy. It removes most of the pain of trying to version a live running system by falling back to a BigFlip. Conceptually this is simpler than trying to upgrade a live running system. It is also a good option to consider
##Versioning Bankruptcy