hiranp/From monolith to microservices.md

## From monolith to microservices.md

      
            
          
Notes on Rodrigue Schaefer's talk "From monolith to microservices", about some of the challenges of Zalando's transition from monolith to microservices (microXchg 2016)

Zalando history


2008: started with a POC on Magento => starts fine but does not scale very well


2010: couldn't handle the rising load and traffic with Magento


=> so in 3 months they built their own system, based on Java, Spring and a Postgres DB (a monolithic application)


Their main focus was to build a system as efficient, stable and fast as possible

=> business logic in the db (stored procedures)
=> all access via stored procedures (no direct access to the db)

5 years later (~2015): The Monolithic architecture

The system eventually became hard to maintain: adding a new feature was getting harder and harder
Negative effects on productivity


huge number of devs working on the same codebase
lots of dependencies between teams => a lot of coordination needed between teams for developing and releasing a feature (=> "release train" model)
extra coordination => lower productivity

Negative effects on innovation


as code size increases

=> bug density increases and
=> system complexity increases


higher complexity ===> rigid processes were adopted (everything tightly controlled to reduce variance as much as possible) ==> this kills innovation (everything on the same tech stack, etc)


Negative effects on growth


Old platform with rigid processes

=> hiring problems, retention issues: difficult to find people who want to work on this old platform
=> not attractive to young talent
=> slow onboarding and fear of changing anything


Then came their radical organizational change!
Radical Agility (March 2015)

Zalando wanted:

Autonomous teams to deliver amazing products efficiently at scale
Give a team independence (no time lost to fear of breaking someone else's part)

Based on 3 principles, glued together by Trust:

Autonomy: the team can act on its own, define its delivery process and its technology stack, have an idea, develop it, deploy it and operate it
Purpose: strategic alignment, so that everybody is on the same page

Alignment is achieved with OKRs: the company defines its objectives for the year, then each department takes these objectives and comes up with its own, then the teams do the same


Mastery: give support to engineers to get better at what they want to be good at. We want excellent engineers, so we need to help them develop and grow.

=> Positive psychology helps here. Old psychology: helping sick people. Positive psychology: making normal people happier (see the Six-factor Model of Psychological Well-being)


Conway's law applied in reverse: we changed the organization, and suddenly the old technological landscape no longer fit it.
Organization side: "a purpose-driven organization composed of autonomous teams which deliver clearly defined products"
this, mapped to the technological side, means:
Technological side: "a service-oriented architecture composed of loosely coupled elements that have bounded contexts" (A.Cockcroft definition of microservices)
Organization side + Technological side => Radical Agility
Prerequisites for Microservices


Rapid provisioning
Basic Monitoring
Rapid application deployment

See also MicroservicePrerequisites by M. Fowler
To adopt a microservice architecture, you need to be very good at operations, because when you migrate a monolith to a microservice architecture you push the complexity down to the infrastructure level.
Zalando's setup: AWS + Docker + app monitoring + Stups.io (an open-source platform developed by Zalando).
PS: Stups is now on hold at Zalando.
The Mindset


Expect failure: expect other systems to fail, so:

build resilient systems
avoid domino effects (using tools like Hystrix)
something most engineers don't know or are not used to
we help all teams to adopt this mindset (even one team without it could "ruin" the whole system)
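To make the "avoid domino effects" point concrete, here is a minimal circuit-breaker sketch in Python. It is not the Hystrix API (Hystrix is a Java library); the class name, thresholds and injected clock are assumptions chosen for illustration only.

```python
import time


class CircuitBreaker:
    """Illustrative breaker: stop calling a failing dependency for a while."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable so the sketch is testable
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, fallback):
        # While open, skip the remote call until the cool-down has elapsed.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0  # a success closes the circuit again
        return result
```

A caller would wrap each remote call, for example `breaker.call(fetch_recommendations, lambda: [])`, so that a broken downstream service degrades one feature instead of stalling every caller.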


End to end responsibility

cross-functional teams responsible for everything: dev, test, operations
teams should think like a small startup: nobody cares about your stuff, you have to care about your idea


Software as a service

teams have to see their products as "software as a service", and see other teams as their customers.
a great mindset change for a lot of dev people


API first

try to keep all teams aligned on how to design and share APIs
there's an "API guild" which reviews the APIs made by teams and helps create coherent APIs across the whole organization


Global Architecture

With ~70 teams => how do you make sure everything fits well together?!
In order to handle this, Zalando has:

Rules of Play: define a vision of the architecture we want to have: loosely coupled services, resilience, REST as the main style for designing APIs, ... => written down in a booklet and given to everybody
Peer Reviews: get feedback and opinions from others
Tech Radars: looking at new technologies, technologies we don't want to see, experimental stuff, etc. This knowledge is shared publicly
Shared Concept of Core Business Entities (aka "prototype architecture"): we took some of the best engineers to work together on a blueprint of what the new microservices platform could look like (they created a prototype of the core domain functionality with a dozen microservices, message queues, ...). This is no ivory-tower architecture, just an idea presented to the teams so they can take what they find useful and go from there.

Compliance and Security

Zalando is a public company: auditors should be happy :D
What Zalando does:

4-eyes principle
Audit trails
Identity and access management (a big project: they developed an internal tool to handle this, based on OAuth)
Data protection agreement

Case Study: Fashion Store

Many examples of microservices architecture are basically back-end platforms. What about the presentation?

How do you split the single page of a frontend application in parts?
How do you enable independent teams to work on different parts of the page?

Zalando has 12 teams on the fashion shop: each team owns a different "fragment" (a component or widget on the page, e.g. the recommendation box).
Each fragment is a piece of markup served by a specific webapp operated by a team.
Each webapp sits on top of a set of collaborating microservices.
The "Layout Service" puts all these "puzzle pieces" together, using a template (based on the URL of the page) and a context (e.g. the current user).
A Router sits in front of the layout service, used to proxy the requests coming from the users to the new and to the old (monolithic) shop.
This way Zalando can still use "legacy functionalities" served by the monolithic application, and migrate smoothly and incrementally to a new microservice architecture.
They started migrating the checkout.
With this router they can do A/B testing or canary releases.
The routes can be changed dynamically via API, at runtime!
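The router behaviour described above (proxying to the old or new shop, canary releases, routes changed at runtime) can be sketched roughly as follows. This is not Skipper's actual API; the `Router` class, the backend names and the weights are hypothetical and only illustrate weighted, prefix-based routing.

```python
import random


class Router:
    """Maps path prefixes to weighted backends; routes can be replaced at runtime."""

    def __init__(self, default_backend, rng=None):
        self.default_backend = default_backend
        self.routes = {}  # path prefix -> list of (backend, weight)
        self.rng = rng or random.Random()

    def set_route(self, prefix, weighted_backends):
        # Would be called at runtime, e.g. from an admin API handler (hypothetical).
        self.routes[prefix] = list(weighted_backends)

    def pick_backend(self, path):
        # Longest matching prefix wins; unmatched paths go to the default backend.
        matches = [p for p in self.routes if path.startswith(p)]
        if not matches:
            return self.default_backend
        backends = self.routes[max(matches, key=len)]
        names = [name for name, _ in backends]
        weights = [weight for _, weight in backends]
        return self.rng.choices(names, weights=weights, k=1)[0]


# Everything goes to the monolith except a canary share of checkout traffic.
router = Router(default_backend="monolith")
router.set_route("/checkout", [("new-checkout", 10), ("monolith", 90)])
```

Raising the weight of `new-checkout` step by step, and finally dropping the `monolith` entry, would give the incremental migration the talk describes without redeploying the router.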
Zalando open-sourced many of these tools (see https://opensource.zalando.com/):

Skipper: the router
Innkeeper: the routes storage
Tailor: the layout service
Quilt: the templates storage
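As a rough illustration of the fragment composition described earlier (a template chosen by URL, filled with markup from team-owned fragments, given a context), here is a small Python sketch. It does not reflect how Tailor or Quilt are actually implemented; the template store, fragment names and plain functions standing in for HTTP calls are all assumptions.

```python
from string import Template

# Hypothetical template store: URL path -> page template with fragment slots.
TEMPLATES = {
    "/product": Template("<html><body>$header$recommendations</body></html>"),
}


# Hypothetical fragment webapps, each owned by a different team.
# Real fragments would be fetched over HTTP; plain functions stand in here.
def header_fragment(context):
    return f"<header>Hello, {context['user']}</header>"


def recommendations_fragment(context):
    return f"<aside>Picks for {context['user']}</aside>"


FRAGMENTS = {
    "header": header_fragment,
    "recommendations": recommendations_fragment,
}


def render_page(path, context):
    """Fill every slot of the template for `path` with its fragment's markup."""
    template = TEMPLATES[path]
    markup = {name: fetch(context) for name, fetch in FRAGMENTS.items()}
    return template.substitute(markup)
```

Calling `render_page("/product", {"user": "alice"})` returns one HTML page in which each slot was produced independently, which is what lets separate teams ship their part of the page on their own schedule.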

How all this impacts innovation, productivity and growth

Innovation


able to inject new features (at runtime!)
faster feedback loop: able to try out new ideas without waiting for the next "release train" to come...
tech agnostic => can try out new technologies

Productivity


autonomous teams with full control, they can define their processes tailored to their needs
true lean and agile processes
continuous delivery (not all teams are doing this, each one can decide)
teams are now independent of each other

Growth


smaller codebases => faster onboarding of new members
up-to-date technologies attract young talents
easier to spin-off new teams

It's a long long journey, which will take years!
Q&A


Q: How do you manage the balance between team independence and teams overlapping on the same things, maybe solving the same problems (and maybe with different technologies)?
A: It happens sometimes. To avoid it, in each team there's a "delivery lead" role, sort of a coach for the team; he/she also has a bigger picture of what the company is trying to achieve, can connect teams, and check that communication and alignment are in place. But it happens!


Q: What issues do you have with the IAM solution?
A: The current challenge for our IAM is scaling to handle all the requests. We have lots of requests from the customers:

=> which means lots of calls on the layout service
=> which means lots of calls to endpoints
=> which means lots of calls to microservices
=> which means each service has to authenticate via IAM
scaling is really hard
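The fan-out in this answer multiplies quickly. The back-of-the-envelope sketch below uses made-up numbers (they are not Zalando figures) and assumes every service call authenticates once, with no token caching.

```python
def iam_calls_per_second(page_views_per_second, fragments_per_page, services_per_fragment):
    """Estimate IAM authentications if every service call authenticates once."""
    return page_views_per_second * fragments_per_page * services_per_fragment


# Hypothetical load: 1000 page views/s, 10 fragments per page,
# 3 microservice calls behind each fragment.
print(iam_calls_per_second(1000, 10, 3))  # 30000
```

Under these assumptions a single page view turns into 30 authentications, so IAM load grows with the depth of the call chain as well as with customer traffic.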


Frontend development issues

In the backend teams are really autonomous in choosing whatever they like, because each service is independent.
The problem is in the frontend, where each team could start using new fancy js frameworks:

this ends up in the user's browser
slows down the page load time
may create conflicts and incompatibilities between adopted frameworks

=> we came up with a list of allowed frameworks (together with all the engineers)
=> we use the tech radar to share knowledge and create alignment between teams

Checkout: they keep both versions deployed and running in parallel, and use the old version as a fallback in case of issues with the new one.

Convert vs rewrite of functionality as microservices: they chose to rewrite because converting the functionality would have been too expensive

Pushing open source => so that Zalando is seen as a tech company

Size of teams? They follow Amazon's "two pizzas rule" => size between 2 and 12 people, most teams are around 6 people

Common shared libraries between services? We don't use shared libraries. The only exception is when a team publishes the lib as open source

Who handles support? A 1st-level support team handles incidents
=> but they're moving to teams being responsible for their own systems

How do you handle performance testing on the platform and security-related stuff?
There is a business-assurance unit, which coaches teams on how to do performance testing.
So there is a horizontal unit doing this.

What if the team is getting too big?
If your team is getting too big, chances are that

your service is too big
you have too many services

it's a good idea to split the team or split up the service

Is the team owning the old functionality also responsible for its migration?
The same team that is responsible for the old monolithic functionality is responsible for rebuilding it as microservices.

Mobile apps?
The mobile apps currently still use the API exposed by the monolithic platform