My notes on Rodrigue Schaefer's talk "From monolith to microservices" at microXchg 2016

Rodrigue Schaefer: "From monolith to microservices", on some of the challenges of Zalando's transition from a monolith to microservices (microXchg 2016)

Zalando history

  • 2008: started with a POC on Magento => started fine but did not scale very well

  • 2010: Magento couldn't handle the rising load and traffic

  • => so in 3 months they built their own system, based on Java, Spring and a Postgres DB (a monolithic application)

Their main focus was to build a system as efficient, stable and fast as possible

  • => business logic in the DB (stored procedures)
  • => all access via stored procedures (no direct access to the DB)

5 years later (~2015): The Monolithic architecture

The system eventually became hard to maintain: adding a new feature was getting harder and harder

Negative effects on productivity

  • huge number of devs working on the same codebase
  • lots of dependencies between teams => a lot of coordination needed between teams to develop and release a feature (=> "release train" model)
  • extra coordination => lower productivity

Negative effects on innovation

  • as code size increases

    • => bug density increases and
    • => system complexity increases
  • higher complexity ==> rigid processes were adopted (everything tightly controlled to reduce variance as much as possible) ==> this kills innovation (everything on the same tech stack, etc.)

Negative effects on growth

  • Old platform with rigid processes
    • => hiring problems, retention issues: difficult to find people wanting to work on this old platform
    • => not attractive to young talent
    • => slow onboarding and fear of changing anything

Then came their radical organizational change!

Radical Agility (March 2016)

Zalando wanted:

  • Autonomous teams to deliver amazing products efficiently at scale
  • Give a team independence (no time lost for fear of breaking someone else's part)

Based on 3 principles, glued together by Trust:

  • Autonomy: the team can act on its own, define its delivery process and its technology stack, have an idea, develop it, deploy it and operate it
  • Purpose: strategic alignment so that everyone is on the same page
    • Alignment is done with OKRs: the company defines its objectives for the year, then each department takes these objectives and comes up with its own, then the teams do the same
  • Mastery: give support to engineers to get better at what they want to be good at. We want excellent engineers, so we need to help them develop and grow.

Conway's law applied in reverse: we changed the organization, and suddenly the old technological landscape no longer fit it.

Organization side: "a purpose-driven organization composed of autonomous teams which deliver clearly defined products"

this, mapped to the technological side, means:

Technological side: "a service-oriented architecture composed of loosely coupled elements that have bounded contexts" (A. Cockcroft's definition of microservices)

Organization side + Technological side => Radical Agility

Prerequisites for Microservices

  • Rapid provisioning
  • Basic Monitoring
  • Rapid application deployment

See also "MicroservicePrerequisites" by M. Fowler

To adopt a microservice architecture you need to be very good at operations, because migrating a monolith to microservices pushes the complexity down to the infrastructure level.

AWS + Docker + app monitoring + Stups.io (an open-source platform developed by Zalando). PS: Stups is now on hold at Zalando.

The Mindset

  • Expect failure: expect other systems to fail, so:

    • build resilient systems
    • avoid domino effects (using tools like Hystrix)
    • something most engineers don't know or aren't used to
    • we help all teams adopt this mindset (even one team without it could "ruin" the whole system)
  • End to end responsibility

    • cross-functional teams responsible for everything: dev, test, operations
    • teams should think like a small startup: nobody cares about your stuff, you have to care about your idea
  • Software as a service

    • teams have to see their products as "software as a service", and see other teams as their customers.
    • a big mindset change for a lot of developers
  • API first

    • try to keep all teams aligned on how to design and share APIs
    • there's an "API guild" which reviews the APIs made by teams and helps create coherent APIs across the whole organization
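
The "Expect failure" point (avoid domino effects with tools like Hystrix) rests on the circuit-breaker idea. Below is a minimal Python sketch of that pattern, assuming a single-threaded caller; it is not Hystrix's API, and the class, its parameters (`max_failures`, `reset_timeout`) and the `fetch_recommendations` call are hypothetical names for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Minimal circuit breaker in the spirit of Hystrix (not Hystrix's actual API)."""

    def __init__(self, max_failures: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures    # consecutive failures before opening
        self.reset_timeout = reset_timeout  # seconds to wait before a trial call
        self.clock = clock                  # injectable so the sketch is testable
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        # After the timeout the breaker is "half-open": the next call is tried for real.
        return self.clock() - self.opened_at < self.reset_timeout

    def call(self, action: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open():
            return fallback()  # fail fast instead of piling load on a sick dependency
        try:
            result = action()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures or self.opened_at is not None:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            return fallback()
        self.failures = 0
        self.opened_at = None  # a success closes the circuit again
        return result


def fetch_recommendations() -> list:
    # Hypothetical downstream call that is currently failing.
    raise ConnectionError("recommendation service unavailable")


breaker = CircuitBreaker(max_failures=2, reset_timeout=10.0)
for _ in range(3):
    # The first two calls hit the failing service; the third is short-circuited.
    print(breaker.call(fetch_recommendations, lambda: []))
```

The fallback (here an empty recommendation list) is what stops one failing service from taking the caller down with it; choosing a sensible fallback per call is left to each team.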

Global Architecture

With ~70 teams => how do you make sure everything fits well together?!

In order to handle this, Zalando has:

  1. Rules of Play - defines a vision of the architecture we want to have: loosely coupled services, resilience, REST as the main style to design API, ... => written down into a booklet and given to everybody
  2. Peer Reviews: get feedback and opinions by others
  3. Tech Radars: looking at new technologies, technologies we don't want to see, experimental stuff, etc. Share this knowledge publicly
  4. Shared Concept of Core Business Entities (aka "prototype architecture"): we took some of the best engineers to work together on a blueprint of what the new microservices platform could look like (they created a prototype of the core domain functionality with a dozen microservices, message queues, ...). This is no ivory-tower architecture, just an idea presented to the teams, who take what they find useful and go from there.

Compliance and Security

Zalando is a public company: auditors should be happy :D

What Zalando does:

  1. Four-eyes principle
  2. Audit trails
  3. Identity and access management (a big project; they developed an internal tool to handle this, based on OAuth)
  4. Data protection agreement

Case Study: Fashion Store

Many examples of microservices architecture are basically back-end platforms. What about the presentation?

  • How do you split the single page of a frontend application in parts?
  • How do you enable independent teams to work on different parts of the page?

Zalando has 12 teams on the fashion shop: each team owns a different "fragment" (a component or widget on the page, e.g. the recommendation box). Each fragment is markup served by a specific webapp operated by a team, and each webapp sits on top of a set of collaborating microservices. The "Layout Service" puts all these "puzzle pieces" together, using a template (chosen based on the URL of the page) and a context (e.g. the current user).
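
The layout-service idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Tailor works: fragments are plain functions standing in for HTTP calls to each team's webapp, `TEMPLATES` and `FRAGMENTS` are hypothetical in-memory stand-ins for the template and fragment stores, and dropping a failing fragment instead of failing the page is an assumption of the sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A fragment renders a piece of markup for a given context (e.g. the current user).
# In production each one would be an HTTP call to a team's webapp; here they are
# plain functions so the sketch runs on its own.
Fragment = Callable[[Dict[str, str]], str]

# Hypothetical template store: URL path -> ordered list of fragment names.
TEMPLATES: Dict[str, List[str]] = {
    "/product": ["header", "product-details", "recommendations"],
}

# Hypothetical fragments, each owned by a different team.
FRAGMENTS: Dict[str, Fragment] = {
    "header": lambda ctx: f"<header>Hello {ctx['user']}</header>",
    "product-details": lambda ctx: "<main>Product</main>",
    "recommendations": lambda ctx: "<aside>You may also like...</aside>",
}


def render_page(path: str, context: Dict[str, str]) -> str:
    names = TEMPLATES[path]  # the template is selected by URL

    def render(name: str) -> str:
        try:
            return FRAGMENTS[name](context)
        except Exception:
            return ""  # assumption: a broken fragment is dropped, not the whole page

    # Render all fragments concurrently; pool.map keeps the template order.
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(render, names))
    return "\n".join(part for part in parts if part)


print(render_page("/product", {"user": "Alice"}))
```

Because each fragment only needs the shared context, a team can change its fragment without touching the others; the layout service only knows the template.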

A Router sits in front of the layout service, used to proxy the requests coming from the users to the new and to the old (monolithic) shop. This way Zalando can still use "legacy functionalities" served by the monolithic application, and migrate smoothly and incrementally to a new microservice architecture. They started migrating the checkout. With this router they can do A/B testing or canary releases. The routes can be changed dynamically via API, at runtime!
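
A minimal Python sketch of such a router follows, assuming routes match by path prefix and a canary or A/B split is expressed as the share of users sent to the new backend. It is not Skipper's route syntax or API; `Route`, `Router`, `new_share` and the backend names are hypothetical. Users are bucketed with a stable hash so each user keeps seeing the same variant.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass
class Route:
    new_backend: str
    legacy_backend: str
    new_share: float  # fraction of users (0.0 to 1.0) sent to the new backend


class Router:
    def __init__(self, default_backend: str) -> None:
        self.default_backend = default_backend  # e.g. the monolithic shop
        self.routes: Dict[str, Route] = {}

    def set_route(self, prefix: str, route: Route) -> None:
        # Routes can be replaced while the router keeps serving traffic.
        self.routes[prefix] = route

    def pick_backend(self, path: str, user_id: str) -> str:
        # Longest matching prefix wins; anything unmatched stays on the default.
        matches = [p for p in self.routes if path.startswith(p)]
        if not matches:
            return self.default_backend
        route = self.routes[max(matches, key=len)]
        # Stable per-user bucket in [0, 100), so a user sticks to one variant.
        digest = hashlib.sha256(user_id.encode()).digest()
        bucket = int.from_bytes(digest[:4], "big") % 100
        if bucket < route.new_share * 100:
            return route.new_backend
        return route.legacy_backend


router = Router(default_backend="legacy-shop")
# Canary: send roughly 10% of users' checkout traffic to the new service.
router.set_route("/checkout", Route("checkout-v2", "legacy-shop", new_share=0.1))
print(router.pick_backend("/checkout/confirm", "user-42"))
# Later, at runtime, roll out to everyone without redeploying the router.
router.set_route("/checkout", Route("checkout-v2", "legacy-shop", new_share=1.0))
print(router.pick_backend("/checkout/confirm", "user-42"))  # checkout-v2
```

Hashing the user ID (rather than picking randomly per request) is what makes the split usable for A/B testing: a user does not flip between the old and new checkout mid-session.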

Open-sourced many of these tools (see https://opensource.zalando.com/):

  • Skipper: the router
  • Inn Keeper: the routes storage
  • Tailor: the layout service
  • Quilt: the templates storage

How all this impacts innovation, productivity and growth

Innovation

  • able to inject new features (at runtime!)
  • faster feedback loop: able to try out new ideas without waiting for the next "release train" to come...
  • tech agnostic => can try out new technologies

Productivity

  • autonomous teams with full control, they can define their processes tailored to their needs
  • true lean and agile processes
  • continuous delivery (not all teams are doing this, each one can decide)
  • teams are now independent of each other

Growth

  • smaller codebases => faster onboarding of new members
  • up-to-date technologies attract young talents
  • easier to spin-off new teams

It's a long long journey, which will take years!

Q&A

  • Q: How do you manage the balance between team independence and teams overlapping on the same things, maybe solving the same problems (and maybe with different technologies)?
  • A: It happens sometimes. To avoid it, in each team there's a "delivery lead" role, sort of a coach for the team; he/she also has a bigger picture of what the company is trying to achieve, can connect teams, and check that communication and alignment are in place. But it happens!

  • Q: What issues with the IAM solution?
  • A: The current challenge for our IAM is scaling to handle all the requests. We have lots of requests from the customers:
    • => which means lots of calls on the layout service
    • => which means lots of calls to endpoints
    • => which means lots of calls to microservices
    • => which means each service has to authenticate via IAM
    • scaling is really hard

Frontend development issues

In the backend, teams are really autonomous in choosing whatever they like, because each service is independent. The problem is in the frontend, where each team could start using new fancy JS frameworks:

  • this ends up in the user's browser
  • slows down the page load time
  • may create conflicts and incompatibilities between adopted frameworks

=> we came up with a list of allowed frameworks (together with all the engineers) => we use the tech radar to share knowledge and create alignment between teams


Checkout: they keep both versions deployed and run in parallel. Used the old version as fallback in case of issues with the new version.
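
A minimal Python sketch of "new version first, old version as fallback", assuming both checkouts can be called as functions; `place_order`, `new_checkout` and `legacy_checkout` are hypothetical names, not Zalando's code. It also reports which version served the request so the failure stays visible.

```python
import logging
from typing import Callable, Dict, Tuple

log = logging.getLogger("checkout")

Order = Dict[str, object]


def place_order(order: Order,
                new_checkout: Callable[[Order], str],
                legacy_checkout: Callable[[Order], str]) -> Tuple[str, str]:
    """Try the new checkout first; fall back to the monolith's checkout on error.

    Returns (confirmation, which implementation served the request).
    """
    try:
        return new_checkout(order), "new"
    except Exception:
        # Log so the team owning the new service sees the failure,
        # while the customer still completes the purchase via the old path.
        log.exception("new checkout failed, falling back to legacy")
        return legacy_checkout(order), "legacy"


def broken_new(order: Order) -> str:
    raise TimeoutError("checkout-v2 timed out")


def legacy(order: Order) -> str:
    return f"legacy-confirmation-{order['id']}"


print(place_order({"id": 7}, broken_new, legacy))  # ('legacy-confirmation-7', 'legacy')
```

A real checkout would also have to make sure an order that partially went through the new service is not placed a second time by the fallback; this sketch ignores that.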


Converting vs rewriting functionality as microservices: they chose to rewrite, because converting the existing functionality would have been too expensive


Pushing open source => so that Zalando is seen as a tech company


Size of teams? They follow Amazon's "two-pizza rule" => between 2 and 12 people; most teams are around 6 people


Common shared libraries between services? We don't use shared libraries. The only exception is when a team publishes the library as open source


Who handles support? A 1st-level support team handles incidents => but they're moving towards teams being responsible for their own systems


How to handle performance testing on the platform and security related stuff?

A business-assurance unit coaches teams on how to do performance testing. So there is a horizontal unit doing this.


When the team is getting too big?

If your team is getting too big, chances are that

  • your service is too big
  • you have too many services

It's a good idea to split the team or split up the service


Is the team owning the old functionality also responsible for its migration?

Same team responsible for the old monolithic functionality is responsible for rebuilding the functionality as microservices.


Mobile apps?

The mobile apps currently still use the API exposed by the monolithic platform
