nerdfiles/Microservices.md

## Microservices.md

      
In Summary:

Microservices are small autonomous services
Microservices are modeled around business concepts
Microservices encourage a culture of automation
Microservices should be highly observable
Microservices should hide implementation details
Microservices should isolate failure
Microservices should be deployed independently
Microservices should decentralise all the things

The long list...

Cohesion: group related code together
Gather together things that change for the same reason
Separate those things that change for different reasons
If behaviour is spread across services, then change in behaviour requires deploying updates to multiple services
Focus service boundaries where we can ensure related behaviour is located in one place
Microservices make it obvious where code lives for a given behaviour
Thus avoiding the problem of a service growing too large
Avoid structuring services around technical concepts, aim for business bounded contexts
Routing is a business requirement (I want to direct users to somewhere)
Page Composition is a business requirement (I want to put a page together for the user)
Source of data is a business requirement (I want a place where I can manage by config/templates)
Each microservice should be hosted on its own machine (don't pack services together in order to save cost)
Multiple microservices on one host means a failure of one impacts the others
This also means you're now unable to scale appropriately for the demands of any one microservice
Ensure services are evenly distributed across different regions and availability zones to improve resiliency
Utilise Load Balancers to help balance the incoming traffic (as well as SSL termination; as long as services are within a VPC)
Services need to change independently of each other
Services need to be loosely coupled (e.g. changed & deployed by themselves without requiring consumers to change)
Services should have a clear contract/interface
Services should try to be stateless and immutable, with idempotent operations, as this reduces complexity and makes scaling easier
Without a clear contract, consuming services can become coupled to an internal representation
Choose technology agnostic APIs (e.g. REST over HTTP)
This means avoiding integration technology that dictates what technology stacks we can use to implement our microservice
Microservices allow choosing the right tool for the job
Microservices facilitate handling of single points of failure (SPOF): offer a gracefully degraded service when part of the system fails
Microservices allow us to align architecture with the organisation (focus on team ownership)
Microservices facilitate easy rewriting of services due to their small size and well-defined boundaries
Avoid shared libraries as they can restrict your ability to deploy easily/quickly
Don't let shared code leak outside your service boundary (otherwise this introduces a form of coupling)
You also lose technology heterogeneity with libraries (consumer needs to be the same language; e.g. Alephant)
Define good 'principles', followed by good 'practices' that support/guide those principles
Different teams with different technical 'practices' can then share a common 'principle'
It is essential that we can see a coherent, cross-service view of our system's health
This has to be system-wide, not service-specific
Inspecting service-specific health is useful only when diagnosing a wider problem
All services should have a consistent mechanism for emitting health indicators/metrics as well as logging
Down/Upstream services should shield themselves accordingly from other unhealthy services
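
A minimal sketch of a system-wide health view, assuming each service exposes some callable health check. The names `HealthReport`, `check_all` and `system_status` are hypothetical and only illustrate collecting per-service results into one coherent status:

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class HealthReport:
    service: str
    healthy: bool
    detail: str = ""


def check_all(checks: Dict[str, Callable[[], bool]]) -> Dict[str, HealthReport]:
    """Run every service's health check and collect the results in one place."""
    reports = {}
    for name, check in checks.items():
        try:
            ok = check()
            detail = "" if ok else "check returned unhealthy"
            reports[name] = HealthReport(name, ok, detail)
        except Exception as exc:  # a check that crashes counts as unhealthy
            reports[name] = HealthReport(name, False, repr(exc))
    return reports


def system_status(reports: Dict[str, HealthReport]) -> str:
    """Summarise the whole system rather than any single service."""
    unhealthy = sorted(name for name, r in reports.items() if not r.healthy)
    if not unhealthy:
        return "ok"
    if len(unhealthy) == len(reports):
        return "down"
    return "degraded: " + ", ".join(unhealthy)
```

In practice each check would probably call a service's health endpoint with a timeout; plain callables keep the sketch self-contained.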
Provide templates (generators; e.g. CloudKit) that allow developers to follow best practices/architectural guidelines easily
The team who creates the templates shouldn't be gatekeepers, they should be open to accepting suggestions/changes
Avoid a centralised framework that does too much and hurts developer productivity rather than improving it
Microservices allow greater ownership from multiple sources
Boundaries in code (e.g. think object-orientation) can be good candidates for becoming their own microservices
Services can be nested (in an abstraction sense) behind an encompassing service, though whether this makes sense can depend on organisational structure
Good integration means simplicity. RPC may be good for performance but tightly couples our services with too much context
RPC exposes too much internal representation detail and should be avoided unless performance is absolutely critical
Always have interfaces/APIs in front of a data store (e.g. change from relational to nosql should not affect consumers)
Asynchronous communication is harder to co-ordinate but offers looser coupling (as opposed to sync request/response)
RPC sometimes causes problems when devs aren't aware calls are 'remote' as opposed to 'local' (affecting overall performance)
RPC typically isn't versioned and so you could implement a breaking change that requires 'lock-step releases' (i.e. coupling)
Collection and central aggregation of as much 'data' (e.g. logs/metrics) as we can get
We do this with logs going into Sumo Logic (I wish for something better than Sumo though)
We also do this with metrics going into CloudWatch and then out into Grafana (we can do better though)
Aim for consistency in the format of Metrics and Logs so they can be easily filtered via an aggregation service
This is made easier via standardised tools (shared custom logging abstractions; e.g. Alephant Logger)
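
As a sketch of what a consistent log format could look like, the snippet below uses Python's standard `logging` module to emit one JSON object per line. The field names and the `build_logger` helper are assumptions for illustration, not the format used by any tool mentioned here:

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render every record with the same keys so an aggregator can filter on them."""

    def __init__(self, service: str):
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "service": self.service,
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "timestamp": record.created,  # seconds since the epoch
        }
        return json.dumps(payload, sort_keys=True)


def build_logger(service: str, stream=None) -> logging.Logger:
    """Hypothetical shared helper: each service calls this instead of ad hoc setup."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)  # defaults to stderr when stream is None
    handler.setFormatter(JsonFormatter(service))
    logger.addHandler(handler)
    logger.propagate = False
    return logger
```

With every service logging the same keys, filtering by `service` or `level` in the aggregation service becomes a simple field query.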
Being able to generate services with tools pre-baked in is useful, but you have to be careful about centralised authority stagnating progress
But we're still not doing this properly as far as tracing a call appropriately
Synthetic Monitoring (e.g. a synthetic transaction): a way to automate a fake request and store outcomes into a test bucket for analysis
Synthetic Monitoring can help identify when a service is unable to communicate with/to another service (but is otherwise healthy)
Make sure that the synthetic testing system doesn't accidentally trigger unwanted 'side-effects' (less of an issue for us just displaying text content)
Correlation IDs: a poor man's "distributed tracing" (generate a unique guid and pass it along to all log calls)
There might be a clever way to expose a session guid to the logger (the suggestion has been to use HTTP headers)
Remember that the service needs to pass the header on to the next service as well (this is where a form of consistency, i.e. a contract, is required)
This may be a poor man's tracing but it would be supremely useful in tracking a single request from start to finish
Especially considering that most people find Zipkin to be a bit heavyweight
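
The correlation ID idea can be sketched as follows. The header name `X-Correlation-ID` and the helper functions are assumptions for illustration; the only requirement from the notes above is that every service reuses the incoming ID, includes it in its log calls, and forwards it to the next service:

```python
import uuid
from typing import Dict, Mapping, Optional

CORRELATION_HEADER = "X-Correlation-ID"  # assumed name; any agreed header works


def ensure_correlation_id(incoming_headers: Mapping[str, str]) -> str:
    """Reuse the caller's ID if present; otherwise this service starts the trace."""
    for key, value in incoming_headers.items():
        if key.lower() == CORRELATION_HEADER.lower() and value:
            return value
    return str(uuid.uuid4())


def outgoing_headers(correlation_id: str,
                     extra: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Headers for the next hop: the ID must be passed along, not regenerated."""
    headers = dict(extra or {})
    headers[CORRELATION_HEADER] = correlation_id
    return headers


def log_line(correlation_id: str, service: str, message: str) -> str:
    """Every log call carries the ID so one request can be followed end to end."""
    return f"correlation_id={correlation_id} service={service} msg={message}"
```

Searching the aggregated logs for a single `correlation_id` value then returns the lines from every service that handled that request.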
Circuit Breakers help handling cascading service failures in a more elegant fashion
Aggregated network health status visibility systems (e.g. my Heka hackday from 2015 or 2014) are recommended
Encrypted traffic inside a VPC perimeter can be made more efficient by terminating SSL at the front door and using internal load balancers
The downside is that if an attacker breaches your internal network, you stand no chance of preventing them reading your network traffic without HTTPS
But I'd argue if your VPC is compromised, you have much bigger issues
Implement network segregation (e.g. we do this already via VPCs, but have them on a more granular level; Morph & Mozart should be/are)
Whether the segregation is based on 'team ownership' or 'risk level' is up to your organisation to decide what's more appropriate
Tightly coupled organisations generally appear to produce tightly coupled software architecture by their natural influence
Similarly, loosely coupled organisations generally appear to produce very modular and loosely coupled software architecture
Having multiple teams trying to manage a code base makes it difficult to communicate, coordinate and to reason about the service
Distributed teams need to identify portions of a service that they can take ownership of and introduce clear service boundaries
A single team that owns many services is increasingly likely to drift towards tight coupling between them
Team ownership of a service means they can do what they like as long as they don't break contracts/interfaces their consumers rely upon
Unless indicated via a versioning system
Having 'feature teams' also doesn't work as it means those teams cross over the responsibility boundaries
Internal 'open-source' (IOS) - let's face it: that's Alephant - can help avoid the need for 'feature teams'
IOS uses the idea of core custodians, while other teams can still contribute to push a particular service's functionality forward and avoid bottlenecks
Balance the need for complete automation of scaling against the service requirements (e.g. does a basic dashboard need 100% uptime or not?)
Degrade your service functionality gracefully (as best you can to suit the requirements of your users/consumers)
Cascading failures are more likely to be caused by 'slow' responding services than failing ones (monitor and react accordingly)
Put timeouts on all 'out-of-process' calls to try and avoid slow services causing bottlenecks and knock-on effects
Circuit Breakers help defend your service against upstream services that are having problems
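
A minimal circuit breaker sketch, assuming a simple "open after N consecutive failures, retry after a cool-down" policy. The class and parameter names are hypothetical, and real libraries offer more states and options. The wrapped call is expected to apply its own timeout so that slow upstreams surface as failures:

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected without contacting the upstream."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds to wait before a trial call
        self.clock = clock              # injectable so the breaker can be tested
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, func: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("upstream marked unhealthy; failing fast")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = func()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures in a row, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller might wrap something like `lambda: urllib.request.urlopen(url, timeout=2).read()` so that a slow upstream raises quickly and counts towards opening the circuit.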
Plan for failure (e.g. Chaos Monkey).
Implement 'Bulkheads'. These are sections of your code that can be closed off to prevent sinking your entire application
Bulkheads are subtly different from Circuit Breakers (the former shuts down aspects of your own service; the latter is for upstream services)
Bulkheads aren't always logic based (e.g. if a bad thing happens, disable feature X); they are also part of the software design process
e.g. the use of different connection pools for each upstream service; if one upstream is slow then only that one part of our service shuts down
Teasing apart functionality into microservices is another form of Bulkhead (failing of one microservice shouldn't affect another)
Timeouts and Circuit Breakers free up resources when they become constrained
Bulkheads ensure resources don't become constrained in the first place
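
One way to sketch a bulkhead is a separate concurrency limit per upstream, which approximates the "different connection pools" idea. The `Bulkhead` class and the upstream names below are hypothetical:

```python
import threading
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class BulkheadFullError(Exception):
    """Raised when an upstream's compartment has no free slots."""


class Bulkhead:
    def __init__(self, limits: Dict[str, int]):
        # One independent pool of slots per upstream service.
        self._slots = {
            name: threading.BoundedSemaphore(limit) for name, limit in limits.items()
        }

    def call(self, upstream: str, func: Callable[[], T]) -> T:
        slot = self._slots[upstream]
        # Refuse immediately instead of queueing behind a slow upstream.
        if not slot.acquire(blocking=False):
            raise BulkheadFullError(f"no free capacity for {upstream}")
        try:
            return func()
        finally:
            slot.release()
```

If calls to one upstream hang and use up its slots, only requests that need that upstream are rejected; calls to other upstreams still find free slots.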
Avoid designing a system where one service relies on another being up
e.g. Mozart Composition tries to solve that problem by serving from a page level cache if Morph is unavailable
This also means that much less coordination is needed between services (we become more loosely coupled)
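
Serving from a cache when an upstream is unavailable can be sketched like this. It is an illustration of the general pattern only, not how Mozart Composition is implemented; `render_page`, the `fetch` callable and the dict-based cache are assumptions:

```python
from typing import Callable, Dict, Tuple


def render_page(path: str,
                fetch: Callable[[str], str],
                cache: Dict[str, str]) -> Tuple[str, str]:
    """Return (html, source) where source is 'fresh' or 'stale'."""
    try:
        html = fetch(path)
    except Exception:
        # Upstream is down or slow: degrade to the last good copy if we have one.
        cached = cache.get(path)
        if cached is None:
            raise  # nothing to fall back on, so surface the failure
        return cached, "stale"
    cache[path] = html  # remember the last good response for this page
    return html, "fresh"
```

A real cache would also need an expiry policy so that stale pages are not served indefinitely.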
Don't be afraid to start again and redesign (the beauty of microservices means a rebuild shouldn't be as costly as for a monolith)
Identify your business model (reads vs writes) and aim to scale your services and resources appropriately
Implement caching at as many levels as is appropriate (HTTP, application, CDN etc)
You can even design your system in such a way that high bursts of 'writes' are cached and then flushed at a later stage ("write-behind cache")
Cached writes could be as simple as firing the data off to a queue to be processed asynchronously (depending on your business model)
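
A write-behind buffer can be sketched with an in-process queue that accepts writes immediately and persists them later in batches. The `WriteBehindBuffer` class and the `persist` callback are hypothetical; a production system would more likely use a durable external queue:

```python
import queue
from typing import Callable, List, Tuple

Write = Tuple[str, str]  # (key, value)


class WriteBehindBuffer:
    def __init__(self, persist: Callable[[List[Write]], None], batch_size: int = 100):
        self._pending: "queue.Queue[Write]" = queue.Queue()
        self._persist = persist        # called with each batch when flushing
        self._batch_size = batch_size

    def write(self, key: str, value: str) -> None:
        """Accept the write immediately; nothing is persisted yet."""
        self._pending.put((key, value))

    def flush(self) -> int:
        """Persist up to batch_size pending writes; return how many were flushed."""
        batch: List[Write] = []
        while len(batch) < self._batch_size:
            try:
                batch.append(self._pending.get_nowait())
            except queue.Empty:
                break
        if batch:
            self._persist(batch)
        return len(batch)
```

The trade-off is that buffered writes are lost if the process dies before `flush` runs, which is why the choice depends on the business model.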
Utilise AutoScaling and its variants (reactive, scheduled) more intelligently to suit your business needs
e.g. scale down services on a scheduled basis overnight if they're only utilised heavily during office hours (lunch time peak for a news org)
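
The scheduled variant can be sketched as a lookup from time of day to desired capacity. The windows and instance counts below are made-up illustrations, and `desired_instances` is a hypothetical helper rather than any cloud provider's API:

```python
from datetime import time
from typing import List, Tuple

# (start, end, instance_count); the first matching window wins.
Schedule = List[Tuple[time, time, int]]


def desired_instances(now: time, schedule: Schedule, default: int) -> int:
    """Pick the capacity for the current time of day, falling back to a default."""
    for start, end, count in schedule:
        if start <= now < end:
            return count
    return default


# Illustrative schedule: lunch time peak, office hours, minimal capacity overnight.
EXAMPLE_SCHEDULE: Schedule = [
    (time(12, 0), time(14, 0), 8),
    (time(8, 0), time(18, 0), 4),
]
```

The result would typically feed whatever mechanism sets the group size, with reactive scaling still in place to catch unexpected bursts.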
Understand CAP Theorem and what sacrifices (trade-offs) you can make that will best fit your business needs
Automate documentation wherever possible as this allows it to stay fresh (e.g. trigger a documentation update on every code commit)