@Integralist
Last active August 2, 2018 02:50
Microservices

In Summary:

⁃	Microservices are small autonomous services
⁃	Microservices are modeled around business concepts
⁃	Microservices encourage a culture of automation
⁃	Microservices should be highly observable
⁃	Microservices should hide implementation details
⁃	Microservices should isolate failure
⁃	Microservices should be deployed independently
⁃	Microservices should decentralise all the things

The long list...

⁃	Cohesion: group related code together
⁃	Gather together things that change for the same reason
⁃	Separate those things that change for different reasons
⁃	If behaviour is spread across services, then change in behaviour requires deploying updates to multiple services
⁃	Focus service boundaries where we can ensure related behaviour is located in one place
⁃	Microservices make it obvious where code lives for a given behaviour
⁃	Thus avoiding the problem of a service growing too large
⁃	Avoid structuring services around technical concepts, aim for business bounded contexts
⁃	Routing is a business requirement (I want to direct users to somewhere)
⁃	Page Composition is a business requirement (I want to put a page together for the user)
⁃	Source of data is a business requirement (I want a place where I can manage my config/templates)
⁃	Each microservice should be hosted on its own machine (don't pack services together in order to save cost)
⁃	Multiple microservices on one host means a failure of one impacts the others
⁃	This also means you're now unable to scale appropriately for the demands of any one microservice
⁃	Ensure services are evenly distributed across different regions and availability zones to improve resiliency
⁃	Utilise Load Balancers to help balance the incoming traffic (as well as SSL termination; as long as services are within a VPC)
⁃	Services need to change independently of each other
⁃	Services need to be loosely coupled (e.g. changed & deployed by themselves without requiring consumers to change)
⁃	Services should have a clear contract/interface
⁃	Services should aim to be stateless, with idempotent operations, as this requires much less complexity and facilitates easier scalability
⁃	Otherwise consuming services can become coupled to an internal representation
⁃	Choose technology agnostic APIs (e.g. REST over HTTP)
⁃	This means avoiding integration technology that dictates what technology stacks we can use to implement our microservice
⁃	Microservices allow choosing the right tool for the job
⁃	Microservices make single points of failure (SPOFs) easier to handle (offer a gracefully degraded service when part of the system fails)
⁃	Microservices allow us to align architecture with the organisation (focus on team ownership)
⁃	Microservices facilitate easy rewriting of services due to their small size and well-defined boundaries
⁃	Avoid shared libraries as they can restrict your ability to deploy easily/quickly
⁃	Don't let shared code leak outside your service boundary (otherwise this introduces a form of coupling)
⁃	You also lose technology heterogeneity with libraries (consumer needs to be the same language; e.g. Alephant)
⁃	Define good 'principles', followed by good 'practices' that support/guide those principles
⁃	Different teams with different technical 'practices' can then share a common 'principle'
⁃	It is essential that we can see a coherent, cross-service view of our system's health
⁃	This has to be system-wide, not service-specific
⁃	Inspecting service-specific health is useful only when diagnosing a wider problem
⁃	All services should have a consistent mechanism for emitting health indicators/metrics as well as logging
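A consistent health format could be as simple as every service exposing the same JSON shape; a minimal sketch (the field names here are an assumption, not an established standard):

```python
import json
import time

def health_report(service, checks):
    """Build a health payload in one consistent, cross-service shape.

    `service` is the service name; `checks` maps dependency names to
    booleans (True = healthy). The overall `healthy` flag is derived,
    never hand-set, so it can't drift out of sync with the checks.
    """
    return {
        "service": service,
        "timestamp": int(time.time()),
        "healthy": all(checks.values()),
        "checks": checks,
    }

print(json.dumps(health_report("page-composer", {"database": True, "cache": False})))
```

Because every service emits the same shape, an aggregation service can poll each one and merge the documents into the system-wide view the notes above call for.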
⁃	Down/Upstream services should shield themselves accordingly from other unhealthy services
⁃	Provide templates (generators; e.g. CloudKit) that allow developers to follow best practices/architectural guidelines easily
⁃	The team who creates the templates shouldn't be gatekeepers, they should be open to accepting suggestions/changes
⁃	Avoid a centralised framework that does too much and affects developer productivity (rather than improve it)
⁃	Microservices allow greater ownership from multiple sources
⁃	Boundaries in code (e.g. think object-orientation) can become candidates for their own microservices
⁃	Services can be nested (in an abstraction sense) behind an encompassing service, but can depend on organisational structure
⁃	Good integration means simplicity. RPC may be good for performance but tightly couples our services with too much context
⁃	RPC exposes too much internal representation detail and should be avoided unless performance is absolutely critical
⁃	Always have interfaces/APIs in front of a data store (e.g. change from relational to nosql should not affect consumers)
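One way to keep a data store behind an interface is to define the contract consumers depend on as an abstract class and hide the storage choice in the implementation; a sketch with made-up names (`ArticleStore`, `InMemoryStore` are illustrative, not from the source):

```python
from abc import ABC, abstractmethod

class ArticleStore(ABC):
    """The contract consumers depend on; the backing store can change
    (relational -> NoSQL) without any consumer noticing."""

    @abstractmethod
    def get(self, article_id: str) -> dict: ...

    @abstractmethod
    def put(self, article_id: str, doc: dict) -> None: ...

class InMemoryStore(ArticleStore):
    """Stand-in implementation; swap for a database-backed class
    that satisfies the same interface."""

    def __init__(self):
        self._docs = {}

    def get(self, article_id):
        return self._docs[article_id]

    def put(self, article_id, doc):
        self._docs[article_id] = doc

store = InMemoryStore()
store.put("a1", {"title": "hello"})
print(store.get("a1")["title"])
```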
⁃	Asynchronous communication is harder to co-ordinate but offers greater loose coupling (as opposed to sync request/response)
⁃	RPC sometimes causes problems when devs aren't aware calls are 'remote' as opposed to 'local' (affecting overall performance)
⁃	RPC typically isn't versioned and so you could implement a breaking change that requires 'lock-step releases' (i.e. coupling)
⁃	Collection and central aggregation of as much 'data' (e.g. logs/metrics) as we can get
⁃	We do this with logs going into Sumo Logic (I wish for something better than Sumo though)
⁃	We also do this with metrics going into CloudWatch and then out into Grafana (we can do better though)
⁃	Aim for consistency in the format of Metrics and Logs so they can easily be filtered via an aggregation service
⁃	This is made easier via standardised tools (shared custom logging abstractions; e.g. Alephant Logger)
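A shared logging abstraction in the spirit of Alephant Logger could, for example, force every log line into one JSON shape so the aggregation service can filter on consistent field names; a minimal sketch (the field names are an assumption):

```python
import json
import logging
import sys

class JSONFormatter(logging.Formatter):
    """Emit every log line as a single JSON object with a fixed set
    of fields, so an aggregation service can filter reliably."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "service": getattr(record, "service", "unknown"),
            "message": record.getMessage(),
        })

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JSONFormatter())
log = logging.getLogger("demo")
log.addHandler(handler)
log.setLevel(logging.INFO)

# `extra` attaches the service name to the record for the formatter.
log.info("cache miss", extra={"service": "page-composer"})
```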
⁃	Being able to generate services with tools pre-baked in is useful, but you have to be careful about centralised authority stagnating progress
⁃	But we're still not doing this properly as far as tracing a call appropriately
⁃	Synthetic Monitoring (e.g. a synthetic transaction): a way to automate a fake request and store outcomes into a test bucket for analysis
⁃	Synthetic Monitoring can help identify when a service is unable to communicate with another service (but is otherwise healthy)
⁃	Make sure that synthetic testing system doesn't accidentally trigger unwanted 'side-effects' (less of an issue for us just displaying text content)
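A synthetic check boils down to: run a fake transaction, time it, and record the outcome in a 'test bucket'. A sketch with the request function injected, so the example stays runnable and, in production, the check can target a known-safe test resource with no side-effects (names are illustrative):

```python
import time

def run_synthetic_check(name, fetch, bucket):
    """Run one synthetic transaction and record the outcome in a
    'test bucket' (here just a list; in production, a metrics store).
    `fetch` is the injected request function -- a stand-in for a real
    HTTP call against a dedicated, side-effect-free test endpoint.
    """
    started = time.monotonic()
    try:
        fetch()
        ok = True
    except Exception:
        ok = False
    bucket.append({
        "check": name,
        "ok": ok,
        "duration_s": time.monotonic() - started,
    })
    return ok

bucket = []
run_synthetic_check("compose-test-page", lambda: None, bucket)  # healthy path
run_synthetic_check("fetch-upstream", lambda: 1 / 0, bucket)    # simulated failure
```

A scheduler then runs these on an interval and alerts when a check that used to pass starts failing, which is exactly the "healthy but unable to talk to its neighbour" case above.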
⁃	Correlation IDs: a poor man's "distributed tracing" (generate a unique guid and pass it along to all log calls)
⁃	Might be a clever way to expose a session guid to the logger (suggestion has been via HTTP headers)? 
⁃	Remember that the service needs to pass the header over to the next service as well (this is where a form of consistency - contract - is required)
⁃	This may be a poor man's tracing, but it would be supremely useful in tracking a single request from start to finish
⁃	Especially considering that most people find Zipkin to be a bit heavyweight 
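The correlation ID pattern above is just: reuse the inbound header if present, otherwise mint a GUID, and always copy it onto outbound calls and log lines. A sketch (the header name `X-Correlation-Id` is an assumption; any agreed name works, the contract is that every service propagates it):

```python
import uuid

HEADER = "X-Correlation-Id"  # assumed header name; consistency is what matters

def with_correlation_id(incoming_headers):
    """Reuse the caller's correlation id if present, otherwise mint one.
    Each service must copy this header onto its own outbound requests
    and onto every log line -- that is the whole 'contract'."""
    headers = dict(incoming_headers)
    headers.setdefault(HEADER, str(uuid.uuid4()))
    return headers

# First hop mints an id; every later hop just propagates it.
hop1 = with_correlation_id({})
hop2 = with_correlation_id(hop1)
assert hop1[HEADER] == hop2[HEADER]
```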
⁃	Circuit Breakers help handling cascading service failures in a more elegant fashion
⁃	An aggregated network health status visibility system (e.g. my Heka hackday from 2015 or 2014) is recommended
⁃	Authentication inside a VPC perimeter can be made more efficient by terminating from the front door and using internal load balancers
⁃	Downside is if an attacker breaches your internal network then you stand no chance of preventing them reading your network traffic without HTTPS
⁃	But I'd argue if your VPC is compromised, you have much bigger issues
⁃	Implement network segregation (e.g. we do this already via VPC's, but have them on a more granular level; Morph & Mozart should be/are)
⁃	Whether the segregation is based on 'team ownership' or 'risk level' is up to your organisation to decide what's more appropriate
⁃	Tightly coupled organisations generally appear to produce tightly coupled software architecture by their natural influence
⁃	Similarly, loosely coupled organisations generally appear to produce very modular and loosely coupled software architecture
⁃	Having multiple teams trying to manage a code base makes it difficult to communicate, coordinate and to reason about the service
⁃	Distributed teams need to identify portions of a service that they can take ownership of and introduce clear service boundaries
⁃	A single team that owns many services becomes more and more likely to lean towards tight coupling between them
⁃	Team ownership of a service means they can do what they like as long as they don't break contracts/interfaces their consumers rely upon
⁃	Unless indicated via a versioning system
⁃	Having 'feature teams' also doesn't work as it means those teams cross over the responsibility boundaries
⁃	Internal 'open-source' (IOS) - let's face it: that's Alephant - can help avoid the need for 'feature teams' 
⁃	IOS uses the idea of core custodians but that other teams can help towards pushing a particular service functionality forward and avoid bottlenecking
⁃	Balance the need for complete automation of scaling against the service requirements (e.g. does a basic dashboard need 100% uptime or not?)
⁃	Degrade your service functionality gracefully (as best you can to suit the requirements of your users/consumers)
⁃	Cascading failures are more likely to be caused by 'slow' responding services than failing ones (monitor and react accordingly)
⁃	Put timeouts on all 'out-of-process' calls to try and avoid slow services causing bottlenecks and knock-on effects
⁃	Circuit Breakers help defend your service against upstream services that are having problems
⁃	Plan for failure (e.g. Chaos Monkey).
⁃	Implement 'Bulkheads'. These are sections of your code that can be closed off to prevent sinking your entire application
⁃	Bulkheads are subtly different from Circuit Breakers (the former shuts down aspects of your own service; the latter is for upstream services)
⁃	Bulkheads aren't always logic based (e.g. if bad thing happens, disable feature X) they are also part of the software design process
⁃	e.g. the use of different connection pools for each upstream service; if one upstream is slow then only that one part of our service shuts down
⁃	Teasing apart functionality into microservices is another form of Bulkhead (failing of one microservice shouldn't affect another)
⁃	Timeouts and Circuit Breakers free up resources when they become constrained
⁃	Bulkheads ensure resources don't become constrained in the first place
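A Circuit Breaker with a time-based reset can be sketched in a few lines (the thresholds are illustrative, not recommendations):

```python
import time

class CircuitBreaker:
    """Minimal sketch: after `max_failures` consecutive failures the
    circuit opens and calls fail fast for `reset_after` seconds --
    freeing resources instead of queueing behind a slow upstream."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # any success closes the circuit again
        return result
```

Wrapping each upstream in its own breaker (plus its own connection pool) gives the bulkhead property described above: one slow dependency trips only its own breaker.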
⁃	Avoid designing a system where one service relies on another being up
⁃	e.g. Mozart Composition tries to solve that problem by serving from a page level cache if Morph is unavailable
⁃	This also means that much less coordination is needed between services (we become more loosely coupled)
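The serve-stale-on-failure idea behind the Mozart/Morph example can be sketched as: refresh a page-level cache on every successful render, and fall back to the last good copy when the upstream is unavailable (the function and cache names here are made up):

```python
page_cache = {}  # stand-in for a real page-level cache

def render_page(path, fetch_fresh):
    """Try the upstream renderer; on failure, degrade gracefully by
    serving the last good copy from the page-level cache."""
    try:
        page = fetch_fresh(path)
        page_cache[path] = page  # refresh the fallback copy
        return page, "fresh"
    except Exception:
        if path in page_cache:
            return page_cache[path], "stale-from-cache"
        raise  # nothing cached: surface the error

# Happy path populates the cache that the failure path later relies on.
page, origin = render_page("/news/home", lambda p: "<html>rendered</html>")
```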
⁃	Don't be afraid to start again and redesign (the beauty of microservices means a rebuild shouldn't be as costly as for a monolith) 
⁃	Identify your business model (reads vs writes) and aim to scale your services and resources appropriately
⁃	Implement caching at as many levels as is appropriate (HTTP, application, CDN etc)
⁃	You can even design your system in such a way that high bursts of 'writes' are cached and then flushed at a later stage ("write-behind cache")
⁃	Cached writes could be as simple as fire off the data to a queue to be processed asynchronously (depending on your business model)
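A write-behind cache in its simplest form is just a queue between the request path and the store; a sketch where an in-process `queue.Queue` stands in for a real message queue (names are illustrative):

```python
import queue

writes = queue.Queue()  # buffered writes ("write-behind cache")
database = {}           # stand-in for the real data store

def record_write(key, value):
    """Accept the write instantly; durability is deferred to flush()."""
    writes.put((key, value))

def flush():
    """Drain the buffer later (e.g. on a timer, or a worker process
    consuming a real queue) and apply the writes to the store."""
    while not writes.empty():
        key, value = writes.get()
        database[key] = value

record_write("vote:article-1", 42)
record_write("vote:article-2", 7)
flush()
```

Whether this trade-off is acceptable depends on the business model noted above: a burst of votes can tolerate deferred writes; a payment usually cannot.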
⁃	Utilise AutoScaling and its variants (reactive, scheduled) more intelligently to suit your business needs
⁃	e.g. scale down services on a scheduled basis overnight if they're only utilised heavily during office hours (lunch time peak for a news org)
⁃	Understand CAP Theorem and what sacrifices (trade-offs) you can make that will best fit your business needs
⁃	Automate documentation wherever possible as this allows it to stay fresh (e.g. on code commit trigger documentation automation update)