
Designing Systems and Applications

This is a short document of tips and notes I've accumulated while learning more about designing distributed systems and building concurrent applications. It is by no means definitive and merely scratches the surface of what needs to be considered when designing an architecture expected to handle large-scale traffic.

Distributed Systems

Scale out, not up

There comes a point in your application's design where merely throwing more hardware at the problem (i.e. "scaling up") will fail to resolve the scalability issues you're encountering.

You should aim for a system designed to scale horizontally (i.e. "scaling out"), as it allows for easier growth and improvement.

Build smaller applications

If you currently have a monolithic application, then you should consider drawing a top level diagram of your current architecture. Find logical areas of the application which can be split into separate services that communicate with each other. This allows each individual service to scale separately depending on its load.

Note: mechanisms for communication could be pub/sub (e.g. AWS SNS) or a push/pull design using queues (e.g. AWS SQS), with temporary storage used to pass data between services (e.g. AWS S3)
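As a rough sketch of those three mechanisms (using boto3, with entirely hypothetical topic/queue/bucket names):

```python
# A minimal sketch of service-to-service messaging, assuming boto3 is
# configured with valid AWS credentials. The ARN, URL and bucket values
# below are hypothetical placeholders.
import json
import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")
s3 = boto3.client("s3")

order = {"id": 123, "total": 9.99}

# Pub/sub: fan the event out to any subscribed services via SNS.
sns.publish(
    TopicArn="arn:aws:sns:eu-west-1:000000000000:orders",  # placeholder
    Message=json.dumps(order),
)

# Push/pull: enqueue work for a single downstream consumer via SQS.
sqs.send_message(
    QueueUrl="https://sqs.eu-west-1.amazonaws.com/000000000000/orders",  # placeholder
    MessageBody=json.dumps(order),
)

# Large payloads: store the data in S3 and pass a reference instead.
s3.put_object(Bucket="example-orders", Key="orders/123.json", Body=json.dumps(order))
```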

The discussion of designing modular systems and applications (connected via different mechanisms, partly to avoid single points of failure) will always cause some contention between "pure scalability" and "financial stability"; e.g. a small set of servers sharing multiple services is cheaper than an individual server per service. The outcome will depend on the size of the organisation and whether it's financially viable to support an extremely granular architecture.

Measure, Monitor, Log

Implement solutions to application logging and server monitoring as soon as possible. Aim to prevent disasters rather than just automating recovery from them. Identify unhealthy services before they become an unrecoverable problem.

Also, by measuring and monitoring appropriately, we can ascertain whether our instances are too small/large and can be scaled up/down accordingly.

Utilization

Don't just think about automatic scaling; you should definitely scale where appropriate, but also think about optimizing utilization. For example, examine your application/service design for ways to handle more traffic, more efficiently. Doing this helps reduce costs: by better utilizing existing resources we won't need to scale as much.

Single Points of Failure (SPOF)

You should look to identify weak spots (i.e. "single points of failure") in your architecture. These are places where, if one part of your application or service goes down, the whole application becomes useless (see the next section on failing gracefully). See the architecture diagrams at the bottom of this page, which demonstrate a simple application that had a SPOF and was then re-architected so its applications and services could better handle potential failures and keep users unaware there was a problem with the system.

Fail Gracefully

Certain components can't scale easily (such as databases and NoSQL document stores), so in an ideal world we would build the application to fail gracefully. For example, we could monitor "warning" thresholds; when critical mass is reached and the relevant alarm fires (let's say an AWS SNS notification is sent), our applications (if subscribed to the SNS topic) could take action: storing important data off to an external service (in case of imminent failure), or, in the worst case, temporarily disabling certain features so the user doesn't lose important data.

How we fail should be determined by us and will depend on the application being built (i.e. you couldn't fail in exactly the same way for every application type, as the requirements would be vastly different).
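As one possible shape for this (a sketch, not a definitive implementation - the severity field, bucket name, and DEGRADED flag are all hypothetical), a handler for such a notification might look like:

```python
# A hedged sketch of reacting to a "critical threshold" SNS notification,
# assuming the application consumes messages from a queue subscribed to the
# SNS topic. All resource names and the DEGRADED flag are hypothetical.
import json
import boto3

s3 = boto3.client("s3")
DEGRADED = False  # checked elsewhere to disable non-essential features

def handle_alarm(message: dict) -> None:
    global DEGRADED
    if message.get("severity") == "critical":
        # Store important in-flight state externally in case of imminent failure.
        s3.put_object(
            Bucket="example-failover-state",  # placeholder bucket
            Key="snapshot.json",
            Body=json.dumps(collect_important_state()),
        )
        # Worst case: temporarily switch off non-essential features.
        DEGRADED = True

def collect_important_state() -> dict:
    # Application-specific: gather whatever data must not be lost.
    return {"pending_orders": []}
```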

Caching

Analyse your architecture not only for SPOFs but also to see where caching layers can be implemented to reduce traffic load.
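Caching layers can live in the CDN, a reverse proxy, or the application itself; as the smallest possible illustration, here's an in-process cache around a hypothetical expensive lookup:

```python
# A minimal in-process caching sketch using Python's standard library.
# expensive_lookup is a hypothetical stand-in for a slow database/HTTP call.
from functools import lru_cache

@lru_cache(maxsize=1024)
def expensive_lookup(user_id: int) -> dict:
    # Imagine a slow database query or HTTP request here.
    return {"id": user_id}

expensive_lookup(42)  # miss: does the real work
expensive_lookup(42)  # hit: served from the cache, no traffic generated
```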

Concurrency

Concurrency introduces many different types of problems. As an example, think of the classic "Banker's Dilemma", whereby two customers, John and Bob, both have access to a joint account which currently holds £10. If John and Bob trigger an action at the same time (let's say John withdraws £10 and Bob withdraws £5), what should the outcome be? If John's action is handled first then the account balance should be 0; but if Bob's action was triggered at the same time and hasn't finished yet, we have an issue where he'll be attempting to take £5 from a 0-balance account.
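Here's a minimal sketch of that race: the "check then withdraw" sequence isn't atomic, so both threads can pass the balance check before either one deducts.

```python
# A sketch of the race: a non-atomic "check then withdraw" on shared state.
import threading
import time

balance = 10

def withdraw(amount: int) -> None:
    global balance
    if balance >= amount:       # both threads can pass this check...
        time.sleep(0.01)        # ...before either one deducts
        balance -= amount       # the account can now go negative

john = threading.Thread(target=withdraw, args=(10,))
bob = threading.Thread(target=withdraw, args=(5,))
john.start(); bob.start()
john.join(); bob.join()
print(balance)  # may print -5: both withdrawals "succeeded"
```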

Thread Safety

Although a much larger discussion by itself, the principle of data being "thread safe" is that it can be accessed from multiple threads at the same time without causing conflicts. Avoiding globally shared state can help; better yet, write applications that work with immutable data (whereby a modified copy of the data is used and the original is left untouched - see languages such as Haskell and Clojure).

Note: for more information on thread safety, have a read of
http://en.wikipedia.org/wiki/Thread_safety
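Continuing the withdrawal sketch above, the simplest fix is to make the check-and-deduct sequence atomic using a mutex (the immutable-data approach mentioned above would instead avoid mutating the shared balance at all):

```python
# Making the earlier withdraw() thread safe with a mutex: the check and
# the deduction now happen as one atomic unit.
import threading

balance = 10
lock = threading.Lock()

def withdraw(amount: int) -> bool:
    global balance
    with lock:                  # only one thread inside at a time
        if balance >= amount:
            balance -= amount
            return True
        return False            # insufficient funds: second withdrawal rejected
```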

Eventual Consistency

The principle of "eventual consistency" is implemented in distributed systems as a way to ensure that a data change will "eventually" be applied to all available nodes (i.e. it won't be immediate; that is the trade-off made to preserve "high availability").

If you wanted a change to be made immediately across multiple nodes, those nodes would need to be locked long enough for the change to be replicated throughout. This could be a lengthy process, and so the eventual consistency model was designed to keep availability high and allow systems to keep running until the change has been safely applied to all available nodes.

CPU vs I/O

A CPU/processor can contain one or more cores. For example, a quad-core processor running at 3GHz has 4 cores, each running at that speed.

I/O, whether a file system action or a network (e.g. HTTP) action, can block; so if the application is designed to work concurrently (i.e. there are other threads the CPU can jump to in the meantime), the blocked thread is left to finish its I/O while another thread is picked up instead (this is how concurrency works - the CPU interleaves between threads - which should also clarify how concurrency is not the same thing as parallelism).

For computational intensive operations you'll want the number of threads to be equal to the number of cores available.

For I/O intensive operations you'll want more threads than available cores. This is because (as explained above) the CPU/Processor will "context switch" to another thread when the current thread is blocked (hence it is better to have more threads than cores for I/O).

Calculating The Number of Threads

To calculate how many more threads than cores you'll need for an intensive set of I/O operations, use the following formula:

Number of Threads = Number of Available Cores / (1 - Blocking Coefficient)

Note: the blocking coefficient (coefficient being a fancy word for a value used as a multiplier) differs depending on the operation: for a fully computational operation it is 0, whereas for a fully blocking operation it is 1.

An example blocking coefficient would be 0.9 - meaning a task blocks 90% (0.9) of the time and works only 10% (0.1) of the time. So if you have 2 cores, you'd want 20 threads:

2 / (1 - 0.9) = 20
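A quick sketch of that calculation (assuming os.cpu_count() is an acceptable stand-in for the number of available cores - on some systems it reports logical rather than physical cores):

```python
# Computing the thread count from the formula above.
import os

def thread_count(blocking_coefficient: float) -> int:
    cores = os.cpu_count() or 1
    return round(cores / (1 - blocking_coefficient))

print(thread_count(0.0))  # computational: one thread per core
print(thread_count(0.9))  # I/O heavy: 10x the core count, e.g. 2 cores -> 20
```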

Thread Pools

A thread pool is a collection of pre-created threads which automatically handles the assignment of tasks from a queue. Thread pools can sometimes be more efficient (and more practical) than manually maintaining individual threads in your own application. Languages such as Java (and, indirectly, JRuby) have built-in support for thread pools.

[Figure: thread pool]
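Python's standard library ships one too; a minimal sketch, where the 20 workers follow the I/O formula above and the fetch function is a hypothetical stand-in for a blocking HTTP call:

```python
# A minimal thread pool sketch: a fixed set of worker threads consumes
# tasks from an internal queue.
from concurrent.futures import ThreadPoolExecutor

def fetch(url: str) -> str:
    return f"fetched {url}"  # hypothetical stand-in for a blocking HTTP call

urls = [f"https://example.com/{n}" for n in range(100)]

with ThreadPoolExecutor(max_workers=20) as pool:  # 20 threads per the formula above
    results = list(pool.map(fetch, urls))
```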

Even Workload Distribution

If you have two cores and a very large queue of messages to process, your initial thought might be to split the queue (i.e. the tasks) in two. This would mean you could have two threads running (i.e. utilising both cores): the first thread processing the first half of the queue and the second thread handling the other half.

The problem with this solution is that it doesn't necessarily guarantee even distribution of the tasks across your available cores. If our queue consisted of a computational task such as calculating prime numbers, the first half of the queue would take much less time to process, because the smaller numbers take less time to check than the much larger numbers that (with an even split in two) would all end up in the second half.

This means one core will be sitting idle while the other core is still processing data.

What would be better is to have more parts than threads/cores. Then if one "part" finishes more quickly than expected, another part can be picked up. Simply dividing our tasks into two parts means one core will likely sit idle for longer than the other. But if we divide our tasks into more granular parts, we can aim to utilise as much of each core as possible.
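A sketch of that idea, reusing the prime number example (a process pool rather than a thread pool here, since the task is CPU-bound and CPython threads wouldn't run it in parallel): roughly 200 small parts shared between 2 workers, instead of 2 big halves.

```python
# Even workload distribution: split the work into many small parts so no
# core sits idle while another grinds through a slow half.
from concurrent.futures import ProcessPoolExecutor

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def count_primes(chunk: range) -> int:
    return sum(is_prime(n) for n in chunk)

if __name__ == "__main__":
    # ~200 small parts for 2 workers; a fast part frees its worker for the next.
    chunks = [range(s, s + 10_000) for s in range(2, 2_000_000, 10_000)]
    with ProcessPoolExecutor(max_workers=2) as pool:
        total = sum(pool.map(count_primes, chunks))
    print(total)
```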

AWS

Understanding AutoScaling

ASG contains instances and so requires a Launch Configuration

Launch Configurations determine which instance is launched (AMI & Instance Type)

Alarms are from CloudWatch and they determine when a scaling action (i.e. policy) should take place

Policies specify that instances should be launched or terminated

Scale up/down amount should be a multiple of the number of availability zones (so the ELB can distribute evenly)

Steps: create launch config (specify AMI & instance type); create ASG (pass in launch config, availability zones, min/max instances, load balancers); create policies (to represent scaling actions); create alarms to trigger policies
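A hedged boto3 sketch of those steps - every name, AMI ID, and threshold is a placeholder, and note that AWS has since moved towards Launch Templates over Launch Configurations:

```python
# A sketch of the four steps above using boto3; all names/IDs/thresholds
# are hypothetical placeholders.
import boto3

asg = boto3.client("autoscaling")
cw = boto3.client("cloudwatch")

# 1. Launch configuration: which AMI and instance type to launch.
asg.create_launch_configuration(
    LaunchConfigurationName="example-lc",
    ImageId="ami-00000000",  # placeholder AMI
    InstanceType="t3.small",
)

# 2. AutoScaling group: zones, min/max instances, load balancer.
asg.create_auto_scaling_group(
    AutoScalingGroupName="example-asg",
    LaunchConfigurationName="example-lc",
    MinSize=2,
    MaxSize=8,
    AvailabilityZones=["eu-west-1a", "eu-west-1b"],  # scale in multiples of 2
    LoadBalancerNames=["example-elb"],
)

# 3. Policy: the scaling action (add 2 instances, one per zone).
policy = asg.put_scaling_policy(
    AutoScalingGroupName="example-asg",
    PolicyName="scale-out",
    AdjustmentType="ChangeInCapacity",
    ScalingAdjustment=2,
)

# 4. Alarm: trigger the policy on sustained high CPU.
cw.put_metric_alarm(
    AlarmName="example-high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "example-asg"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=70.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[policy["PolicyARN"]],
)
```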

Calculating Costs

Use the official AWS Simple Monthly Calculator to estimate costs of all services used. This is to help communicate to the business that we're aware of cost concerns and are designing our applications to be as cost effective as possible.

Prevention AND Reaction

As mentioned earlier, we don't want to just react to issues; we should be prepared, proactively monitoring our services and utilizing resources such as CloudWatch alarms to tell us when preventative thresholds have been crossed.

We should be looking for sustained high usage and not solely worrying about specific spikes/peaks in traffic. High usage could be an indicator of a potential optimisation hot spot.

We should also be aware of high-profile events in our industry. With this knowledge we could pre-warm our servers, rather than just reacting to sudden high traffic loads.

Analyse Groups

We should try to utilise CloudWatch within the AWS console to analyse multiple servers at once. In doing this we can monitor AutoScaling groups to ensure the group, as a whole, is healthy - rather than simply analysing individual instances and not getting a view of the bigger picture.

Depending on the system design and workflow of your application, the throughput on one queue should correlate with the throughput of subsequent queues. For example, you might have a queue that takes in messages for orders to be processed; once processed, the successful order details are sent to a subsequent queue, processed, and emailed to the relevant customers. The "processing" queue and the "successful orders" queue should have similar throughput - giving us a useful indication of a healthy overall system (i.e. we're not just processing orders and then failing to email customers).
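One way to spot-check that correlation, sketched with boto3 (the queue names and the 0.9 threshold are hypothetical; NumberOfMessagesSent is one of the standard AWS/SQS CloudWatch metrics you could compare):

```python
# A sketch comparing the throughput of two SQS queues via CloudWatch.
from datetime import datetime, timedelta, timezone
import boto3

cw = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

def hourly_throughput(queue_name: str) -> float:
    stats = cw.get_metric_statistics(
        Namespace="AWS/SQS",
        MetricName="NumberOfMessagesSent",
        Dimensions=[{"Name": "QueueName", "Value": queue_name}],
        StartTime=now - timedelta(hours=1),
        EndTime=now,
        Period=3600,
        Statistics=["Sum"],
    )
    return stats["Datapoints"][0]["Sum"] if stats["Datapoints"] else 0.0

processing = hourly_throughput("orders-processing")  # placeholder name
emails = hourly_throughput("orders-successful")      # placeholder name
if processing and emails / processing < 0.9:         # arbitrary threshold
    print("warning: downstream queue is falling behind")
```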

@Integralist (author) commented:

The following image was taken from Programming Amazon EC2

[Figure: aws decoupling]


darnould commented Oct 6, 2014

@Integralist http://12factor.net/ <- there's a lot of common themes here, too, which complement these ideas
