@applike-ss
Created June 27, 2019 09:33

Migrating from bare metal machines to AWS ECS

Motivation: why was it needed, and what are the advantages?

First of all, I would like to tell you why it was needed.

Ideally, every business grows over time. As this is the case for us, our requirements change as well.

At AppLike we have an increasing app install rate, so more users use our apps every day. This also means increased load on our server infrastructure and an increase in the amount of data we handle.

So let's talk about advantages of AWS ECS:

  • It is flexible in how you want to use it
  • It makes scaling easier compared to bare metal, because it is container-based: spinning up new instances is basically all that is needed to scale up.
  • Enables us to easily create new environments

The way from Vagrant to Docker

This step is quite self-explanatory: we broke our "all-in-one" Vagrant box into a service-based docker-compose.yml to conform to the Docker standard as much as possible.

The following image shows an abstract version of our docker-compose.yml, illustrating how to structure your services for local development:

(image: vagrant_to_docker)

HAProxy could be considered unnecessary here.

However, you would not talk to the backend directly in a production environment either - you would always have something in between, such as a load balancer.
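A minimal docker-compose.yml along these lines could look as follows. Service names and image versions are illustrative, not our actual configuration; the point is one container per concern, with HAProxy standing in for the production load balancer:

```yaml
version: "3"
services:
  haproxy:            # local stand-in for the production load balancer
    image: haproxy:2.0
    ports:
      - "80:80"
    depends_on:
      - web
  web:                # web server in front of the application
    image: nginx:1.17
    depends_on:
      - app
  app:                # the PHP application itself
    build: .
  redis:              # key/value store
    image: redis:5
  mongodb:            # document storage
    image: mongo:4
  rabbitmq:           # message broker
    image: rabbitmq:3
```

Developers then only need `docker-compose up` to get a production-like stack locally.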

Stack before the migration

This is how our stack looked before we moved our application to AWS ECS.

One bare-metal node with a web server, application code, and a key/value store per API node, or just application code and a key/value store per worker node;

Several document storage nodes, several message broker nodes, a log transformer, plus some ElastiCache and RDS within the AWS Cloud.

(image: previous state)

“Bare Metal” to containerization

“Bare Metal”: EC2 is not really bare metal, since everything in EC2 is virtualized, either paravirtual or HVM. But the scenario stays the same.

As you can see, we did keep most of the dependencies of our app as-is.

These are the changes we made:

We containerized the web server, the application, and the key/value store into their own containers and made an ECS task definition out of them.

These are now used for creating API and worker tasks as well as ECS scheduled tasks.
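A sketch of such a task definition, in the JSON format ECS accepts (names, account ID, and sizes are made up for illustration):

```json
{
  "family": "api",
  "containerDefinitions": [
    {
      "name": "nginx",
      "image": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/nginx:latest",
      "memory": 128,
      "essential": true,
      "portMappings": [{ "containerPort": 80 }],
      "links": ["app"]
    },
    {
      "name": "app",
      "image": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/app:latest",
      "memory": 512,
      "essential": true
    },
    {
      "name": "redis",
      "image": "redis:5",
      "memory": 128,
      "essential": false
    }
  ]
}
```

The same family of containers can back an API service, a worker service, or a scheduled task, differing only in the command and the service wrapping them.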

However, because they are now ECS tasks, we can spawn more of them much faster than if we were spawning new EC2 instances from a custom AMI that deploys itself on boot.

So whenever we run out of resources due to heavy load, we can now easily spin up some EC2 instances (even spot fleets, to save some money), add them to our ECS cluster, and have the API service auto-scale.

(image: after state)

Changes within the application and its configuration

When you upload container images anywhere, you obviously do not want to include your credentials in them.

That would make them work only for you, and only for that specific environment.

Having the credentials externally enables us to use the same images in several environments, even locally.

Using DotEnv enables us to use parameters that are either injected as environment variables by the task definition or set before starting the application.
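For example, a hypothetical .env file would contain only safe local defaults, while the real values are injected as environment variables by the task definition and take precedence (variable names are illustrative):

```
# .env - local development defaults only; in ECS these values are
# supplied by the task definition / parameter store, never baked
# into the image
APP_ENV=dev
DB_HOST=mysql
DB_USER=app
REDIS_HOST=redis
```

The same image can therefore run unchanged in development, staging, and production.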

(image: after state)

Changes within the deployment strategy

Not only is it bad practice to use automation tools to assemble a Docker image; it can also make repeatable builds harder (say you are running an open-source project and want reproducible builds) and be confusing at the same time, because not all instructions used to build the container are in the Dockerfile.

While building the container image, we use a prebuilt image that already contains the base dependencies that are always needed, such as php-fpm including its configuration.
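Sketched as a Dockerfile (registry, base image name, and paths are illustrative; we assume the base image ships the build tooling), this keeps the application image small and the build steps visible in one place:

```dockerfile
# Prebuilt base image that already contains php-fpm and its configuration
FROM registry.example.com/base/php-fpm:7.3

# Only the application itself is layered on top
COPY . /var/www/app
WORKDIR /var/www/app
RUN composer install --no-dev --optimize-autoloader
```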

To get application parameters into our container we use aws-env, which allows you to use the Parameter Store of AWS Systems Manager to set environment variables. These parameters can be encrypted with the account's KMS key to keep them secure. This is the easiest way to get your parameters into your application in a secure way.
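A sketch of how this is typically wired up (parameter names are made up; check the aws-env README for its exact interface): a value is stored encrypted in the Parameter Store, and the container entrypoint evaluates aws-env's output before starting the application.

```
# Store an encrypted parameter in the SSM Parameter Store
aws ssm put-parameter \
  --name /production/app/DB_PASSWORD \
  --value "s3cret" \
  --type SecureString

# In the container entrypoint: export all parameters under the
# configured path as environment variables, then start the app
eval "$(AWS_ENV_PATH=/production/app/ aws-env)"
exec php-fpm
```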

There is another helpful tool called ecs-deploy, which helps a lot when you want to deploy new versions of your code to AWS ECS. This tool also allows you to automatically roll back if the deploy was not successful (i.e. did not complete within a given timeout).

AWS ECS also includes rolling upgrades, so there is no need to have a tool do this manually.
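Usage boils down to a single command; the flags below are shown from memory and the image URI is illustrative, so consult the ecs-deploy README for the authoritative interface:

```
# Point the service at a new image tag; if the service does not reach
# a steady state within the timeout, roll back to the old revision
ecs-deploy \
  --cluster production \
  --service-name api \
  --image 123456789012.dkr.ecr.eu-west-1.amazonaws.com/app:v42 \
  --timeout 300 \
  --enable-rollback
```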

Pitfalls and issues when starting to use AWS ECS

A very important aspect when deploying containers in a cluster is the placement strategy. This matters especially for keeping worker services off the nodes that host your API. Since an API should be as fast as possible and the workers do the heavy lifting, workers would slow the API down through CPU and I/O or even memory usage (swapping).
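One way to express this separation (the attribute name here is our invention for illustration) is to tag container instances with a custom attribute and add a placement constraint to the worker task definition:

```json
{
  "placementConstraints": [
    {
      "type": "memberOf",
      "expression": "attribute:workload == worker"
    }
  ]
}
```

The matching instances would carry that attribute, for example via `ECS_INSTANCE_ATTRS={"workload": "worker"}` in the agent's /etc/ecs/ecs.config, so worker tasks never land on API hosts.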

Resource limits are another important topic. When putting container services on a machine, you need to take into consideration that resources are limited. Not only CPU cycles, RAM, and disk space need to be taken into account, but also limits like the max-open-files ulimit of the EC2 instance's Docker daemon. Don't expect to be able to use every resource to the max.
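Both kinds of limits can be declared per container in the task definition; the numbers below are placeholders, not recommendations:

```json
{
  "name": "app",
  "cpu": 256,
  "memory": 512,
  "memoryReservation": 256,
  "ulimits": [
    { "name": "nofile", "softLimit": 65536, "hardLimit": 65536 }
  ]
}
```

Here `memory` is a hard limit (the container is killed above it) while `memoryReservation` is a soft reservation used for scheduling.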

AWS ECS has many settings that you can tweak to your liking. Using the minimum/maximum healthy percentage can be tricky the first time. Mistakes are easy to make if the desired count of the service is as low as 1 or 2. If the count is 1, you are required to either set the minimum percentage to 0 or the maximum percentage to 200 or greater, otherwise the service will never be deployable. Take into account that a minimum of 0 will make your service unavailable for some time during deployments.
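For a service with a desired count of 1, the zero-downtime variant is to allow a temporary second task, expressed in the service's deployment configuration like so:

```json
{
  "deploymentConfiguration": {
    "minimumHealthyPercent": 100,
    "maximumPercent": 200
  }
}
```

The alternative (minimumHealthyPercent 0, maximumPercent 100) stops the old task before starting the new one, which saves capacity but causes a gap in availability.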

Another feature that you should always use is health checks. When using health checks - and you should use them in any layer where you find them applicable - your deployments will take longer. You can at least partially mitigate this by setting the health check interval to something low, such as 5 seconds instead of 30. Depending on the startup time of your application (and maybe even cache generation, which you should rather do when building the container), you might need to adjust the health check settings.
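In the task definition, a container health check with a short interval might look like this (the endpoint and timings are illustrative and need tuning per application):

```json
{
  "healthCheck": {
    "command": ["CMD-SHELL", "curl -f http://localhost/health || exit 1"],
    "interval": 5,
    "timeout": 3,
    "retries": 3,
    "startPeriod": 10
  }
}
```

`startPeriod` gives a slow-starting application a grace period before failed checks count against it.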

Generally, when creating Docker images, you are advised to make them as small as possible. If you do not, you might run into even longer deploy times while images are being pulled.

When running cron tasks, we came across issues with how long the tasks were actually running: we configured tasks to run for 5 minutes and they ended up running for 7. This worked well on bare metal, but not on AWS ECS. Our crons are Symfony commands that are executed either by a crond within the worker container or via a scheduled task from AWS ECS (depending on the task). In the end we were able to solve this with some job-based locking.
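A minimal sketch of job-based locking, using util-linux flock rather than our actual Symfony implementation: an overlapping invocation of the same job exits immediately instead of running alongside a job that is still in progress.

```shell
#!/bin/sh
LOCK=/tmp/app-import.lock

# Hold the lock in the background, simulating a long-running cron job
flock -n "$LOCK" -c 'sleep 1' &
sleep 0.2

# The overlapping cron invocation fails to get the lock and skips the run
flock -n "$LOCK" -c 'echo "running job"' || echo "skipped: job already running"
wait
```

In a crontab this reduces to prefixing the command with `flock -n <lockfile>`; within Symfony, a lock acquired at the start of the command achieves the same per-job guarantee.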

Image lifecycle is a topic that should not be forgotten. Obviously you don't want a mess of a billion tags per repository, so you need something to clean it up. AWS ECR supports lifecycle policies per repository, so you don't have to clean things up manually. Easiest is to expire the oldest tags that follow a specific schema, since you are less likely to need them anyway - for example, tags whose common schema includes a branch name. Using the wrong filter or lifecycle rule is quite easy when setting them up, and surely you want rules that keep your repository as clean as possible. Luckily, AWS provides a testing tool for lifecycle rules within the web console.
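A lifecycle policy along those lines might look like this (the tag prefix and count are examples; always dry-run rules with the console's test feature first):

```json
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Keep only the 10 newest images per feature branch",
      "selection": {
        "tagStatus": "tagged",
        "tagPrefixList": ["feature-"],
        "countType": "imageCountMoreThan",
        "countNumber": 10
      },
      "action": { "type": "expire" }
    }
  ]
}
```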

The next steps (for us)

To make it even easier to scale our application, we decided to tackle the microservice approach. This will enable us to try out new technologies more rapidly without breaking bigger parts of our service.

Since we have already been looking into Golang for over a year now, we decided to try to gain some performance from it. In the end, this will possibly replace half of our PHP application in the future.

We are always looking out for new cloud services and technologies where adoption makes sense for us.
