Complete end-to-end guide for developing dockerized lambda in Typescript, Terraform and SAM CLI
This is a full guide to locally developing and deploying a backend app with the recently released container image feature for Lambda on AWS.
Needless to say, if you are a great fan of Docker, you already know how amazing it is. What you test locally is what you get when you deploy it, at least at the container level.
Since this feature is quite new, I fell into lots of rabbit holes, and I'm sure others will too, so I will break each of them down so that nobody else gets trapped. This guide starts from the real basics like making a user or setting up terraform, so feel free to skip to the section you need.
Reason to use Terraform and SAM CLI together
Well, it seems that Terraform supports building a Docker image and deploying it to ECR out of the box, but after lots of digging, I noticed that things get simpler if I just build the Docker image in another pipeline and deploy it with a few lines of shell script. So Terraform will be used to define resources, excluding the build and deployment process. There's no problem with that.
And what about SAM CLI? Terraform cannot replace SAM CLI and vice versa. SAM CLI is useful for developing lambdas locally because it automatically configures endpoints for each lambda and removes most of the barriers to the initial setup. Since lambda functions are 'special' in that they only get 'booted up and called' when they are invoked (unlike EC2 or Fargate), just running ts-node my-lambda.ts would not make it work. Of course there are other solutions to this (such as sls), but in this guide I will just use SAM CLI. That said, for several reasons SAM makes me want to switch to a better solution if there is one. The reason follows right below.
I use nodemon to watch a certain directory and trigger sam build on every change, and in another shell I launch sam local start-api. It works as expected, but the problem is that every time sam build runs, it creates another Docker image, so many useless dangling images pile up on your drive. You need to delete them manually, because SAM CLI does not support an argument equivalent to docker run --rm. That is why I might want to try some other solutions. More on this in the relevant issue on GitHub. Please let me know if any of you had a good experience with sls, because I haven't used it much yet.
Ok. Now let's write some code.
Setup AWS for Terraform
First, make sure that you've installed AWS CLI and authenticated with it. Installing AWS CLI is out of scope here, so please follow the guide on AWS.
After successful installation, run aws configure.
You will be prompted to enter an Access Key ID and Secret Access Key. Depending on your situation, there are different ways to handle this, but for the sake of simplicity we can make one user from the AWS console (you will probably only use it for 'Programmatic access') with the following policies for applying Terraform code.
This one for setting S3 bucket as a backend:
And this one for locking the state:
The next one is quite tricky: we will temporarily enable permissions related to managing IAM, because we first need to create a role that we can assumeRole into whenever we plan and apply our IaC.
For now we can just go onto AWS console and make this policy:
Make sure to narrow the policy down to the specific actions and resources actually used once everything is done.
Now that you've made three distinct policies (or all in one, depending on your preferences), attach them to the user that you've just created for running
If you haven't already, install terraform by following the instructions on the official website. Just download the binary and move it to the
Now verify the version of terraform:
And then make a main.tf file in your project directory (I personally put it into an IaC folder because there will be another folder for the 'real' .ts code for the backend):
Then, we will need to add the S3 backend and state locking. But before that, create a bucket on S3 for hosting the IaC backend and a table on DynamoDB for locking the state.
Now we will need to add more to the policy on DynamoDB we created because we want to create a table:
Then you could write this code (by the way, it may be a good idea to put this below IaC in a different general-purpose repository, because the current repository is meant to be used only for lambda-related resources. But for the sake of this article I will just write it here):
And then, also add the S3 bucket (you will need to add the relevant IAM policies here too, but since we know how to do it, I will skip the explanation):
Run terraform apply, verify the changes, and enter yes. The DynamoDB table and S3 bucket should now have been created. Here's the code so far:
Now, add s3 backend and state lock:
We are also going to use Docker provider, so add that too:
Now, because you've added a backend and another provider, we will need to run terraform init again, and then terraform apply. Run it.
Setting up lambda
Now we will need to develop lambda on the local machine. Install SAM CLI:
Note that outdated versions do not support running Docker containers, so make sure that your version is the latest.
Now, we won't run sam init, because it would make it difficult to turn the server into a monorepo structure. The reason we want a monorepo is that it makes it much easier to properly dockerize every single lambda and deploy each one with only the dependencies it requires. Instead we will use lerna to initialize the server folder.
And as usual:
Then it will give you this layout:
Then, add your first function package. For the sake of this example, let's assume that we want to make a REST API, composed of many lambdas, each returning 'hello' as a response in different languages (which is totally useless in reality, but at least useful here). Our first lambda will be an English one.
Now the directory structure will look like this:
In server, we will need to add some utils to build and invoke the function locally. Modify server/package.json as follows and, of course, run npm i again:
To explain what we are trying to do: these devDependencies are going to be package-wide dependencies. They are not specific to any one of the functions that we are going to build; they help with general tooling. That's why we put them here.
@types/node: we will need this to give proper type definitions for 'built-in node' modules like
concurrently: just a script runner.
lerna: you know it.
nodemon: this will help us watch a directory and build Docker image again.
api: we will need these to launch our lambda function locally and invoke it.
Now, you need to create template.yml for SAM CLI to consume and run what we want to run.
We won't be able to run sam build or sam local start-api yet, because we still need to set up the Dockerfile and the ECR repository.
So far we have added template.yml for running SAM CLI:
Now we will add the Dockerfile. This will be its content:
To go through it line by line:
amazon/aws-lambda-nodejs:14 is the official Amazon image for lambda. Current LTS of nodejs is 14, so we are using this.
AS builder is related to multi-stage builds in Docker; it helps reduce the final Docker image size. Basically, in this builder stage, we will only build the output to be included in the final image, and any dependencies installed in this step won't be included in the final output image.
WORKDIR /usr/app: inside the docker image, set the working directory to /usr/app. There isn't any app folder in a normal docker image, so it will create the app directory. We will put the compiled js code there.
npm install: it will install dependencies.
npm run build: it will compile typescript code into js.
RUN ls -la # for debugging: it is merely for debugging. While building, docker will output what's inside there at that time, for you to verify if you are doing what you intended to do.
FROM amazon/aws-lambda-nodejs:14: this is the second build stage in Docker. All outputs from the previous stage will be discarded in this stage unless explicitly specified to be included.
RUN npm install --only=prod: it will only install
COPY --from=builder /usr/app/lib /usr/app/lib: it explicitly refers to the previous builder stage to copy whatever was inside /usr/app/lib to the current
CMD [ "/usr/app/lib/index.handler" ]: the command should be path-to-lambda-handler-without-extension.handler. That's just how it works.
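The naming rule for CMD can be sketched in TypeScript as follows. The helper name toHandlerString is hypothetical and only illustrates how the string is put together from the compiled file path and the exported function name; it is not part of any AWS tooling.

```typescript
// Hypothetical helper (not part of AWS tooling): builds the string that goes
// into CMD from the compiled handler file and the name of the exported function.
function toHandlerString(compiledFilePath: string, exportName: string): string {
  // Drop a trailing ".js" extension, since the value is the module path without it.
  const withoutExtension = compiledFilePath.replace(/\.js$/, "");
  return `${withoutExtension}.${exportName}`;
}

// The compiled file /usr/app/lib/index.js exporting a function named `handler`:
const cmdValue = toHandlerString("/usr/app/lib/index.js", "handler");
console.log(cmdValue); // prints "/usr/app/lib/index.handler"
```

So if you rename the compiled file or the exported function, the CMD value has to change with it.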
Now we've added a Dockerfile. Let's set up the basic environment for lambda:
You will need to modify
Promise API. I recommend turning on other options too, especially those related to strict type checking:
Now, add a really simple lambda:
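A minimal sketch of what such a handler could look like is shown below. The event and result shapes are simplified stand-ins declared inline so that the snippet is self-contained (in a real project you would probably use proper type definitions for API Gateway events), and the helper name buildHelloResponse and the response body are assumptions made for this example.

```typescript
// Simplified stand-ins for the API Gateway proxy event and result shapes.
// These are assumptions for this sketch, not the official type definitions.
interface ProxyEvent {
  path: string;
  httpMethod: string;
}

interface ProxyResult {
  statusCode: number;
  body: string;
}

// Hypothetical pure helper, kept separate so it is easy to test without AWS.
function buildHelloResponse(event: ProxyEvent): ProxyResult {
  return {
    statusCode: 200,
    body: JSON.stringify({ message: "hello", path: event.path }),
  };
}

// The exported name `handler` is what the last segment of the CMD value refers to.
export const handler = async (event: ProxyEvent): Promise<ProxyResult> => {
  return buildHelloResponse(event);
};
```

Keeping the response logic in a plain function means you can check it locally before any Docker image is built.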
So far, we have created these: https://gist.github.com/41c610b04c74f88926c13469ca50224f
Add nodemon.json under server/ to watch and build files:
With nodemon.json in place, you can start running npm run watch or npm start. It does two things: it rebuilds the Docker image whenever you change anything under the packages/ directory, and it hosts a local endpoint for the lambda. It is similar to hot-reload, although it feels more like a hack; you won't need to cancel and run sam local start-api again once you make a change. If it does not work, try again after creating ECR first.
Oh, and you can delete
lib/hello.js because we are not using them. Anyways, now we are kind of ready to build this function into a docker image. Let's try it:
Everything's cool, docker build succeeded. You can try running the image and test the request:
This is where SAM CLI should start to come in. But before that, we will need to make an ECR repository with terraform. Let's go back to terraform for a while.
Back to terraform: assume role and ECR
Now, we will need to create a role first, because we will rely on that role to get the permissions required to create whatever resources we want. This is called 'assuming a role', and the reason why it's deemed good practice is that you won't have to create multiple credentials (probably multiple users) to do things that require certain permissions. Instead, you borrow the permissions for the period of time when you plan and apply the changes to the resources.
So how do we do it? First, let's create
For the sake of this article, we won't be diving deep into specific policies, so we will just allow almost all resources without specifying them in detail. For real-world usage, you will have to define exact statements giving just the right permissions.
What we are doing here, essentially, is allowing the localtf user to assume the role hello_role, which possesses all the policies needed to run the hello server stack. This is called 'creating a trust relationship' (you will see this if you do this process on AWS). This way, localtf won't always have to hold all the permissions it needs. It acquires them only when needed (i.e. when deploying).
Once you are done writing it, run terraform apply to make the changes.
Now, go back to main.tf and add:
Once you add assume_role, you can create any resources you want, using the permissions given by that role. Let's now make an ECR repository. Define it, apply, and terraform will soon create the ECR repository.
Building and pushing the image to ECR
Now that we've made an ECR, we can go back to our server and write a little script to login, build and push the image of the lambda we are writing.
It's pretty straightforward: get your AWS CLI ready, and authenticate so that you can use ECR from the Docker CLI:
Next, tag your Docker image with both latest and a separate timestamp taken at build time, so that the latest tag always points to the most recently built image, while the timestamped image remains for the record, which you can use later to roll back or for other purposes.
Now, you can test it out yourself as such:
After this, you will be able to see on AWS ECR:
As you can see, the images are tagged by timestamp, and the most recently built image is always tagged as latest. You are going to reference this tag in Terraform to apply the newly built Docker image to lambda.
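The tagging scheme can be sketched as a small TypeScript function. The helper name buildImageTags, the timestamp format (UTC, YYYYMMDDHHmmss) and the repository URI below are assumptions for illustration; the actual build script may format the timestamp differently.

```typescript
// Hypothetical helper: returns the two image references to push for one build.
// The timestamp format (UTC, YYYYMMDDHHmmss) is an assumption for this sketch.
function buildImageTags(repositoryUri: string, builtAt: Date): string[] {
  // "2021-01-02T03:04:05.000Z" -> "20210102030405"
  const timestamp = builtAt.toISOString().replace(/[-:T]/g, "").slice(0, 14);
  return [`${repositoryUri}:latest`, `${repositoryUri}:${timestamp}`];
}

// Placeholder repository URI, not a real account.
const exampleRepository = "123456789012.dkr.ecr.us-east-1.amazonaws.com/hello";
const exampleTags = buildImageTags(exampleRepository, new Date(Date.UTC(2021, 0, 2, 3, 4, 5)));
console.log(exampleTags[1]); // prints "123456789012.dkr.ecr.us-east-1.amazonaws.com/hello:20210102030405"
```

Each build pushes both references, so latest moves forward while every timestamped tag stays available for rolling back.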
So far, we've made changes like so:
Creating Terraform resources for lambda and API gateway
Now the majority of the preparation is done, so we can move on to creating the actual lambda and API gateway.
First, the most important part: you want to create lambda itself from Docker.
There is already a module made just for this, so use it to create the lambda. Once you are done, apply the changes.
The key here is that you are going to reference an image URI from ECR where the tag is the latest. If you have previously built and pushed a new docker image, the hash of the image is going to be different, thus causing a redeployment of the lambda function. Otherwise it has no idea if the docker image is new or not.
Again, after applying the change, you can see the lambda on the AWS console:
As you can see, compared to other lambdas, it has the package type 'Image', which means it comes not from a Zip, but from a Docker image.
You should be able to see the image URI (including the hash) of the image at the bottom of the information about lambda. If you click on that image URI, you will be navigated to the latest image on ECR that you just built and pushed.
Now, you will be able to test the lambda on the AWS Lambda console, but what we want in the end is something like sending GET /hello to a certain domain and receiving a response. To be able to do that, we need to set up API Gateway.
For this example, we will setup a domain at
I will explain the code one by one.
aws_api_gateway_rest_api will create a REST api. It usually does not include a single endpoint; it usually contains multiple, like:
api.hello.co/nihao, and so on. On AWS, it is equivalent to a single row in APIs tab:
aws_api_gateway_resource: to put it very simply, you can think of this as a single API endpoint that has not yet been deployed. In this case we create a single endpoint ending with
In the below example, we have created two different resources with
POST, to the same path (hidden by black overlay). We will talk about creating OPTIONS resource to handle preflight requests later. For now, it would suffice to know that creating a REST resource means creating a certain endpoint.
aws_api_gateway_method: Now, you want to create a REST method for that resource. Our /hello endpoint will require no auth (out of scope of this article), and will be a GET method.
aws_lambda_permission: By default, there's no permission for API gateway to invoke lambda function. So we are just granting a permission to it so that it can be executed.
aws_api_gateway_integration: API gateway supports transforming (filtering, preprocessing, ...) a request before it reaches the actual lambda, or a response from the lambda before it reaches the client. We are not doing anything special here for this example, but you may want to use it in the future. For more, read the relevant AWS docs.
aws_api_gateway_stage: API gateway supports separating an API into different stages out of the box. You should use this to separate your API across production, staging and development environments. For now, we will only make a stage for the current terraform workspace, which is assumed to be dev across all examples in this article. Once you apply your changes, you will be able to see this on your AWS console:
aws_api_gateway_deployment: This is equivalent to clicking on 'Deploy API' on AWS console.
Once resources are created in API gateway, they must be deployed in order to be reachable from external clients. One slightly problematic thing is redeployment: even if you make a change to your REST API resources, it will not get deployed unless the redeployment argument changes. There are mainly two ways of getting around this:
Use timestamp() to trigger redeployment on every single apply. With this approach, the lambda may be down for a few seconds during redeployment. But it is certain to deploy every time, so I would just go with this one if my service does not handle many users.
Use md5(file("api_gateway.tf")) to trigger redeployment whenever this file changes. But you always need to make sure that EVERYTHING related to API Gateway deployment stays inside this file.
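To make the difference between the two triggers concrete, here is a rough TypeScript imitation of the idea. It does not call Terraform; the function names are hypothetical, and the snippet only mimics how a trigger value derived from file content behaves compared to one derived from the current time.

```typescript
import { createHash } from "crypto";

// Mimics md5(file("api_gateway.tf")): the value depends only on the file content.
function contentTrigger(fileContent: string): string {
  return createHash("md5").update(fileContent).digest("hex");
}

// Mimics timestamp(): the value depends only on when apply runs.
function timestampTrigger(appliedAt: Date): string {
  return appliedAt.toISOString();
}

// A redeployment happens only when the trigger value differs from the stored one.
function needsRedeployment(previous: string, current: string): boolean {
  return previous !== current;
}

const before = contentTrigger("resource A");
console.log(needsRedeployment(before, contentTrigger("resource A")));    // prints false
console.log(needsRedeployment(before, contentTrigger("resource A, B"))); // prints true
```

With the content-based trigger, an unchanged file means no redeployment, which is exactly why anything defined outside that file would be missed. The time-based trigger differs on every apply, so it always redeploys.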
Ok. So far we have set up lambda and basic API Gateway configurations. Right now, you can test your API on Postman like this: first, go to AWS API Gateway console, and find a specific endpoint that is deployed to a certain stage. There should be an 'Invoke URL' at the top of the page.
Now, open up Postman, and
- Insert your invoke URL
- Click 'Authorization', choose the type 'AWS Signature', and enter AccessKey and SecretKey. These keys should come from the credentials of one of your users in the AWS IAM Console. If you do not have a user dedicated to calling an API backed by lambda and API gateway from your local environment, make one and get its keys.
- Insert your AWS Region.
- If your API requests any more query parameters or body, insert them.
- Click send, then it should work.
If you do not provide AWS Credentials in your request header, it won't work, because so far your API can be only used by IAM users known to the API gateway, and if you are just sending a request from your local computer without providing any access and secret keys, it won't know it's you. To make it work even without providing credentials for any public APIs intended to be exposed to client applications, you should now configure custom domain.
So far we have made changes to make lambda and API Gateway resources. The list of files we should have by now is the following.
Next, we will see how to create a custom domain and relate that domain to the REST API we just made.
Creating Terraform resources for Custom domain
Now, the problem is that we have the API, but it's not callable from any external client applications, which is a common case for many projects. So we want to register a domain first to represent our endpoints.
Before making a change for the custom domain, we need to set up another AWS provider, because we will need to use the us-east-1 region for an edge-optimized custom domain name (that's the only region that supports creating an ACM certificate for an edge-optimized custom domain name).
There are two choices available for an API endpoint: 1. edge; 2. regional. If your endpoint should be accessed by clients worldwide, use edge; if your endpoint is confined to one specific region of the world, use regional. If you don't know what to do, it is totally safe to go with edge for now. First, add another aws provider:
Then, you write the actual code to make a certificate and custom domain.
Make sure you get the existing hosted zone ID (or any other zone ID that you intend to use) from AWS Route53 Console:
Use that ID to create Route53 Record for the custom domain name.
After you apply your change, you will be able to see ACM certificate being created on AWS Certificate Manager:
Just make sure the status is 'Issued' and the validation status is 'success'. You may need to wait several minutes before this completes. Also, if your certificate is not showing up, make sure that you are on us-east-1, not anywhere else.
After you are done with this, now you can go back to API Gateway again, and configure custom domain. Now that you've registered a domain, you can see it right up from API Gateway console:
In the certificate dropdown, you should be able to see the domain that you have just created. Do not create the domain name there on the console. Now come back to terraform and let's write the equivalent code for that.
You are just going to fill out the options that you just saw from the API Gateway console. Just fill out the relevant info about certificate, domain name, and endpoint config.
Now, this is important: you need to create another Route53 record to map your custom domain to CloudFront. Once you create your custom domain, AWS creates an 'API Gateway domain name', circled in red in the picture below:
You need to route the traffic for api.hello.com to this API Gateway domain name (an example of an API Gateway domain name would be asdfasdfasdf.cloudfront.net as long as you are using EDGE). That's what we are doing with aws_route53_record.custom_domain_to_cloudfront. Otherwise, the response from your API will keep showing weird errors whose causes are really hard to guess. I found the AWS documentation really lacking on this part, so please take note: you need to create another Route53 record.
You will be able to verify this by opening the Route53 console and looking for api.hello.com. It should appear as the following:
After that, there is not much left to do; just add a base path mapping resource in api_gateway.tf. Even if you do not have an additional trailing path for your endpoint, you must create a base path mapping. Otherwise your API won't be exposed to the public.
After applying this change, verify that your API mapping has been created:
Now, you can go back to Postman and test your API by requesting GET api.hello.com/hello. What may be confusing here is that you are not adding any path in the base path mapping. If you add hello as a path, your API endpoint will be configured as api.hello.com/hello/hello, which is obviously not what we want. So do not add any path mapping if you have already configured your path in aws_api_gateway_resource. Anyway, requests and responses to the API endpoint should work as expected if everything has been set up correctly so far.
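The way the base path and the resource path stack up can be sketched like this. The function publicUrl is a hypothetical helper written only to illustrate the concatenation; it does not call API Gateway.

```typescript
// Hypothetical helper that joins the custom domain, the base path from the
// base path mapping, and the resource path from aws_api_gateway_resource.
function publicUrl(domain: string, basePath: string, resourcePath: string): string {
  const segments = [basePath, resourcePath]
    .map((segment) => segment.replace(/^\/+|\/+$/g, "")) // trim leading and trailing slashes
    .filter((segment) => segment.length > 0);            // an empty base path adds nothing
  return `https://${domain}/${segments.join("/")}`;
}

console.log(publicUrl("api.hello.com", "", "hello"));      // prints "https://api.hello.com/hello"
console.log(publicUrl("api.hello.com", "hello", "hello")); // prints "https://api.hello.com/hello/hello"
```

An empty base path leaves the resource path as the whole URL path, which is the behaviour we want here.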
Enabling OPTIONS (preflight request)
Now, our client application is, of course, not Postman, so clients will usually request OPTIONS api.hello.com/hello first, and then request GET api.hello.com/hello, if they intend to send CORS requests, which is a very common case (read more about that in the MDN docs).
If you have not done anything related to handling OPTIONS request, it's very likely that you will get some error in your client application, like this (I got this picture from somewhere else for the purpose of demonstration):
So let's do it! There's already a handy module written by a great dev, so we will just use that:
Note that if you have any custom headers, you must define them in your config. Next, verify that OPTIONS requests are now allowed on the console:
Also, you will need to change your lambda's response headers too. Just adding "Access-Control-Allow-Origin": "*" will work:
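As a rough sketch, the header could be added with a small wrapper like the one below. The types are simplified stand-ins declared inline, and withCors is a hypothetical helper name; only the header name and value come from the text above.

```typescript
// Simplified stand-in for the lambda proxy response shape (an assumption for this sketch).
interface LambdaResponse {
  statusCode: number;
  body: string;
  headers?: Record<string, string>;
}

// Hypothetical helper: returns a copy of the response with the CORS header added,
// keeping any headers that were already set.
function withCors(response: LambdaResponse): LambdaResponse {
  return {
    ...response,
    headers: {
      ...(response.headers ?? {}),
      "Access-Control-Allow-Origin": "*",
    },
  };
}

const corsResponse = withCors({ statusCode: 200, body: JSON.stringify({ message: "hello" }) });
console.log(corsResponse.headers); // prints { 'Access-Control-Allow-Origin': '*' }
```

Allowing every origin with "*" is the simplest option; for a real service you may want to restrict it to the origins of your client applications.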
Now, apply the changes, and go back to your client app and retry the request. It should be working.
If you have followed everything, you will have created these files:
So far, we have looked at how to set up, develop and deploy a dockerized lambda application with Typescript, Terraform and SAM CLI. There are tons of things left to cover on lambda. Maybe next time it will be on using resources inside a VPC from lambda. I hope you enjoyed this and found some valuable insights. Thank you.