@guillaumesmo
Last active June 20, 2021 14:14
CloudFormation Custom Task Definition POC
# Sources:
# https://cloudonaut.io/how-to-create-a-customized-cloudwatch-dashboard-with-cloudformation/
# https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/template-custom-resources.html
# https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/ECS.html
Resources:
  CustomTaskDefinition:
    Type: 'Custom::TaskDefinition'
    Version: '1.0'
    Properties:
      ServiceToken: !GetAtt 'CustomResourceFunction.Arn'
      TaskDefinition: |
        {
          containerDefinitions: [
            {
              name: "sleep",
              image: "busybox",
              command: [
                "sleep",
                "360"
              ],
              mountPoints: [
                {sourceVolume: "efs", containerPath: "/efs"}
              ]
            }
          ],
          family: "sleep360",
          taskRoleArn: "", // required for EFS permissions
          cpu: "256",
          memory: "512",
          networkMode: "awsvpc",
          volumes: [
            {
              name: "efs",
              efsVolumeConfiguration: {
                fileSystemId: "" // required for EFS
              }
            }
          ]
        }
  CustomResourceFunction:
    Type: 'AWS::Lambda::Function'
    Properties:
      Code:
        ZipFile: |
          const aws = require('aws-sdk')
          const response = require('cfn-response')
          const ecs = new aws.ECS({apiVersion: '2014-11-13'})
          exports.handler = function(event, context) {
            console.log(`AWS SDK Version: ${aws.VERSION}`)
            console.log("REQUEST RECEIVED:\n" + JSON.stringify(event))
            if (event.RequestType === 'Create' || event.RequestType === 'Update') {
              ecs.registerTaskDefinition(eval(`(${event.ResourceProperties.TaskDefinition})`))
                .promise()
                .then(data => {
                  console.log(`Created/Updated task definition ${data.taskDefinition.taskDefinitionArn}`)
                  response.send(event, context, response.SUCCESS, {}, data.taskDefinition.taskDefinitionArn)
                })
                .catch(err => {
                  console.error(err);
                  response.send(event, context, response.FAILED)
                })
            } else if (event.RequestType === 'Delete') {
              ecs.deregisterTaskDefinition({taskDefinition: event.PhysicalResourceId})
                .promise()
                .then(data => {
                  console.log(`Removed task definition ${event.PhysicalResourceId}`)
                  response.send(event, context, response.SUCCESS)
                })
                .catch(err => {
                  if (err.code === 'InvalidParameterException') {
                    console.log(`Task definition: ${event.PhysicalResourceId} does not exist. Skipping deletion.`)
                    response.send(event, context, response.SUCCESS)
                  } else {
                    console.error(err)
                    response.send(event, context, response.FAILED)
                  }
                })
            } else {
              console.error(`Unsupported request type: ${event.RequestType}`)
              response.send(event, context, response.FAILED)
            }
          }
      Handler: 'index.handler'
      MemorySize: 128
      Role: !GetAtt 'CustomResourceRole.Arn'
      Runtime: 'nodejs12.x'
      Timeout: 30
  CustomResourceRole:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: 'lambda.amazonaws.com'
            Action: 'sts:AssumeRole'
      Policies:
        - PolicyName: 'customresource'
          PolicyDocument:
            Statement:
              - Effect: Allow
                Action:
                  - 'ecs:DeregisterTaskDefinition'
                  - 'ecs:RegisterTaskDefinition'
                Resource: '*'
              - Effect: Allow
                Action:
                  - 'logs:CreateLogGroup'
                  - 'logs:CreateLogStream'
                  - 'logs:PutLogEvents'
                Resource: '*'
              - Effect: Allow
                Action:
                  - 'iam:PassRole'
                Resource: '*' # replace with value of taskRoleArn
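
For readers wondering why the Delete branch can read the task definition ARN from event.PhysicalResourceId: the fifth argument of response.send becomes the PhysicalResourceId in the JSON body that is PUT to the event's pre-signed ResponseURL, and CloudFormation passes it back on later requests. Below is a minimal Python sketch of that body, assuming the field names from the CloudFormation custom resource documentation. build_response is a hypothetical helper for illustration only, not part of cfn-response, and the event values are made up.

```python
import json


def build_response(event, status, physical_resource_id, data=None, reason=None):
    """Build the JSON body a custom resource sends back to CloudFormation.

    Hypothetical helper for illustration; cfn-response builds an equivalent body.
    """
    body = {
        "Status": status,  # "SUCCESS" or "FAILED"
        "PhysicalResourceId": physical_resource_id,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    }
    if status == "FAILED":
        # CloudFormation expects a Reason when the status is FAILED.
        body["Reason"] = reason or "See the function's CloudWatch logs"
    return json.dumps(body)


# Made-up event fields and ARN, for illustration only.
event = {
    "RequestType": "Create",
    "StackId": "arn:aws:cloudformation:us-east-1:123456789012:stack/demo/abc",
    "RequestId": "req-1",
    "LogicalResourceId": "CustomTaskDefinition",
}
arn = "arn:aws:ecs:us-east-1:123456789012:task-definition/sleep360:1"
print(build_response(event, "SUCCESS", arn))
```

Because the ARN is returned as the physical ID, an Update that registers a new revision returns a different physical ID, which CloudFormation treats as a replacement and follows with a Delete request for the old one.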
@mapoulos

@jaska120 I imagine you need to change the file/directory permissions so that the nginx user can read them. More info here.

Probably executing something like

chown -R nginx:nginx /usr/share/nginx/ 

in the entrypoint (or in the docker build if you're building a subimage) will do the trick (may also need to do a chmod, see the link above).

@nickaustin13

nickaustin13 commented Jun 17, 2020

This deploys fine when using busybox or any other image from dockerhub, but fails with this error when using an image hosted in ECR:
Task definition does not support launch_type FARGATE. (Service: AmazonECS; Status Code: 400; Error Code: InvalidParameterException;)
Any ideas?

Update:
To make it work with private ECR images you need to add these 2 properties to the custom task definition:
executionRoleArn: { "Ref" : "TaskExecutionRoleArnParameter" }
requiresCompatibilities: [
"FARGATE"
]

Your TaskExecutionRoleArnParameter that you pass in as a parameter should have the permissions explained here: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_execution_IAM_role.html
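
A small Python sketch of the fix described above, written as a pure function that adds the two properties to a task definition dict before it is registered. with_fargate_settings and the role ARN are hypothetical names used only for illustration.

```python
import copy


def with_fargate_settings(task_definition, execution_role_arn):
    """Return a copy of task_definition with the two properties added."""
    updated = copy.deepcopy(task_definition)
    # The execution role lets ECS pull the private ECR image on the task's behalf.
    updated["executionRoleArn"] = execution_role_arn
    compat = list(updated.get("requiresCompatibilities", []))
    if "FARGATE" not in compat:
        compat.append("FARGATE")
    updated["requiresCompatibilities"] = compat
    return updated


base = {"family": "sleep360", "networkMode": "awsvpc", "cpu": "256", "memory": "512"}
# Placeholder ARN; substitute your own task execution role.
role = "arn:aws:iam::123456789012:role/example-task-execution-role"
print(with_fargate_settings(base, role))
```

The function copies its input, so the original dict can still be used for a non-Fargate registration.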

@jaska120

jaska120 commented Jun 18, 2020

@nickaustin13 I have just tested with ECR and it works. You have some unsupported parameter defined in your Task definition for FARGATE launch type. See my working TaskRole below:

    TaskRole:
      Type: AWS::IAM::Role
      Properties:
        AssumeRolePolicyDocument:
          Statement:
            - Effect: Allow
              Principal:
                Service: ecs-tasks.amazonaws.com
              Action: "sts:AssumeRole"

EDIT: I read your comment again and saw it was updated

@jaska120

jaska120 commented Jun 18, 2020

@mapoulos Thanks for your suggestion. I am new to Docker and still trying to figure this out. Permissions are fine when running Docker locally but not when deploying to Fargate. I tried building a subimage with a simple Dockerfile:

FROM nginx:latest

RUN chown -R nginx:nginx /usr/share/nginx
RUN chmod 777 /usr/share/nginx

I tried three combinations: only chown, only chmod, and both. I pushed each image to ECR, but every time I start the task I get permission errors. (And I didn't forget to change the version between builds.) This must have something to do with how EFS mounts work with Docker and what permissions are granted. I even have the arn:aws:iam::aws:policy/AmazonElasticFileSystemClientFullAccess managed policy attached to my TaskRole.

No luck so far. The funny part is that according to this AWS guide, https://docs.aws.amazon.com/AmazonECS/latest/developerguide/tutorial-efs-volumes.html, everything should work without any custom entrypoint or subimage, but it does not when I try the same thing.

My ultimate goal is to use the odoo image, but since nginx is more common I am trying with it first (the odoo image has the same permission problems).

Any idea?

@mapoulos

@jaska120
That is frustrating. The only other thing I can think is IAM permissions: are you using IAM auth at all? I imagine not, but that would potentially cause permission errors.

The only other thing I can think to try is creating an EC2 instance and mounting the EFS file system there to see what the permissions are. My CloudFormation looks like this, if it helps:

  SearchallFileSystem:
    Type: AWS::EFS::FileSystem
    Properties:
      Encrypted: true
      PerformanceMode: generalPurpose
      ThroughputMode: bursting
  SearchallEFSMountTarget:
    Type: AWS::EFS::MountTarget
    Properties:
      FileSystemId: !Ref SearchallFileSystem
      SecurityGroups:
        - !ImportValue SearchallSecurityGroup
      SubnetId: !ImportValue SearchallPublicSubnet
  CustomTaskDefinition:
    Type: 'Custom::TaskDefinition'
    Version: '1.0'
    Properties: 
      ServiceToken: !GetAtt 'CustomResourceFunction.Arn'
      TaskDefinition: {
        containerDefinitions: [
          {
            name: "sonic",
            image: {"Ref" : "Image"},
            logConfiguration: {
              logDriver: "awslogs",
              options: {
                "awslogs-group": {"Ref" : "SearchallLogGroup" },
                "awslogs-region": {"Fn::Sub" :  "${AWS::Region}"}, 
                "awslogs-stream-prefix": "searchall-ecs"
              },
            },
            portMappings: [{"containerPort" : {"Ref" : "Port"}}],
            mountPoints: [
              {sourceVolume: "sonic-efs", containerPath: "/var/lib/sonic/store/"}
            ]
          }
        ], 
        family: "searchall-ecs",
        cpu: "256",
        memory: "512",
        networkMode: "awsvpc",
        executionRoleArn: {"Fn::GetAtt" : "SearchallExecutionRole.Arn"},
        requiresCompatibilities: ["FARGATE"],
        
        volumes: [
          {
            name: "sonic-efs",
            efsVolumeConfiguration: {
              fileSystemId: {"Ref" : "SearchallFileSystem"} # required for EFS
            }
          }
        ]
      }

@jaska120

@mapoulos Thanks for sharing your CloudFormation file. I am not using IAM access, since the aws-sdk bundled with the Lambda JS runtime doesn't support it yet because its version is too old. The only differences I can see are that you haven't provided a taskRoleArn and that your file system is encrypted. Trying those now.

Would you mind sharing your SearchallSecurityGroup, in case I have misconfigured mine?

@JoanBelder

@ericklau I ran into the same problem. But I didn't want to have the hassle of extra layers. I figured that the python runtime in aws lambda has a more up-to-date version of the aws sdk, so I simply ported the code to python, which worked for me. (I haven't actually run all possibilities yet though, so it could also be very buggy)

CustomResourceFunction:
    Type: 'AWS::Lambda::Function'
    Properties:
      Code:
        ZipFile: |
          import json
          import logging
          import boto3
          import cfnresponse

          logger = logging.getLogger()
          logger.setLevel(logging.INFO)
          ecs = boto3.client('ecs')


          def handler(event, context):
              logger.info('got event {}'.format(event))
              if event['RequestType'] == 'Create' or event['RequestType'] == 'Update':
                  try:
                      data = ecs.register_task_definition(**json.loads(event['ResourceProperties']['TaskDefinition']))
                      logger.info(f"Created/Updated task definition {data['taskDefinition']['taskDefinitionArn']}")
                      cfnresponse.send(event, context, cfnresponse.SUCCESS, {}, data['taskDefinition']['taskDefinitionArn'])
                  except BaseException as error:
                      logger.error(error)
                      cfnresponse.send(event, context, cfnresponse.FAILED, {})
              elif event['RequestType'] == 'Delete':
                  try:
                      ecs.deregister_task_definition(taskDefinition=event['PhysicalResourceId'])
                      logger.info(f"Removed task definition {event['PhysicalResourceId']}")
                      cfnresponse.send(event, context, cfnresponse.SUCCESS, {})
                  except ecs.exceptions.InvalidParameterException:
                      logger.info(f"Task definition: {event['PhysicalResourceId']} does not exist. Skipping deletion.")
                      cfnresponse.send(event, context, cfnresponse.SUCCESS, {})
                  except BaseException as error:
                      logger.error(error)
                      cfnresponse.send(event, context, cfnresponse.FAILED, {})
              else:
                  logger.error(f"Unsupported request type: {event['RequestType']}")
                  cfnresponse.send(event, context, cfnresponse.FAILED, {})
      Handler: 'index.handler'
      MemorySize: 128
      Role: !GetAtt 'CustomResourceRole.Arn'
      Runtime: 'python3.7' # python3.8 does not support ZipFile :(
      Timeout: 30
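
One thing to watch with this port: json.loads needs strict JSON, while the TaskDefinition in the original gist is a JavaScript object literal with unquoted keys and // comments, which eval accepts but json.loads rejects. A minimal sketch of a loader that fails with a clear message, and that also accepts a property passed as a YAML mapping rather than a string. load_task_definition is a hypothetical helper, not part of the function above.

```python
import json


def load_task_definition(resource_properties):
    """Return the TaskDefinition property as a dict."""
    raw = resource_properties["TaskDefinition"]
    if isinstance(raw, dict):
        # The template passed a YAML/JSON mapping instead of a string.
        return raw
    try:
        return json.loads(raw)
    except json.JSONDecodeError as error:
        raise ValueError(f"TaskDefinition is not valid JSON: {error}") from error


strict = {"TaskDefinition": '{"family": "sleep360", "cpu": "256"}'}
print(load_task_definition(strict))

js_literal = {"TaskDefinition": '{family: "sleep360"}'}
try:
    load_task_definition(js_literal)
except ValueError as error:
    print(error)
```

So when switching from the Node.js function to the Python one, the keys in the template's TaskDefinition need quotes and the inline comments need to go.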

@mapoulos

@jaska120

Sure thing. Here are the bits that should be relevant (I'm not being as careful with the Egress as I might be):

  SearchallSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: A security group for the lambdas, the ecs cluster (for sonic), and the private endpoints
      VpcId: !Ref SearchallVPC
      Tags:
        - Key: project
          Value: searchall-prod
        - Key: type
          Value: searchall-network
  SearchallSecurityGroupEgress:
    Type: AWS::EC2::SecurityGroupEgress
    Properties:
      GroupId: !Ref SearchallSecurityGroup
      IpProtocol: tcp
      FromPort: 443
      ToPort: 443
      CidrIp: 0.0.0.0/0
      Description: HTTPS for ECS/ECR
  SearchallSecurityGroupEgressDynamo:
    Type: AWS::EC2::SecurityGroupEgress
    Properties:
      GroupId: !Ref SearchallSecurityGroup
      IpProtocol: tcp
      FromPort: 0
      ToPort: 65535
      DestinationSecurityGroupId: !Ref SearchallSecurityGroup
      Description: Allow lambdas to get to dynamo through the endpoint    
  SearchallSecurityGroupIngress:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !Ref SearchallSecurityGroup
      SourceSecurityGroupId: !Ref SearchallSecurityGroup
      IpProtocol: tcp
      FromPort: 1491
      ToPort: 1491
  SearchallSecurityGroupEFS:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !Ref SearchallSecurityGroup
      SourceSecurityGroupId: !Ref SearchallSecurityGroup
      IpProtocol: tcp
      FromPort: 2049
      ToPort: 2049

@guillaumesmo
Author

For those having permission issues: running chmod in the Dockerfile will not help, since those commands run when the image is built, not when the container runs in ECS.

The best thing you can do is mount the EFS file system on a temporary EC2 instance, create the folder, and chmod it from there. Your task should run fine afterwards.

@mapoulos

@guillaumesmo Can't believe I missed that. You're right of course.

Another option is to run the chmod and chown in the entrypoint of the image, but that would add startup time (and be superfluous after the first time).
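
A rough Python sketch of that entrypoint option, assuming the image ships a Python interpreter and the container starts as a user allowed to change the mounted directory (for example root). ensure_mount_dir is a hypothetical helper, and the demo runs against a temporary directory rather than a real EFS mount.

```python
import os
import stat
import tempfile


def ensure_mount_dir(path, mode=0o775):
    """Create path if needed, set its permission bits, and return the resulting mode."""
    os.makedirs(path, exist_ok=True)
    os.chmod(path, mode)
    return stat.S_IMODE(os.stat(path).st_mode)


# Demo against a throwaway directory; in a container this would be the
# containerPath of the EFS mount point (for example /efs).
demo_dir = os.path.join(tempfile.mkdtemp(), "efs", "data")
print(oct(ensure_mount_dir(demo_dir)))
```

A real entrypoint would call this on the mount path and then hand over to the main process, for example with os.execvp, so the check runs at every start but only changes anything the first time.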

@jaska120

jaska120 commented Jul 1, 2020

I forgot to reply and thank you guys!

@mapoulos I had almost the same setup, but it was still good to double-check that everything was OK on the CloudFormation side.

@guillaumesmo Thank you for your solution - worked like a charm.

So, for those of you having permission problems: keep in mind that running chmod while building your image won't work, since the mounted folder is only available while the container is running. That's why you should use a temporary bastion host and mount the EFS file system there for your first deployment.

Executing sudo chmod -R 777 /mnt/efs from the bastion host worked, where /mnt/efs is the folder where EFS was mounted.

@namedgraph

I'm trying to follow this... I'm getting this error when using EFS volume for my container:

Error response from daemon: create ecs-LinkedDataHubStackLDHTaskDefinitionF106B511-162-FusekiAdminDataVolume-e69dae89abd09e9de901:
VolumeDriver.Create: mounting volume failed: b'mount.nfs4: mounting fs-468514f2.efs.us-east-1.amazonaws.com:/var/fuseki/data/admin failed, reason given by server:
No such file or directory'

What could be the issue here?

@jedis00

jedis00 commented Jun 20, 2021

Make sure your /var/fuseki/data/admin exists. Also, I don’t think this is needed anymore as the support was added natively awhile back iirc.

@namedgraph

@jedis00 exists where -- in EFS or in the container? If EFS, how do I create it there?
P.S. Yes I'm using native support.

@jedis00

jedis00 commented Jun 20, 2021

You are telling it what directory to mount the EFS to inside of the container. Your container pipeline should be running 'mkdir -p /var/fuseki/data/admin' to create it if it doesn’t already exist.

@namedgraph

OK. This is not required with host mounts though -- so the EFS volumes are different in this respect?

@jedis00

jedis00 commented Jun 20, 2021

Yes it is required for mounting an EFS volume to a host. You’re telling it what directory to mount the EFS to on the host. Since the idea of this is to not mount to the host, you’re mounting it directly inside of the container.

@namedgraph

Doesn't the fs-468514f2.efs.us-east-1.amazonaws.com:/var/fuseki/data/admin syntax refer to EFS host:path? Meaning the missing directory is within EFS?
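
For what it's worth, in NFS mount syntax the source is written server:/remote/path, so the part after the colon names a directory on the server side (here the EFS file system), and "reason given by server" points the same way. A tiny Python sketch that splits the source from the error above; split_nfs_source is a hypothetical helper for illustration.

```python
def split_nfs_source(source):
    """Split an NFS mount source of the form 'server:/remote/path'."""
    host, sep, remote_path = source.partition(":")
    if not sep or not remote_path.startswith("/"):
        raise ValueError(f"not a server:/path source: {source!r}")
    return host, remote_path


host, remote_path = split_nfs_source(
    "fs-468514f2.efs.us-east-1.amazonaws.com:/var/fuseki/data/admin"
)
print(host)         # the EFS endpoint being mounted
print(remote_path)  # the directory requested from that file system
```

Assuming the path came from the volume's rootDirectory setting, the directory would need to exist on the file system itself, which matches the earlier advice in this thread to mount EFS from a temporary EC2 instance and create the folder there.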
