Skip to content

Instantly share code, notes, and snippets.

@Johnlon
Last active May 9, 2024 16:06
Show Gist options
  • Save Johnlon/98e1e4b9aaedfedead428c89e7dab5bb to your computer and use it in GitHub Desktop.
Save Johnlon/98e1e4b9aaedfedead428c89e7dab5bb to your computer and use it in GitHub Desktop.
Work in progress write up on adding an extension to cloud custodian

John's Unofficial Dummys' Guide to Writing a Cloud Custodian Extension


This doco is mostly intended to remind the author how this stuff works, but has now been roaded tested by someone else too.

Before you start you'll need an IDE - I use Intellij with the Python plugin. You'll need to have gone through these

Artefacts we'll create to deliver a working Cloud Custodian extension

When writing an extension from scratch then you need to ...

  • create an implementation of the extension
  • write tests
  • register your extension with Cloud Custodian

For the rest of this discussion we'll flesh out the code around a real example; creating an extension for the VoiceConnector resource of the ChimeSDKVoice service. At some point this resource may really exist as a service in CC but please don't get confused between this discussion here and any future real implementation; to be clear, at the time of writing there is no VoiceConnector resource in Cloud Custodian.

The implementation class for a CC extension that targets ChimeSDKVoice would best be placed in a file called c7n/resources/chimesdkvoice.py and the tests for the extension will be placed in tests/test_chimesdkvoice.py.

Registering the extension implementation class with CC requires one to annotate the implementation class with an Identify that we need to choose, and then add a record of that ID to c7n/resources/resource_map.py, where that record maps the chosen ID back to your implementation class.

The implementation class for our VoiceConnector impl will primarily contain some configuration that tells Cloud Custodian how to connect to the AWS API for ChimeSDKVoice using the Boto3 client SDK. Using the SDK Cloud Custodian can discover instance of resources and find out their detailed descriptions. In addition, there is sometimes the need to override features of CC to adapt CC to operate properly with your specific resource type and any such overrides will live inside your impl class file (should the need arise).

Gathering the API and data model details needed to build the extension

Since we need to configure the comms between CC and AWS we need to discover and familiarise ourselves with the relevant AWS APIs and data models. These API's are provided by the "Boto3" library and the API's for each AWS service are grouped into separate Boto clients and each client has a well-known name documented in the Boto3 pages.

Three main tings we need to discover are ..

  • Boto Service name, aka Client name: The name of the Boto3 client that will be used to communicate with the API's of the resource you are building an extension for
  • Enumerate API: The Boto API that allows CC to enumerate (ie list) all the resources. We need to know the names of the parameters to this API call and also the shape of the response from the API.
  • Detail API : The Boto API that provides a detailed description of each resource. Again we need to know about the shape of the request and responses.

Optional ..

  • Cloud Trail Event: If you are planning to make your extension and event based detection then you will also need to know the details of the specific AWS cloud trail events that you are going to use as triggers. But we'll come back to events much later.

Within Cloud Custodian the API's above are used as follows ...

  • The "enumerate" API (also commonly known as "list" API) is generally required by the Cloud Custodian "pull" mode of operation. It is needed in order to discover the Id's of the VoiceConnector instances. In "event" mode the Id's of the resources will be looked up within the relevant cloud trail event.
  • The "detail" API (sometimes know as the "describe" or "get" API) is needed by both the "pull" and "event" modes of operation and allow CC to obtain the full details of each VoiceConnector resource so that filters and actions etc can be applied to that resource.

The Boto3 documentation provides content that covers all the various API and model information requirements just outlined above.

  • Boto Service Name

The Boto 3 "services" page lists all the supported services and if we scroll down to the ChimeSDKVoice service entry then we find a link
'Client' that leads to a page providing details of the ChimeSDKVoice APIs.

On the ChimeSDKVoice client page then we can see a snippet that shows us that the Boto3 API client name is " chime-sdk-voice".

import boto3
client = boto3.client('chime-sdk-voice')

This is the first value we need to make a note of as we'll use it to configure the extension a bit later.

  • Enumerate (or List) API

Generally, the "enumerate" API will be called something like "list_xxxxx". If we look at the ChimeSDKVoice client page, then we can see there is an API called list_voice_connectors that lists the resources we are looking for. The AWS "list" API's will generally not take parameters, or at least any parameters are generally restricted to how to do paging of results.

The documentation shows us the shape of the response from the list_voice_connector call...

Response Syntax:
{
    'VoiceConnectors': [
        {
            'VoiceConnectorId': 'string',
            'AwsRegion': 'us-east-1'|'us-west-2'|'ca-central-1'|'eu-central-1'|'eu-west-1'|'eu-west-2'|'ap-northeast-2'|'ap-northeast-1'|'ap-southeast-1'|'ap-southeast-2',
            'Name': 'string',
            'OutboundHostName': 'string',
            'RequireEncryption': True|False,
            'CreatedTimestamp': datetime(2015, 1, 1),
            'UpdatedTimestamp': datetime(2015, 1, 1),
            'VoiceConnectorArn': 'string'
        },
    ],
    'NextToken': 'string'
}

The response syntax details indicate that the list of resources is contained within a top level field called VoiceConnectors which contains an array. We can see that each element of the array contains some fields like ` VoiceConnectorId", "Name", "VoiceConnectorArn" and "CreatedTimestamp". The names of these specific fields will be needed later, so take note.

The information we've gathered here will be used to populate the CC configuration item called enum_spec that we'll encounter in a bit more detail shortly.

  • Detail API

Generally, the "detail" API's will be named as "describe_xxxxx" or "get_xxxxx". In this instance the client page points us towards get_voice_connector as the API to describe an individual voice connector resource in detail.

If we examine the "get" API then we see that a call to this API requires us to pass the Id of the voice connector in an parameter named "VoiceConnectorId". We need to note the name of this API and this paramter name for later.

Lets take alook at the shape of the response...

Response Syntax:
{
    'VoiceConnector': {
        'VoiceConnectorId': 'string',
        'AwsRegion': 'us-east-1'|'us-west-2'|'ca-central-1'|'eu-central-1'|'eu-west-1'|'eu-west-2'|'ap-northeast-2'|'ap-northeast-1'|'ap-southeast-1'|'ap-southeast-2',
        'Name': 'string',
        'OutboundHostName': 'string',
        'RequireEncryption': True|False,
        'CreatedTimestamp': datetime(2015, 1, 1),
        'UpdatedTimestamp': datetime(2015, 1, 1),
        'VoiceConnectorArn': 'string'
    }
}

Perhaps you'll notice that the shape of the "get_voice_connector" and "list_voice_connector" responses are identical. This is sometimes the case with the AWS API's but it's not a rule by any means. In any case, CC expects us to define both these API's in the configuration of any extension, and both API's will be used during execution of CC policies.

The information we've gathered here will be used to populate the CC configuration item called detail_spec, more on that in a moment.

We will be referring back to the information above frequently in the following discussion.

Creating the implementation configuration

Generally, you will start out by extending the base class QueryResourceManager which can be found in c7n/query.py.

Within your query manager class you need to define a nested class called resource_type that itself extends a framework class called TypeInfo.

The resource_type nested class provides the necessary CC config for your extension and it's here we'll spent most of our time figuring out the impl.

So, we'll be looking at the implementation file called c7n/resources/chimesdkvoice.py and we'll start with something like this in that file...

class VoiceConnector(QueryResourceManager):

    # interior class that defines the aws metadata for resource
    class resource_type(TypeInfo):
        ...

Our resource_type class has a super class called Typeinfo that is defined in c7n/query.py and our first job is to override a load of static fields that are defined in the superclass.

Pause ...: Being an old C++ and Java/Scala programmer I find the idea of overriding a static field a disturbing. My approach would have been to design this framework so that the type info was defined as a field on the custom QueryResourceManager class and not a bunch of statics on a nested class, but hey-ho. I think the best idea is not to worry about this strange construct too much and instead think of this pattern as if the TypeInfo super class is like a prototype for our custom definition. By the way, some supporting evidence that this pattern is a bit weird is that my IDE Intellij warns me about it, but at least Intellij shows a navigation from my own class to the these static fields in TypeInfo, but when I open the same code in VSCode it doesn't give any recognition at all that these parent class static fields exist of that I might be "overriding" them.

Anyway, if you inspect the TypeInfo class then you will see a ton of static fields, but you need only a fraction of these for your impl. I added a ton of doco to the TypeInfo class when I was adding my own AWS Appmesh extension, but only for those aspects that AWS Appmesh relied on.

My recommendation is to start with this lot ...

@resources.register('chimesdkvoice-voiceconnector')
class VoiceConnector(QueryResourceManager):

    # interior class that defines the aws metadata for resource
    class resource_type(TypeInfo):
        service     = None

        enum_spec   = None
        detail_spec = None

        id   = None
        name = None
        arn  = None
        date = None

        cfn_type    = None
        config_type = None

        universal_taggable = object()

Registering your extension class

Before we get into what values to provide for those fields there is one more important feature of your QueryResourceManager impl and that is registration. In the snippet above there is the @resources.register annotation which contains a string that will be the Id of your resource extension, in this case I'm choosing the Id chimesdkvoice-voiceconnector. The Id value that you define here will need to be added to the resource_map.py to complete your registration. This id will also appear in any CC policies that are defined using your CC extension, so choose wisely.

If we open resource_map.py we'll see that it contains a dict that has mappings from a name like "aws.*" to a class name. So since our extension has the id chimesdkvoice-voiceconnector and is defined in a file called c7n/resources/chimesdkvoice.py then the resource mapping record will be from aws.chimesdkvoice-voiceconnector to c7n.resources.chimesdkvoice.VoiceConnector. Once that has been done then your extension registration is complete.

Configuring the resource_type class.

  • service

In the data gathering above we've already identified that the Boto3 client name is chime-sdk-voice so that's the value we need to give to service.

@resources.register('chimesdkvoice-voiceconnector')
class VoiceConnector(QueryResourceManager):

    class resource_type(TypeInfo):
        service = "chime-sdk-voice"
        ...
  • enum_spec

The enum_spec configures CC's call to the relevant "list" API and in the data gathering above we've already identified the API call as list_voice_connectors and we know that the field in the response that contains all the resources is called VoiceConnectors. So, we can now fill in the details of the enum_spec...

@resources.register('chimesdkvoice-voiceconnector')
class VoiceConnector(QueryResourceManager):

    class resource_type(TypeInfo):
        service = "chime-sdk-voice"

        # fields are = [ "api name", "parameter name", not used ]
        enum_spec = [ "list_voice_connectors", "VoiceConnectors", None ]

        ...

The third element in the enum_spec isn't relevant to us right now and it's purpose is to further tune the call to the SDK. We aren't using it so we're passing a None as the value.

  • detail_spec

We know that the detail function is called "get_voice_connector" and we know that the parameter to this call is named " VoiceConnectorId".

We also know where to get the value of the VoiceConnectorId from because we saw that within the "list_voice_connectors" response there is a field in the data called "VoiceConnectorId".

With this information we can extend our implementation.

@resources.register('chimesdkvoice-voiceconnector')
class VoiceConnector(QueryResourceManager):

    class resource_type(TypeInfo):
        service = "chime-sdk-voice"

        enum_spec = [ "list_voice_connectors", "VoiceConnectors", None ]

        # fields are = [ "api name", "parameter name", "name of the identity field from the list repsonse", not used]
        detail_spec = [ "get_voice_connector", "VoiceConnectorId", "VoiceConnectorId", None]

The next few things that need configuration are:

  • id

The "id" config item defines the name of the field in the "enum_spec" response that contains the formal ID of the resource.

If we refer back to the shape of the response to the "list_voice_connectors" call then we'll recall the that ID of each resource is defined by a field called `VoiceConnectorId"

  • name

Likewise we saw that the "name" of the voice connector resource can be obtained from the "Name" field contained in the " list_voice_connectors" response.

  • arn

We saw above that the ARN of the voice connector resource is conveyed in the "VoiceConnectorArn" field of the " list_voice_connectors" response.

  • date

The created date for the voice connector is ret in the "CreatedTimestamp" field of the "list_voice_connectors" response.

With all the above known we can update our implementation...

@resources.register('chimesdkvoice-voiceconnector')
class VoiceConnector(QueryResourceManager):

    class resource_type(TypeInfo):
        service = "chime-sdk-voice"

        enum_spec   = [ "list_voice_connectors", "VoiceConnectors", None ]
        detail_spec = [ "get_voice_connector", "VoiceConnectorId", "VoiceConnectorId", None]

        id   = "VoiceConnectorId"
        name = "Name"
        arn  = "VoiceConnectorArn"
        date = "CreatedTimestamp"
  • cfn_type and config_type

cfn_type and config_type are required in some cases and not in others.

If the resource type that you are working with is known to AWS Config then we need to provide a value for the config_type field that provides the AWS Config name for this resource type. If the resource isn't known to AWS Config then this can be omitted entirely or set to None

If the resource type is known to AWS Cloud Formation then the AWS Cloud Formation name for this resource type must be filled in via the cfn_type config item.

So, if look for "Voice Connector" in the cloud formation resource type list and the AWS config type list then we'll find nothing, so the correct setting for these two CC config values is None.

@resources.register('chimesdkvoice-voiceconnector')
class VoiceConnector(QueryResourceManager):

    class resource_type(TypeInfo):
        service = "chime-sdk-voice"

        enum_spec   = [ "list_voice_connectors", "VoiceConnectors", None ]
        detail_spec = [ "get_voice_connector", "VoiceConnectorId", "VoiceConnectorId", None]

        id   = "VoiceConnectorId"
        name = "Name"
        arn  = "VoiceConnectorArn"
        date = "CreatedTimestamp"

        cfn_type    = None
        config_type = None
  • universal_taggable

This value controls whether Cloud Custodian will attempt to fetch the tags for the Voice Connector resources and attach these tags to the data model for the resources. The reason we probably want this tag collection to occur is to enable users of our extension to create policies that can filter resources according to these tags.

The correct value if we want automatic tag collection to occur is "object()". The full set of valid values for this config item are "True|False|object()", but I won't go into any further details other than to refer you to the documentation I placed in the TypeInfo class that explains more.

With that we csn finalise our extension impl as ...

@resources.register('chimesdkvoice-voiceconnector')
class VoiceConnector(QueryResourceManager):

    class resource_type(TypeInfo):
        service = "chime-sdk-voice"

        enum_spec   = [ "list_voice_connectors", "VoiceConnectors", None ]
        detail_spec = [ "get_voice_connector", "VoiceConnectorId", "VoiceConnectorId", None]

        id   = "VoiceConnectorId"
        name = "Name"
        arn  = "VoiceConnectorArn"
        date = "CreatedTimestamp"

        cfn_type    = None
        config_type = None

        universal_taggable = object()

More info

Take a look at any notes I've left in the TypeInfo class and also in appmesh.py and it's test class test_appmesh.py

Creating a Policy using the extension

Let's imagine that we the implementation above has been installed in CC and we want to create a Cloud Custodian policy that detects instances of VoiceConnector where encryption is turned off.

We know that we used the name aws.chimesdkvoice-voiceconnector as the identifier for your extension in resource_map.py and we need to use that ID in the CC policy file. Additionally, our research above found that the VoiceConnector specification has an encryption on/off flag called RequireEncryption. If we want to match this flag against a True/False value then we need to use a Cloud Custodian value filter (see the Cloud Custodian docs).

A policy file that brings all these factors together would look like this...

# vc_policy.yml
policies:
- name: my-voice-policy
  resource: aws.chimesdkvoice-voiceconnector
  filters:
  - type: value
    key: RequireEncryption
    op: eq
    value: False

The CC extension testing pattern is one where we use a policy to test the extension so why not use this exact policy as the basis of our test cases.

Testing

To be honest, we can't know if the configuration above is actually viable at this point and we need to do some testing to prove this.

Of course we could simply install the extension in a CC instance and manually test it, but that's not how we roll is it! We believe in writing automated regression tests.

Ideally, we'd have written the tests first - but there was a load to explain above about basic AWS and Cloud Custodian concepts and how these concepts relate to each other. So, instead we've written the implementation first to allow us to discuss those ideas along the way. When it comes to further enhancements, or a second or third resource type, then perhaps we could use TDD instead.

Designing our test case

The model information we collected above looks like this ...

'VoiceConnectorId': 'string',
'AwsRegion': 'us-east-1'|'us-west-2'|'ca-central-1'|'eu-central-1'|'eu-west-1'|'eu-west-2'|'ap-northeast-2'|'ap-northeast-1'|'ap-southeast-1'|'ap-southeast-2',
'Name': 'string',
'OutboundHostName': 'string',
'RequireEncryption': True|False,
'CreatedTimestamp': datetime(2015, 1, 1),
'UpdatedTimestamp': datetime(2015, 1, 1),
'VoiceConnectorArn': 'string'

A reasonable test case for our extension would be one that proves that

  • we can find VoiceConnectors based on the value of one of its properties above, and
  • when a match occurs then the expected full resource description is returned to the user

To achieve this test we can create a data test fixture with two instances of the voice connector resource each with a distinct specification and where one instance has encryption turned on, and the other instance turned off.

The test case must select only the instance where encryption is turned on, and the extension must return a resource spec that matches the one we've setup in the test fixture.

Setting up the test data fixture

The CC testing pattern uses a library called Placebo that replaces the real Boto API and provides a method to mock up responses to the calls your extension will make to the Boto API. To provide the mock responses you need to create a series of files with specific names that match the name of the api your extension is accessing and a sequence number 1,2,3..N that identifies the order in which these files should be returned by the mock.

Whilst it's possible to create these mock response files by hand ot would be tedious. Luckily Placebo provides a means to generate these files by running your extension against the real AWS and capturing the API responses into these files as mock data to enable the tests to run off-line.

The approach is to temporarily create instances of the relevant resources in AWS and then have CC run our test case but in a special mode where it captures the API responses.

Testing "pull" mode

Our initial test case will be as follows...

class TestAppmeshMesh(BaseTest):
  def test_appmesh(self):
     
      # standard test runner ...
      # session_factory = self.replay_flight_data('test_chimesdkvoice_voiceconnector')
  
      # test data recording runner ...
      session_factory = self.record_flight_data('test_chimesdkvoice_voiceconnector')
      
      p = self.load_policy(
          {
              "name": "my-voice-policy",
              "resource": "aws.chimesdkvoice-voiceconnector",
              'filters': [
                {
                    "type": "value",
                    "key": "RequireEncryption",
                    "op": "eq",
                    "value": "False"
                }
              ],
          },
          session_factory=session_factory,
      )

      resources = p.run()

      print(str(reources))  

We can see above that we are using the policy that we defined earlier.

Towards the top we see self.record_flight_data('test_chimesdkvoice_voiceconnector') which specifies the use of record_flight_data which will cause test execution to record the responses from AWS in a test data directory called tests/data/placebo/test_chimesdkvoice_voiceconnector.

Finally, towards the bottom we see resources = p.run() which is where we are actually running the test framework and collecting the files.

We then print out the specifications of any resources that were returned by CC.

Now go into AWS and create a pair of resources called MyVCEncrypted and MyVCUnencrypted, the first one with encryption turned on and the other with encryption turned off, and then add a tag "MayTag=MyValue" to each resource.

IF you then run the test above then we'll see the following files created in the Placebo test data directory

  • chimesdkvoice.ListVoiceConnectors_1.json
  • chimesdkvoice.GetVoiceConnector_1.json
  • chimesdkvoice.GetVoiceConnector_2.json
  • tagging.GetResources_1.json

These are the 4 calls that were made. The file name is formed from a combination of the service name, the API name and the sequence number of the call.

If we look into the files then we can see the captured responses. The "List" file will contain an array of two voice connectors and each of the "GetVoice" files will contain the complete description of one of the VoiceConnectors, and finally the "GetResources" file contains any tags that were attached by you to the test resources that you created in AWS.

Unless we added tags to the VoiceConnectors then the tagging file will be empty and in this case we need to go back into AWS and add a tag to each resource and then rerun the test. We can then check that the GetResource file now contains the tags you added. We need the resources to have tags because we want our test to verify that any tags are correctly retrieved and we'll add assertions to the test for this check.

Our test printed out the resources that were returned by the framework and we need to check this response and verify that it contains the correct values, including the tags you added.

If you are happy with the shape of the response, the full description is there and the tags too, then you should follow that pattern in test_appmesh.py and add an assertion for this response as a regression test.

Add something like this as the assertion, filling out all the fields in the data structure ...

self.assertEqual(
  [
    {'Tags': [{'Key': 'MyTag', 'Value': 'MyValue'}],
    'c7n:MatchedFilters': ['RequireEncryption'],
    'Name': 'MyVCEencrypted',        
    'VoiceConenctorArn': 'arn:aws:XXXXX TODO TODO',
    ..... etc
  ],
  resources,
)

As per test_appmesh.py the next assertion should be one for the get_arns function and if you look at the test_appmesh.py then hopefully it is self explanatory.

Finally, I recommend that you add an assertion that verifies the calls, including parameter values, that are made by your plugin. To achieve this we need to modify the test to capture the calls that are being made by CC. We can achieve this using the ApiCallCaptor utility; apply it as shonn below ...

  captor = ApiCallCaptor.start_capture()
  resources = p.run()

We can now add a final assertion to the bottom of our test case. This will make a strong assertion on the specific calls made by CC.

  self.assertEqual(
      [
          {'operation': 'ListVoiceConnectors', 'params': {}, 'service': 'chimesdkvoice'},
          {'operation': 'GetVoiceConnector', 'params': {'VoiceConnectorId': 'TODO!!!'}, 'service': 'chimesdkvoice'},
          {'operation': 'GetVoiceConnector', 'params': {'VoiceConnectorId': 'TODO!!!'}, 'service': 'chimesdkvoice'},
          {
              'operation': 'GetResources',
              'params': {
                  'ResourceARNList': [
                      'arn:aws:XXX TODO TODO',
                      'arn:aws:XXX TODO TODO',
                  ]
              },
              'service': 'resourcegroupstaggingapi',
          },
      ],
      captor.calls,
  )

Push your changes to Cloud Custodian GitHub.com

As per the Cloud Custodian developer documentation mentioned right at the top of this page, you need to setup you dev end and have these two command pass cleanly before requesting a PR back to the main branch of the product.

make lint
make test

If we've gotten this far and can run the tests and it all passes, then you are in great shape and should consider asking for a review and advice on a pull request; even if you aren't fully finished.

We will extend the test to cover "Event" mode, but there is benefit in pushing this MVP and hopefully getting some review feedback from the maintainers of CC.

Adding Event support

TODO

========================= END ========================

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment