@kichik
Last active February 9, 2024 15:31
CloudFormation template that stops RDS from automatically starting back up
# aws cloudformation deploy --template-file KeepDbStopped.yml --stack-name stop-db --capabilities CAPABILITY_IAM --parameter-overrides DB=arn:aws:rds:us-east-1:XXX:db:XXX
Description: Automatically stop RDS instance every time it turns on due to exceeding the maximum allowed time being stopped
Parameters:
  DB:
    Description: ARN of database that needs to be stopped
    Type: String
    AllowedPattern: arn:aws:rds:[a-z0-9\-]+:[0-9]+:db:[^:]*
  MaxStartupTime:
    Description: Maximum number of minutes to wait between database is automatically started and the time it's ready to be shut down. Extend this limit if your database takes a long time to boot up.
    Type: Number
    MinValue: 10
    Default: 25
Resources:
  DatabaseStopperFunction:
    Type: AWS::Lambda::Function
    Properties:
      Role: !GetAtt DatabaseStopperRole.Arn
      Runtime: python3.6
      Handler: index.handler
      Timeout: 20
      Code:
        ZipFile:
          Fn::Sub: |
            import boto3
            import time
            def handler(event, context):
              print("got", event)
              db = event["detail"]["SourceArn"]
              id = event["detail"]["SourceIdentifier"]
              message = event["detail"]["Message"]
              region = event["region"]
              rds = boto3.client("rds", region_name=region)
              if message == "DB instance is being started due to it exceeding the maximum allowed time being stopped.":
                print("database turned on automatically, setting last seen tag...")
                last_seen = int(time.time())
                rds.add_tags_to_resource(ResourceName=db, Tags=[{"Key": "DbStopperLastSeen", "Value": str(last_seen)}])
              elif message == "DB instance started":
                print("database started (and sort of available?)")
                last_seen = 0
                for t in rds.list_tags_for_resource(ResourceName=db)["TagList"]:
                  if t["Key"] == "DbStopperLastSeen":
                    last_seen = int(t["Value"])
                if time.time() < last_seen + (60 * ${MaxStartupTime}):
                  print("database was automatically started in the last ${MaxStartupTime} minutes, turning off...")
                  time.sleep(10)  # even waiting for the "started" event is not enough, so add some wait
                  rds.stop_db_instance(DBInstanceIdentifier=id)
                  print("success! removing auto-start tag...")
                  rds.add_tags_to_resource(ResourceName=db, Tags=[{"Key": "DbStopperLastSeen", "Value": "0"}])
                else:
                  print("ignoring manual database start")
              else:
                print("error: unknown database event!")
  DatabaseStopperRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Action:
              - sts:AssumeRole
            Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: Notify
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Action:
                  - rds:StopDBInstance
                Effect: Allow
                Resource: !Ref DB
              - Action:
                  - rds:AddTagsToResource
                  - rds:ListTagsForResource
                  - rds:RemoveTagsFromResource
                Effect: Allow
                Resource: !Ref DB
                Condition:
                  ForAllValues:StringEquals:
                    aws:TagKeys:
                      - DbStopperLastSeen
  DatabaseStopperPermission:
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !GetAtt DatabaseStopperFunction.Arn
      Principal: events.amazonaws.com
      SourceArn: !GetAtt DatabaseStopperRule.Arn
  DatabaseStopperRule:
    Type: AWS::Events::Rule
    Properties:
      EventPattern:
        source:
          - aws.rds
        detail-type:
          - "RDS DB Instance Event"
        resources:
          - !Ref DB
        detail:
          Message:
            - "DB instance is being started due to it exceeding the maximum allowed time being stopped."
            - "DB instance started"
      Targets:
        - Arn: !GetAtt DatabaseStopperFunction.Arn
          Id: DatabaseStopperLambda

@Klohto

Klohto commented Apr 6, 2020

  detail:
    Message:
      - "The DB instance is being started due to it exceeding the maximum allowed time being stopped."  # TODO something is off about the pattern as this never gets triggered

The pattern is wrong: omit the leading "The" and it will work.
The correct pattern is: DB instance is being started due to it exceeding the maximum allowed time being stopped.

Works for me :)

@kichik
Author

kichik commented Apr 6, 2020

@Klohto doh, of course! Thanks, updated.

@cagriar

cagriar commented Apr 17, 2020

Hi, I successfully deployed this with CloudFormation, but unfortunately it did not stop my RDS instance when it was auto-started after 7 days of being stopped. I only changed the stack name and RDS ARN parameters; the rest is the same.

I have checked the logs. The DB stopper Lambda executed 3 times between 6:18pm and 6:22pm, but the database was auto-started at 6:24pm, 2 minutes later. As a result, the Lambda logs show:

An error occurred (InvalidDBInstanceState) when calling the StopDBInstance operation: Instance my-tmp-db is not in available state.: InvalidDBInstanceStateFault

The stack executed the Lambda just before the RDS instance had fully started and entered the "available" state. Have you ever encountered this problem? Is there a way to execute the Lambda right after the RDS state changes to "available"? And do you know why the Lambda is executed 3 times?

@kichik
Author

kichik commented Apr 17, 2020

@cagriar when a Lambda invocation fails, it is retried 2 more times, for a total of 3 attempts. I'm afraid you've hit the TODO on line 24. One very naive solution would be setting Timeout: 900 on the Lambda and adding time.sleep(600) in the handler.

But maybe I can do one better. Can you paste the recent events of your RDS? Should be AWS Console -> RDS -> Databases -> [your database] -> Logs & events -> Recent events.
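
As an alternative to a fixed time.sleep(600), the handler could poll the instance status and stop it as soon as it reports "available". The following is only a minimal sketch, assuming `rds` behaves like a boto3 RDS client; `wait_until_available` and its parameters are hypothetical names and are not part of the gist.

```python
import time


def wait_until_available(rds, instance_id, timeout_s=600, poll_s=15,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll the instance status until it is 'available' or the timeout passes.

    `rds` is assumed to expose describe_db_instances() like a boto3 RDS client.
    `sleep` and `clock` are injectable so the loop can be exercised without waiting.
    Returns True if the instance became available, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        response = rds.describe_db_instances(DBInstanceIdentifier=instance_id)
        status = response["DBInstances"][0]["DBInstanceStatus"]
        if status == "available":
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

In the handler this would sit just before the rds.stop_db_instance(...) call. Under this approach the role would presumably also need rds:DescribeDBInstances, and the Lambda Timeout would have to be larger than the wait (Lambda caps it at 900 seconds).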

@cagriar

cagriar commented Apr 20, 2020

@kichik I have modified the DatabaseStopperRule as follows:

  DatabaseStopperRule:
    Type: AWS::Events::Rule
    Properties:
      EventPattern:
        source:
          - aws.rds
        detail-type:
          - "RDS DB Instance Event"
        resources:
          - !Ref DB
        detail:
          EventCategories:
            - "availability"
            - "notification"
      Targets:
        - Arn: !GetAtt DatabaseStopperFunction.Arn
          Id: DatabaseStopperLambda

When I manually start my database, this works and successfully stops it. Now I'm waiting 7 days so that I can test it with AWS's auto-start mechanism. One other question: where did you configure the Lambda to be executed 2 more times if it fails? I could not find that in your script. Is it the general behavior?

@kichik
Author

kichik commented Apr 20, 2020

I want it to shut down the database only if it was turned on automatically, so users can still turn it on manually. So I want to catch both events, and respond to the second event (the database has started) only if the first event (the start was caused by the 7-day limit) was seen. For that, it would be really useful if you could paste the whole event log once the 7 days pass.

Lambda executing three times is its default retry behavior for failed asynchronous invocations; nothing in this template configures it.

@cagriar

cagriar commented Apr 21, 2020

OK, I have changed the DatabaseStopperRule as follows:

  DatabaseStopperRule:
    Type: AWS::Events::Rule
    Properties:
      EventPattern:
        source:
          - aws.rds
        detail-type:
          - "RDS DB Instance Event"
        resources:
          - !Ref DB
        detail:
          EventCategories:
            - "notification"
      Targets:
        - Arn: !GetAtt DatabaseStopperFunction.Arn
          Id: DatabaseStopperLambda

Now it listens only to notification events, so the database will only be stopped if it was started automatically. I don't know why, but my RDS logs are empty. I have placed print(event) inside my Lambda, so I will send you the logs on Friday, once the DB is auto-started after 7 days.

@cagriar

cagriar commented Apr 24, 2020

@kichik The modification I made in my last message (adding notification as EventCategories) seems to be working. My Lambda function was called 3*3=9 times with 3 separate events, which are:

  1. 'Message': 'DB instance is being started due to it exceeding the maximum allowed time being stopped.'
  2. 'Message': 'DB instance started' (lambda stops the started database in the 3rd try)
  3. 'Message': 'DB instance stopped'

So I guess it works. Thanks.

@kichik
Author

kichik commented Apr 24, 2020

Thanks. I updated it to wait for the "maximum allowed" message, set a tag, wait for the "started" message, see if the tag was set, and only then stop it after waiting 10 seconds. Hopefully this should be enough to cover all the cases.
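
The two-event flow described above can be restated as a small pure function, which makes it easier to reason about without AWS. This is only an illustrative sketch: `decide_action` and its return values are hypothetical names, and the 25-minute default mirrors the template's MaxStartupTime parameter.

```python
AUTO_START = ("DB instance is being started due to it exceeding "
              "the maximum allowed time being stopped.")
STARTED = "DB instance started"


def decide_action(message, last_seen, now, max_startup_minutes=25):
    """Map one RDS event to what the handler would do.

    Returns "tag" (record the auto-start time), "stop" (stop the database),
    "ignore" (manual start) or "unknown" (unexpected event).
    `last_seen` is the timestamp stored in the DbStopperLastSeen tag, 0 if unset.
    """
    if message == AUTO_START:
        return "tag"
    if message == STARTED:
        if now < last_seen + 60 * max_startup_minutes:
            return "stop"
        return "ignore"
    return "unknown"


now = 1_000_000
print(decide_action(AUTO_START, 0, now))          # tag
print(decide_action(STARTED, now - 5 * 60, now))  # stop
print(decide_action(STARTED, 0, now))             # ignore
```

Only a "started" event that follows a recent "maximum allowed" event leads to a stop; a manual start finds no recent tag and is left alone.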

@cagriar

cagriar commented Apr 25, 2020

@kichik Thanks for the update. I have 2 questions:

  1. Why do you need to check whether the database was automatically started in the last 20 minutes or not? Did you do that to separate the logic between auto-started database and manually-started database?
  2. Are you sure a 10-second sleep is enough? If it fails, the Lambda function will be retried 2 more times as usual, right?

@kichik
Author

kichik commented Apr 26, 2020

Yes, this is to allow people to manually turn on the database when they do need it without deleting this stack.

10 seconds worked in my limited tests. It will try 3 times so technically a bit more than 30 seconds total. We can add a configurable value if people report it doesn't work.

@cagriar

cagriar commented Apr 27, 2020

@kichik Thanks for that! I will try it and let you know the results next week.

@cagriar

cagriar commented May 4, 2020

@kichik Thanks, your updates worked successfully!

@derianpt

Hi, cool script! Just some thoughts

@kichik @cagriar Have you guys ever come across an RDS instance taking longer than 20 minutes to start (after AWS auto starts it)? Say 25 minutes.

In that case, this script will falsely exclude it from being stopped, right?

I was thinking maybe these lines:
https://gist.github.com/kichik/7a2ecb0d36358c50c7b878ad9fd982bc#file-keepdbstopped-yml-L43-L52

can be modified to:

tags = rds.list_tags_for_resource(ResourceName=db)["TagList"]

if is_started_by_aws(tags):
    print("database was automatically started, turning off...")
    time.sleep(10)
    # even waiting for the "started" event is not enough, so add some wait
    rds.stop_db_instance(DBInstanceIdentifier=id)

    print("success! removing auto-start tag...")
    rds.remove_tags_from_resource(ResourceName=db, TagKeys=["DbStopperLastSeen"])
else:
    print("ignoring manual database start")


def is_started_by_aws(tags):
    """
    Checks if an RDS instance was auto started by AWS
    :param tags: List of Tags configured for RDS instance
    :return: True if the resource has the "DbStopperLastSeen" tag, False otherwise
    """
    for tag in tags:
        # compare case-insensitively, as in the original suggestion
        if tag["Key"].lower() == "DbStopperLastSeen".lower():
            return True

    return False

This way:

  1. It won't matter how long the RDS instance takes to be available.
  2. We don't need to use the time.time() UNIX timestamp value stored in last seen tag. It will just be an identifier added by the earlier lambda run.

Thoughts?

@kichik
Author

kichik commented Oct 18, 2020

I wanted to put a time limit on it in case we ever miss the message, the user turns off the DB before we do, the tag fails to set, or basically any unforeseen issue. I'll turn it into a stack parameter, but I still want to keep it in place.
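
To make the trade-off concrete, here is a small comparison of the two checks on a leftover tag. It is a simplified restatement for illustration, not code from the gist: `presence_only` stands in for the tag-only proposal, `within_window` for the timestamp check, and the scenario values are made up.

```python
def presence_only(tags):
    """Tag-only check: stop whenever the marker tag exists."""
    return any(t["Key"] == "DbStopperLastSeen" for t in tags)


def within_window(tags, now, max_startup_minutes=25):
    """Timestamp check: stop only if the tag was set recently."""
    last_seen = 0
    for t in tags:
        if t["Key"] == "DbStopperLastSeen":
            last_seen = int(t["Value"])
    return now < last_seen + 60 * max_startup_minutes


now = 1_700_000_000
# Scenario: the user stopped the DB before the Lambda did, so the tag was
# never cleared; a day later the user starts the DB manually.
stale = [{"Key": "DbStopperLastSeen", "Value": str(now - 24 * 3600)}]
# Scenario: AWS auto-started the DB two minutes ago.
fresh = [{"Key": "DbStopperLastSeen", "Value": str(now - 120)}]

print(presence_only(stale), within_window(stale, now))  # True False
print(presence_only(fresh), within_window(fresh, now))  # True True
```

With the leftover tag, the tag-only check would stop a manually started database, while the time window lets the stale tag expire on its own.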

@regevbr

regevbr commented Jan 24, 2021

Please note that this will not work for Aurora; there you need to operate at the cluster level.
@kichik thoughts?
Here is the modified (not yet tested) version:

# aws cloudformation deploy --template-file KeepDbStopped.yml --stack-name stop-db --capabilities CAPABILITY_IAM --parameter-overrides DBCluster=arn:aws:rds:us-east-1:XXX:cluster:XXX
Description: Automatically stop RDS Aurora cluster every time it turns on due to exceeding the maximum allowed time being stopped
Parameters:
  DBCluster:
    Description: ARN of database cluster that needs to be stopped
    Type: String
    AllowedPattern: arn:aws:rds:[a-z0-9\-]+:[0-9]+:cluster:[^:]*
  MaxStartupTime:
    Description: Maximum number of minutes to wait between database is automatically started and the time it's ready to be shut down. Extend this limit if your database takes a long time to boot up.
    Type: Number
    MinValue: 10
    Default: 25
Resources:
  DatabaseStopperFunction:
    Type: AWS::Lambda::Function
    Properties:
      Role: !GetAtt DatabaseStopperRole.Arn
      Runtime: python3.6
      Handler: index.handler
      Timeout: 20
      Code:
        ZipFile:
          Fn::Sub: |
            import boto3
            import time

            def handler(event, context):
              print("got", event)
              db = event["detail"]["SourceArn"]
              id = event["detail"]["SourceIdentifier"]
              message = event["detail"]["Message"]
              region = event["region"]
              rds = boto3.client("rds", region_name=region)

              if message == "Cluster instance is being started due to it exceeding the maximum allowed time being stopped.":
                print("database turned on automatically, setting last seen tag...")
                last_seen = int(time.time())
                rds.add_tags_to_resource(ResourceName=db, Tags=[{"Key": "DbStopperLastSeen", "Value": str(last_seen)}])

              elif message == "Cluster instance started":
                print("database started (and sort of available?)")

                last_seen = 0
                for t in rds.list_tags_for_resource(ResourceName=db)["TagList"]:
                  if t["Key"] == "DbStopperLastSeen":
                    last_seen = int(t["Value"])

                if time.time() < last_seen + (60 * ${MaxStartupTime}):
                  print("database was automatically started in the last ${MaxStartupTime} minutes, turning off...")
                  time.sleep(10)  # even waiting for the "started" event is not enough, so add some wait
                  rds.stop_db_cluster(DBClusterIdentifier=id)

                  print("success! removing auto-start tag...")
                  rds.add_tags_to_resource(ResourceName=db, Tags=[{"Key": "DbStopperLastSeen", "Value": "0"}])

                else:
                  print("ignoring manual database start")

              else:
                print("error: unknown database event!")
  DatabaseStopperRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Action:
              - sts:AssumeRole
            Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: Notify
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Action:
                  - rds:StopDBCluster
                Effect: Allow
                Resource: !Ref DBCluster
              - Action:
                  - rds:AddTagsToResource
                  - rds:ListTagsForResource
                  - rds:RemoveTagsFromResource
                Effect: Allow
                Resource: !Ref DBCluster
                Condition:
                  ForAllValues:StringEquals:
                    aws:TagKeys:
                      - DbStopperLastSeen
  DatabaseStopperPermission:
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !GetAtt DatabaseStopperFunction.Arn
      Principal: events.amazonaws.com
      SourceArn: !GetAtt DatabaseStopperRule.Arn
  DatabaseStopperRule:
    Type: AWS::Events::Rule
    Properties:
      EventPattern:
        source:
          - aws.rds
        detail-type:
          - "RDS Cluster Instance Event"
        resources:
          - !Ref DBCluster
        detail:
          Message:
            - "Cluster instance is being started due to it exceeding the maximum allowed time being stopped."
            - "Cluster instance started"
      Targets:
        - Arn: !GetAtt DatabaseStopperFunction.Arn
          Id: DatabaseStopperLambda
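
If a single handler had to cover both cases, the resource type could be read from the ARN, since the two templates differ only in the `db` versus `cluster` segment of their AllowedPattern. This is a hedged sketch: `resource_kind` and `stop_resource` are hypothetical helpers, and `rds` is assumed to behave like a boto3 RDS client.

```python
import re

# Combines the AllowedPattern values of the instance and cluster templates.
ARN_PATTERN = re.compile(r"arn:aws:rds:[a-z0-9\-]+:[0-9]+:(db|cluster):[^:]*")


def resource_kind(arn):
    """Return "db" or "cluster" depending on the ARN's resource type."""
    match = ARN_PATTERN.fullmatch(arn)
    if match is None:
        raise ValueError("not an RDS instance or cluster ARN: " + arn)
    return match.group(1)


def stop_resource(rds, arn, identifier):
    """Stop an instance or an Aurora cluster, whichever the ARN refers to."""
    kind = resource_kind(arn)
    if kind == "cluster":
        rds.stop_db_cluster(DBClusterIdentifier=identifier)
    else:
        rds.stop_db_instance(DBInstanceIdentifier=identifier)
    return kind
```

The event messages and detail-type would still differ between instances and clusters, so the rule and the message checks would need the same kind of branching.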

@kichik
Author

kichik commented Jan 24, 2021

@regevbr your code looks like it would work. Thanks! I think using Aurora Serverless might work too.

@67lisbon

67lisbon commented Feb 17, 2022

@kichik it doesn't work for me when I run a test event in Lambda. I get this error:

17 Feb 2022 14:09 [INFO] (/var/runtime/bootstrap.py) main started at epoch 1645106978694
17 Feb 2022 14:09 [INFO] (/var/runtime/bootstrap.py) init completed at epoch 1645106978694
got {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}
'SourceArn' : KeyError
Traceback (most recent call last):
  File "/var/task/index.py", line 6, in handler
    db = event["SourceArn"]
KeyError: 'SourceArn'

This also happens for ["detail"] in db = event["detail"]["SourceArn"]

I have only run this as a 'Test' that I configured on the Lambda. I have not yet tested it through the Event Rule that listens for the message.

If I run aws rds describe-events against my RDS cluster in the AWS CLI, I can see the following fields under 'Events': SourceIdentifier, SourceType, SourceArn and Message.

@Klohto

Klohto commented Feb 17, 2022

Run it against actual event test data.
Do you see {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'} containing SourceArn?
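
For a local or console test, the event needs at least the fields the handler reads: detail.SourceArn, detail.SourceIdentifier, detail.Message and region. Below is a minimal sketch of such an event; `make_rds_event` and `read_event` are hypothetical helpers, the ARN is a placeholder, and real events delivered by the rule carry additional fields that are omitted here.

```python
def make_rds_event(db_arn, message, region="us-east-1"):
    """Build a minimal RDS-style event with only the fields the handler uses."""
    return {
        "source": "aws.rds",
        "detail-type": "RDS DB Instance Event",
        "region": region,
        "resources": [db_arn],
        "detail": {
            "SourceArn": db_arn,
            # Assumption for illustration: the identifier is the last ARN segment.
            "SourceIdentifier": db_arn.rsplit(":", 1)[-1],
            "Message": message,
        },
    }


def read_event(event):
    """Extract the handler's fields, failing clearly on wrong-shaped events."""
    try:
        detail = event["detail"]
        return (detail["SourceArn"], detail["SourceIdentifier"],
                detail["Message"], event["region"])
    except KeyError as missing:
        raise ValueError(f"not an RDS event, missing key {missing}") from None


event = make_rds_event("arn:aws:rds:us-east-1:123456789012:db:my-tmp-db",
                       "DB instance started")
print(read_event(event)[1])  # my-tmp-db
```

The default console test event ({'key1': 'value1', ...}) has no "detail" key, which is why the handler raises KeyError on it.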
