
@RupGautam
Created April 17, 2023 21:00

Quick setup

  • Set up a virtual environment for the project and install the dependencies:
  • virtualenv env
  • source env/bin/activate
  • pip install boto3 pyyaml datadog
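The script below reads a files.yaml manifest with a top-level files list, where each entry has a name (used as the S3 key and DynamoDB hash key) and a path (the local file to upload). A minimal example, with placeholder names and paths:

```yaml
files:
  - name: report-2023-04.csv
    path: /data/exports/report-2023-04.csv
  - name: logo.png
    path: /assets/logo.png
```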

Create DynamoDB tables

aws dynamodb create-table \          
    --table-name uploaded-files \
    --attribute-definitions AttributeName=file_name,AttributeType=S \
    --key-schema AttributeName=file_name,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST

Run the app: python main.py

Checking the Datadog metrics (enter these in the query box; pick the timeframe, e.g. Past 1 Week, from the time picker rather than inside the query):

sum:s3.upload.success{*}

sum:s3.upload.failed{*}
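For reference, the dedup step in main.py boils down to a set difference between the manifest entries and the file names already recorded in DynamoDB. A standalone sketch of that logic with made-up file names (no AWS involved):

```python
def net_new(manifest, already_uploaded):
    """Keep only manifest entries whose name is not yet recorded."""
    return [f for f in manifest if f["name"] not in already_uploaded]

manifest = [
    {"name": "a.txt", "path": "/tmp/a.txt"},
    {"name": "b.txt", "path": "/tmp/b.txt"},
]
already = {"a.txt"}  # stand-in for the names returned by the table scan
print(net_new(manifest, already))  # only b.txt remains to upload
```

Because already-uploaded names are skipped, re-running the script after a partial failure only retries the files that never made it.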

import os
import threading
import logging

import boto3
import yaml
from datadog import initialize, api

# Set up the logging configuration
logging.basicConfig(filename='upload.log', level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(message)s')

# REF: https://docs.datadoghq.com/metrics/custom_metrics/
# init DD
api_key = os.environ.get('DATADOG_API_KEY')
app_key = os.environ.get('DATADOG_APP_KEY')
initialize(api_key=api_key, app_key=app_key)

# Define the name of your DynamoDB table
TABLE_NAME = "uploaded-files"

# Connect to DynamoDB
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(TABLE_NAME)


def upload_file(bucket, path, key):
    s3 = boto3.client('s3', region_name='us-east-1')
    try:
        s3.upload_file(path, bucket, key)
        logging.info(f'Successfully uploaded {key}')
        # send the upload success event to the DD API as a custom metric
        api.Metric.send(metric='s3.upload.success', points=1)
        # Record the uploaded file in DynamoDB; the attribute name must
        # match the table's hash key (file_name, per the create-table call)
        table.put_item(Item={"file_name": key})
    except Exception as e:
        logging.error(f'Failed to upload {key}: {str(e)}')
        # send the upload failed event to the DD API as a custom metric
        api.Metric.send(metric='s3.upload.failed', points=1)


# Load the YAML file
with open('files.yaml', 'r') as f:
    files = yaml.safe_load(f)['files']

# Get the list of files that have already been uploaded from DynamoDB
response = table.scan(ProjectionExpression="file_name")
uploaded_files = set(item['file_name'] for item in response['Items'])

# Identify the net-new files that need to be uploaded
net_new_files = [file for file in files if file['name'] not in uploaded_files]

# Create a thread for each net-new file and start uploading
threads = []
for file in net_new_files:
    name = file['name']
    path = file['path']
    thread = threading.Thread(target=upload_file, args=('fueled-fun', path, name))
    threads.append(thread)
    thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()