Subhash Burramsetty bvsubhash

import boto3
import pandas as pd
import s3fs
import time
GLUE_DATABASE = 'sample_db'
ATHENA_S3_OUTPUT_PATH = 's3://athena-query-results-bucket/etl-queries-temp-output-folder'
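# Athena (Presto/Trino SQL) treats double quotes as identifiers, so string literals use single quotes.
athena_query = """SELECT * FROM table_name WHERE column_1 = 'value1' AND column_2 = 'value2'"""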
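The imports and constants above suggest the snippet goes on to run the query in Athena and load the result with pandas. The rest of the gist is not shown here, so the following is only a minimal sketch under that assumption. The helper name `run_athena_query` and the `poll_seconds` parameter are hypothetical. The client is passed in so the function can be exercised without AWS credentials.

```python
import time


def run_athena_query(client, query, database, output_path, poll_seconds=1.0):
    """Start an Athena query, wait for it to finish, and return the S3 path of its result file.

    `client` is expected to behave like boto3.client("athena").
    """
    response = client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_path},
    )
    query_id = response["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]
        status = execution["Status"]
        if status["State"] == "SUCCEEDED":
            # Athena reports the exact location of the result file it wrote.
            return execution["ResultConfiguration"]["OutputLocation"]
        if status["State"] in ("FAILED", "CANCELLED"):
            raise RuntimeError(status.get("StateChangeReason", status["State"]))
        time.sleep(poll_seconds)


# Hypothetical usage with the constants defined above:
#   client = boto3.client("athena")
#   result_path = run_athena_query(client, athena_query, GLUE_DATABASE, ATHENA_S3_OUTPUT_PATH)
#   df = pd.read_csv(result_path)  # pandas reads s3:// paths when s3fs is installed
```

Returning the path rather than a DataFrame keeps the polling logic separate from the pandas/s3fs read, which makes each part easier to test.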
bvsubhash / comparison.csv
Last active May 28, 2020 05:38
Table comparing multiple methods of adding partitions to a Glue table for the Firehose delivery into S3 use case
,Method 1 (Glue Crawler),Method 2 (MSCK Repair),Method 3 (Alter Table Command),Method 4 (Boto3 SDK)
Costly,Yes,Free,Free,Very low
Time Taken,Minutes,Minutes,Seconds,Less than 2 seconds
Schema Change Detection,Yes,No,No,No
Limitations (current use case),-,Athena Service Quota Limits,Athena Service Quota Limits,-
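To make Method 3 concrete, here is a minimal sketch of building the `ALTER TABLE ... ADD PARTITION` statement that would be submitted to Athena. The helper name `build_add_partition_sql`, the table name, the bucket, and the partition keys (`year`, `month`, `day`, `hour`) are all assumptions for illustration. They mirror Firehose's default `YYYY/MM/DD/HH` prefix layout, not anything stated in the table.

```python
def build_add_partition_sql(table, partition, location):
    """Build an Athena DDL statement that registers one partition.

    `partition` maps partition column names to their values, in table order.
    """
    spec = ", ".join("{} = '{}'".format(key, value) for key, value in partition.items())
    return "ALTER TABLE {} ADD IF NOT EXISTS PARTITION ({}) LOCATION '{}'".format(
        table, spec, location
    )


# Hypothetical example values.
sql = build_add_partition_sql(
    "table_name",
    {"year": "2020", "month": "05", "day": "28", "hour": "05"},
    "s3://example-bucket/2020/05/28/05/",
)
print(sql)
```

Because the statement runs as an Athena query, it counts against the Athena service quotas listed in the Limitations row. Method 4 avoids this by calling the Glue API directly.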
bvsubhash / create_glue_partition.py
Last active July 12, 2022 00:14
Code Snippet of Lambda Function to add partitions into Glue Metadata Catalog using Boto3
import boto3
import urllib.parse
import os
import copy


def create_glue_partition_handler(event, context):
    for record in event['Records']:
        try:
            source_bucket = record['s3']['bucket']['name']
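The listing above is cut off, so the remainder of the handler is not visible. As a rough sketch of how a handler like this might register a partition through the Glue API, the code below derives partition values from the S3 object key and reuses the table's storage descriptor. The helper names, the `depth` parameter, and the assumption that objects sit under a `YYYY/MM/DD/HH/` prefix at the bucket root are all hypothetical. The Glue client is passed in, so it would be `boto3.client("glue")` in a real Lambda.

```python
import copy
import urllib.parse


def partition_values_from_key(key, depth=4):
    """Return the first `depth` path segments of an S3 object key as partition values."""
    # Object keys in S3 event notifications are URL-encoded.
    decoded = urllib.parse.unquote_plus(key)
    return decoded.split("/")[:depth]


def add_partition(glue_client, database, table, bucket, values):
    """Register one partition in the Glue Data Catalog and return its S3 location."""
    table_info = glue_client.get_table(DatabaseName=database, Name=table)
    # Copy the table's storage descriptor so formats and SerDe settings carry over.
    descriptor = copy.deepcopy(table_info["Table"]["StorageDescriptor"])
    prefix = "/".join(values)
    descriptor["Location"] = "s3://{}/{}/".format(bucket, prefix)
    glue_client.create_partition(
        DatabaseName=database,
        TableName=table,
        PartitionInput={"Values": values, "StorageDescriptor": descriptor},
    )
    return descriptor["Location"]
```

In production, `create_partition` raises an `AlreadyExistsException` when the partition is already registered. A handler triggered once per delivered object would likely need to catch that and treat it as a no-op.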