
@dharamsk
dharamsk / debug_avro_schemas.py
Created April 14, 2018 00:33
Avro Debugger Script
import random
import argparse
import mock
from where_ever.stream_handler import *
from where_ever.tests.test_stream_handler import *
"""
This checks an Avro schema against an example data record.
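The debugger's full source isn't shown in this preview. As a hedged illustration of the underlying check, here is a minimal stdlib-only sketch that walks an Avro schema (already parsed from JSON, e.g. via `json.load` on an `.avsc` file) alongside a record and collects every mismatch. It covers only primitives, unions, arrays, maps and nested records; enums, fixed, logical types and named-type references are deliberately out of scope, and `find_errors` is a hypothetical name, not part of any Avro library.

```python
# Hypothetical, stdlib-only sketch: report where a Python value fails to match
# an Avro schema. Supports primitives, unions, arrays, maps and nested records;
# enums, fixed, logical types and named-type references are NOT handled.

PRIMITIVE_CHECKS = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool) and -2**31 <= v < 2**31,
    "long": lambda v: isinstance(v, int) and not isinstance(v, bool) and -2**63 <= v < 2**63,
    "float": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "double": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
    "bytes": lambda v: isinstance(v, bytes),
}


def find_errors(schema, value, path="record"):
    """Return a list of human-readable mismatches; an empty list means the value fits."""
    if isinstance(schema, str):  # primitive type name, e.g. "long"
        check = PRIMITIVE_CHECKS.get(schema)
        if check is None:
            return [f"{path}: unsupported type {schema!r}"]
        return [] if check(value) else [f"{path}: expected {schema}, got {value!r}"]
    if isinstance(schema, list):  # union: the value must match at least one branch
        if any(not find_errors(branch, value, path) for branch in schema):
            return []
        return [f"{path}: {value!r} matches no branch of union {schema}"]
    kind = schema["type"]
    if kind == "record":
        if not isinstance(value, dict):
            return [f"{path}: expected record, got {value!r}"]
        errors = []
        for field in schema["fields"]:
            name = field["name"]
            if name not in value:
                # Design choice: a missing field is tolerated only if it declares a default.
                if "default" not in field:
                    errors.append(f"{path}.{name}: missing required field")
                continue
            errors.extend(find_errors(field["type"], value[name], f"{path}.{name}"))
        return errors
    if kind == "array":
        if not isinstance(value, list):
            return [f"{path}: expected array, got {value!r}"]
        return [e for i, item in enumerate(value)
                for e in find_errors(schema["items"], item, f"{path}[{i}]")]
    if kind == "map":
        if not isinstance(value, dict):
            return [f"{path}: expected map, got {value!r}"]
        return [e for key, item in value.items()
                for e in find_errors(schema["values"], item, f"{path}[{key!r}]")]
    # Wrapped type such as {"type": "string"}; unknown kinds fall through to "unsupported".
    return find_errors(kind, value, path)
```

Reporting every mismatch with a dotted path (rather than a single True/False) is what makes this useful for debugging a record that a stricter library rejects without detail.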
dharamsk / alter_table_attributes.py
Created October 17, 2018 05:40
Redshift: Programmatically configure dist style and sort key of *existing* tables
# I wrote this script to alter 700 Redshift tables to DISTSTYLE ALL (and remove their sort keys),
# but it is partially set up to specify any dist style and sort key on a table-by-table basis;
# all that's needed is to modify the main() function to accept a dict with the table configs
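The gist's main() isn't shown here, so as a hedged sketch of the "dict with the table configs" idea, the helper below turns a per-table config into ALTER TABLE statements. The config shape and the names `alter_statements` and `plan_all` are hypothetical. It assumes a Redshift version whose ALTER TABLE accepts the in-place `ALTER DISTSTYLE` and `ALTER [COMPOUND] SORTKEY` / `ALTER SORTKEY NONE` clauses; clusters without them would need a deep copy instead, which may be what the original script did.

```python
# HYPOTHETICAL helper, not the original main(): build Redshift ALTER TABLE
# statements from {table_name: config}. Names are inserted verbatim (no
# quoting or escaping), so only pass trusted table and column names.


def alter_statements(table, config):
    """config keys (all optional): diststyle, distkey, sortkey (list of columns or None)."""
    statements = []
    style = config.get("diststyle")
    if style is not None:
        style = style.upper()
        if style == "KEY":
            statements.append(
                f"ALTER TABLE {table} ALTER DISTSTYLE KEY DISTKEY {config['distkey']};")
        elif style in ("ALL", "EVEN", "AUTO"):
            statements.append(f"ALTER TABLE {table} ALTER DISTSTYLE {style};")
        else:
            raise ValueError(f"unknown diststyle {style!r} for {table}")
    if "sortkey" in config:
        sortkey = config["sortkey"]
        if sortkey:  # non-empty list of columns -> compound sort key
            statements.append(
                f"ALTER TABLE {table} ALTER COMPOUND SORTKEY ({', '.join(sortkey)});")
        else:  # None or [] -> drop the sort key entirely
            statements.append(f"ALTER TABLE {table} ALTER SORTKEY NONE;")
    return statements


def plan_all(table_configs):
    """Flatten {table_name: config} into one ordered list of statements."""
    return [stmt for table, config in table_configs.items()
            for stmt in alter_statements(table, config)]
```

Generating the SQL as plain strings first makes it easy to review the full plan (or write it to a file) before executing anything against the cluster.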
# Redshift clusters with large node types waste disk space and network bandwidth
# when small tables use the EVEN or KEY (DISTKEY) dist style
# sort keys double the minimum size of a table, wasting even more space
# see here for more on minimum table size calculation:
# https://aws.amazon.com/premiumsupport/knowledge-center/redshift-cluster-storage-space/
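To make the space argument concrete, here is a small calculator based on our reading of the formula in the AWS article linked above: minimum size = 1 MB block × (user columns + 3 hidden system columns) × populated slices × table segments. The assumptions, which should be checked against the article, are that EVEN/KEY tables populate every slice, DISTSTYLE ALL tables are stored only on the first slice of each node, and a sort key adds a second (unsorted) segment. The cluster dimensions in the usage line are hypothetical.

```python
# Sketch of the minimum-table-size estimate (our reading of the AWS article):
#   min MB = 1 MB block * (user columns + 3 system columns)
#            * populated slices * table segments
# Assumptions: EVEN/KEY fill every slice; ALL uses one slice per node;
# a sort key means 2 segments (sorted + unsorted), otherwise 1.

BLOCK_MB = 1
SYSTEM_COLUMNS = 3


def min_table_size_mb(user_columns, diststyle, nodes, slices_per_node, has_sortkey):
    if diststyle.upper() == "ALL":
        populated_slices = nodes  # one copy per node, on its first slice
    else:
        populated_slices = nodes * slices_per_node  # EVEN or KEY
    segments = 2 if has_sortkey else 1
    return BLOCK_MB * (user_columns + SYSTEM_COLUMNS) * populated_slices * segments
```

For a hypothetical 10-column table on a 4-node cluster with 16 slices per node, `min_table_size_mb(10, "EVEN", 4, 16, False)` gives 13 × 64 = 832 MB, adding a sort key doubles that to 1664 MB, and switching to `"ALL"` without a sort key gives 13 × 4 = 52 MB, which is the saving this script goes after.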
dharamsk / bigquery_relax_schema_on_all_tables.py
Created April 3, 2020 18:41
This Python snippet was written to modify all schemas in a dataset to "relax" every REQUIRED column to NULLABLE. In this case, I applied it only to tables that were modified in the last 24 hours; however, it could be adapted to perform other useful operations on all tables in a dataset.
# for all tables modified in the last 24 hours
# relax all columns to be NULLABLE instead of REQUIRED
# Python3
from google.cloud import bigquery
from datetime import datetime, timedelta
CLIENT = bigquery.Client() # auth using default credentials/project
DATASET = 'your_dataset'
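The rest of the snippet is truncated here. A hedged sketch of the core logic follows: it rewrites the schema in its API (dict) form, so the REQUIRED → NULLABLE rule can be tested without credentials, recursing into RECORD sub-fields and leaving REPEATED fields alone. The function names are hypothetical; `relax_dataset` relies only on documented `google-cloud-bigquery` calls (`list_tables`, `get_table`, `SchemaField.to_api_repr` / `from_api_repr`, `update_table`). Note that `Table.modified` is timezone-aware, so it must be compared against an aware "now" rather than a naive `datetime.now()`.

```python
# Sketch: relax REQUIRED columns to NULLABLE on recently modified tables.
# The schema is handled in its API (dict) representation so the core rule
# can be tested without a BigQuery connection.
from datetime import datetime, timedelta, timezone


def relax_fields(fields):
    """Return a copy of `fields` with REQUIRED modes changed to NULLABLE.

    Recurses into RECORD sub-fields; REPEATED fields keep their mode."""
    relaxed = []
    for field in fields:
        new_field = dict(field)  # copy, so the caller's schema is not mutated
        if new_field.get("mode", "NULLABLE").upper() == "REQUIRED":
            new_field["mode"] = "NULLABLE"
        if new_field.get("fields"):
            new_field["fields"] = relax_fields(new_field["fields"])
        relaxed.append(new_field)
    return relaxed


def modified_recently(modified, now=None, hours=24):
    """True if the timezone-aware `modified` timestamp is within the last `hours`."""
    if modified is None:
        return False
    if now is None:
        now = datetime.now(timezone.utc)
    return now - modified <= timedelta(hours=hours)


def relax_dataset(client, dataset_id, hours=24):
    """Apply relax_fields to every table in `dataset_id` modified within `hours`."""
    from google.cloud import bigquery  # lazy import: the helpers above run without it

    for item in client.list_tables(dataset_id):
        table = client.get_table(item.reference)
        if not modified_recently(table.modified, hours=hours):
            continue
        api_fields = [field.to_api_repr() for field in table.schema]
        table.schema = [bigquery.SchemaField.from_api_repr(f)
                        for f in relax_fields(api_fields)]
        client.update_table(table, ["schema"])  # only the schema is sent
```

With the constants above, the whole job would be `relax_dataset(CLIENT, DATASET)`; swapping `relax_fields` for another schema transform is how the same loop could perform other per-table operations.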
dharamsk / tf_big_plans.py
Created April 23, 2020 20:38
Display relevant terraform policy diffs, omitting redundant items
#!/usr/bin/env python3
"""
This Python script improves the usability of Terraform 0.12 by
eliminating the display of redundant changes, typically found in
resource attributes like policy/policy_data for AWS/GCP providers.
This is a known limitation of the legacy Terraform SDK, as described here:
https://github.com/hashicorp/terraform/issues/21901