Vaquar Khan (vaquarkhan)

AWS Resource-Based Policy Examples
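The gist body is not shown in this listing. As a minimal illustration of what a resource-based policy looks like in practice, the sketch below (an assumed example, not taken from the gist) attaches a cross-account read policy to an S3 bucket with boto3; the bucket name and account ID are placeholders.

import json
import boto3

# Hypothetical resource-based (bucket) policy allowing another account to read objects.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowCrossAccountRead",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:root"},  # placeholder account
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/*"               # placeholder bucket
    }]
}

s3 = boto3.client("s3")
s3.put_bucket_policy(Bucket="example-bucket", Policy=json.dumps(bucket_policy))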

@vaquarkhan
vaquarkhan / lambda_handler.py
Created March 14, 2024 20:07 — forked from djg07/lambda_handler.py
DynamoDB Streams Lambda Handler
import json

print('Loading function')


def lambda_handler(event, context):
    print('------------------------')
    print(event)
    # 1. Iterate over each record
    try:
        for record in event['Records']:
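            # The gist preview ends at the line above; what follows is an assumed
            # sketch of how such a handler typically finishes, not djg07's original
            # code. Field names follow the DynamoDB Streams event format.
            print(record['eventID'], record['eventName'])  # INSERT / MODIFY / REMOVE
            new_image = record.get('dynamodb', {}).get('NewImage')
            if new_image:
                print(json.dumps(new_image))
        return 'Successfully processed {} records.'.format(len(event['Records']))
    except Exception as e:
        print('Error processing records: {}'.format(e))
        raise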
@vaquarkhan
vaquarkhan / queries.sql
Created September 26, 2023 15:21 — forked from iconara/queries.sql
Low level Redshift cheat sheet
-- Table information like sortkeys, unsorted percentage
-- see http://docs.aws.amazon.com/redshift/latest/dg/r_SVV_TABLE_INFO.html
SELECT * FROM svv_table_info;
-- Table sizes in GB
SELECT t.name, COUNT(tbl) / 1000.0 AS gb
FROM (
    SELECT DISTINCT datname, id, name
    FROM stv_tbl_perm
    JOIN pg_database ON pg_database.oid = db_id
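The table-size query is cut off mid-subquery in this preview. To run these system-table queries from Python, a minimal sketch with psycopg2 is shown below; the driver choice and all connection values are assumptions, not part of the gist.

import psycopg2

# Placeholder connection details; Redshift listens on port 5439 by default.
conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="dev",
    user="awsuser",
    password="...",
)
with conn.cursor() as cur:
    # Sort key, unsorted percentage, and size information per table.
    cur.execute('SELECT "table", size, unsorted, tbl_rows FROM svv_table_info ORDER BY size DESC;')
    for row in cur.fetchall():
        print(row)
conn.close()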
@vaquarkhan
vaquarkhan / CognitoAuthenticationProvider.java
Created July 30, 2023 07:44 — forked from alexramos1/CognitoAuthenticationProvider.java
Simplest possible implementation of AWS Cognito username/password authentication on Spring Security.
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import javax.annotation.Nonnull;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
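Only the imports survive in this preview; they hint at computing the Cognito SECRET_HASH (HMAC-SHA256 of username + client id, Base64-encoded) before calling Cognito. A rough Python analogue of that flow with boto3 is sketched below; it is an assumed equivalent, not the gist's Spring Security code, and the client id/secret are placeholders.

import base64
import hashlib
import hmac
import boto3

CLIENT_ID = "example-app-client-id"          # placeholder
CLIENT_SECRET = "example-app-client-secret"  # placeholder

def secret_hash(username):
    # SECRET_HASH = Base64(HMAC-SHA256(client_secret, username + client_id))
    digest = hmac.new(CLIENT_SECRET.encode(), (username + CLIENT_ID).encode(), hashlib.sha256).digest()
    return base64.b64encode(digest).decode()

def authenticate(username, password):
    idp = boto3.client("cognito-idp")
    return idp.initiate_auth(
        ClientId=CLIENT_ID,
        AuthFlow="USER_PASSWORD_AUTH",
        AuthParameters={"USERNAME": username, "PASSWORD": password, "SECRET_HASH": secret_hash(username)},
    )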
@vaquarkhan
vaquarkhan / compare_two_s3_dataframes.py
Created March 2, 2023 19:07 — forked from mappingvermont/compare_two_s3_dataframes.py
Use hadoop to compare two data tables on s3, write out differences
import os
import subprocess
import sys
os.environ["SPARK_HOME"] = r"/usr/lib/spark"
# Set PYTHONPATH for Spark
for path in [r'/usr/lib/spark/python/', r'/usr/lib/spark/python/lib/py4j-src.zip']:
    sys.path.append(path)
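The preview stops at the environment setup. A minimal sketch of the comparison step itself, assuming both tables are CSVs on S3 and using DataFrame subtraction; the paths and options are placeholders, not the gist's exact logic.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("compare_two_s3_dataframes").getOrCreate()

# Placeholder S3 paths for the two tables to compare.
df_a = spark.read.csv("s3://example-bucket/table_a/", header=True)
df_b = spark.read.csv("s3://example-bucket/table_b/", header=True)

# Rows present in one table but not the other.
only_in_a = df_a.subtract(df_b)
only_in_b = df_b.subtract(df_a)

# Write the differences back to S3 for inspection.
only_in_a.write.mode("overwrite").csv("s3://example-bucket/diff/only_in_a/", header=True)
only_in_b.write.mode("overwrite").csv("s3://example-bucket/diff/only_in_b/", header=True)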
@vaquarkhan
vaquarkhan / read_write_pyspark_redshift.py
Created December 9, 2022 04:50 — forked from mlivingston40/read_write_pyspark_redshift.py
Basics set up on Read and Write with Redshift in PySpark Env
# Jar configuration needed for the Redshift connector
%%configure
{
    "conf": {
        "spark.jars": "https://s3.amazonaws.com/redshift-downloads/drivers/jdbc/1.2.36.1060/RedshiftJDBC42-no-awssdk-1.2.36.1060.jar",
        "spark.jars.packages": "org.apache.spark:spark-avro_2.11:2.4.2,io.github.spark-redshift-community:spark-redshift_2.11:4.0.1"
    }
}
# define redshift connection info
CREATE EXTERNAL TABLE IF NOT EXISTS gdelt.events (
`globaleventid` INT,`day` INT,`monthyear` INT,`year` INT,`fractiondate` FLOAT,
`actor1code` string,`actor1name` string,`actor1countrycode` string,`actor1knowngroupcode` string,
`actor1ethniccode` string,`actor1religion1code` string,`actor1religion2code` string,
`actor1type1code` string,`actor1type2code` string,`actor1type3code` string,
`actor2code` string,`actor2name` string,`actor2countrycode` string,`actor2knowngroupcode` string,
`actor2ethniccode` string,`actor2religion1code` string,`actor2religion2code` string,
`actor2type1code` string,`actor2type2code` string,`actor2type3code` string,
`isrootevent` BOOLEAN,`eventcode` string,`eventbasecode` string,`eventrootcode` string,
`quadclass` INT,`goldsteinscale` FLOAT,`nummentions` INT,`numsources` INT,`numarticles` INT,`avgtone` FLOAT,
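Returning to read_write_pyspark_redshift.py above: the connection info promised by the "# define redshift connection info" comment is cut off in the preview. Below is a minimal sketch of reading and writing Redshift with the spark-redshift community connector configured in the %%configure cell; it assumes a SparkSession named spark (as in an EMR notebook), and the JDBC URL, temp dir, and IAM role are placeholders.

# Placeholders; the original gist presumably defines these as variables.
jdbc_url = "jdbc:redshift://my-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev?user=awsuser&password=..."
temp_dir = "s3://example-bucket/redshift-temp/"
iam_role = "arn:aws:iam::111122223333:role/RedshiftCopyUnload"

# Read a Redshift table into a Spark DataFrame.
df = (spark.read
      .format("io.github.spark_redshift_community.spark.redshift")
      .option("url", jdbc_url)
      .option("dbtable", "public.events")
      .option("tempdir", temp_dir)
      .option("aws_iam_role", iam_role)
      .load())

# Write a DataFrame back to a Redshift table.
(df.write
 .format("io.github.spark_redshift_community.spark.redshift")
 .option("url", jdbc_url)
 .option("dbtable", "public.events_copy")
 .option("tempdir", temp_dir)
 .option("aws_iam_role", iam_role)
 .mode("overwrite")
 .save())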
@vaquarkhan
vaquarkhan / spark-to-sql-validation-sample.py
Created November 13, 2022 07:57 — forked from dennyglee/spark-to-sql-validation-sample.py
Validate Spark DataFrame data and schema prior to loading into SQL
'''
Example Schema Validation
Assumes the DataFrame `df` is already populated with schema:
{id : int, day_cd : 8-digit code representing date, category : varchar(24), type : varchar(10), ind : varchar(1), purchase_amt : decimal(18,6) }
Runs various checks to ensure data is valid (e.g. no NULL id and day_cd fields) and schema is valid (e.g. [category] cannot be larger than varchar(24))
'''
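Only the docstring is shown in the preview. A minimal sketch of the kind of checks it describes, written against the assumed schema (column names and length limits follow the docstring; the pass/fail handling is an assumption, not dennyglee's original code):

from pyspark.sql.functions import col, length

def validate(df):
    errors = []
    # Data checks: id and day_cd must never be NULL.
    if df.filter(col("id").isNull() | col("day_cd").isNull()).count() > 0:
        errors.append("NULL values found in id or day_cd")
    # day_cd is an 8-digit date code, e.g. 20221113.
    if df.filter(length(col("day_cd").cast("string")) != 8).count() > 0:
        errors.append("day_cd is not an 8-digit code")
    # Schema checks: category must fit varchar(24), type varchar(10), ind varchar(1).
    for column, max_len in [("category", 24), ("type", 10), ("ind", 1)]:
        if df.filter(length(col(column)) > max_len).count() > 0:
            errors.append("{} exceeds varchar({})".format(column, max_len))
    return errors

# Example usage: abort the SQL load if any check fails.
# problems = validate(df)
# assert not problems, problems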
@vaquarkhan
vaquarkhan / economic_events_update_dag.py
Created October 30, 2022 02:07 — forked from cr3a7ure/economic_events_update_dag.py
Airflow DAG definition file to dynamically generate DAGs based on a variable (pull economic data when it is released)
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import logging
import airflow
from airflow import DAG
from datetime import timedelta, datetime
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator
from airflow.operators.http_operator import SimpleHttpOperator
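Only the imports are visible in the preview. Continuing from them, the usual pattern for generating DAGs from an Airflow Variable looks roughly like the sketch below; the Variable name, schedule, connection id, and endpoints are placeholders, not the gist's actual values.

from airflow.models import Variable

default_args = {
    "owner": "airflow",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

# Placeholder Variable holding one entry per economic event release.
events = Variable.get("economic_events", deserialize_json=True, default_var=[])

for event in events:
    dag_id = "economic_events_update_{}".format(event["name"])
    dag = DAG(
        dag_id=dag_id,
        default_args=default_args,
        start_date=datetime(2022, 1, 1),
        schedule_interval=event.get("schedule"),  # e.g. a cron string per release
        catchup=False,
    )
    with dag:
        start = DummyOperator(task_id="start")
        pull = SimpleHttpOperator(
            task_id="pull_data",
            http_conn_id="economic_data_api",   # placeholder connection
            endpoint=event["endpoint"],         # placeholder per-event endpoint
        )
        start >> pull
    # Register the generated DAG so the scheduler can discover it.
    globals()[dag_id] = dag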