Skip to content

Instantly share code, notes, and snippets.

View anhdanggit's full-sized avatar

Anh Dang anhdanggit

View GitHub Profile
@anhdanggit
anhdanggit / Py-Athena-Conn.py
Last active April 1, 2025 00:41
Connect pyathena #aws
import pyathena
import pandas as pd
## Directly by panda
athena_conn = pyathena.connect(aws_access_key_id=os.environ['ATHENA_USER'], ##credentials of aws_access_key_id
aws_secret_access_key=os.environ['ATHENA_PASSWORD'], ##credentials of aws_secret_access_key
s3_staging_dir='s3://aws-athena-query-results-<your-details>', ##where the athena query result saved - checked in S3 ,
region_name='eu-west-1') ##the region you set for Athena
df = pd.read_sql("SELECT * FROM tutorial.wbcdata LIMIT 10", athena_conn)
@anhdanggit
anhdanggit / R-Athena-Conn.R
Created July 16, 2019 09:58
Connecting R with Athena by rJava and RJDBC
library(RJDBC)
library(rJava)
library(DBI)
# 1 - Set up driver connection to JDBC
fil = '~/<paht-to>/AthenaJDBC42_2.0.7.jar' ##path to the AthenaJDBC in your machine
drv <- JDBC(driverClass='com.simba.athena.jdbc.Driver',
classPath = fil,
identifier.quote="'")
@anhdanggit
anhdanggit / vscode-python-snippets.json
Last active July 16, 2019 10:51
Python Snippets to set-up in Visual Code
{
"Header": {
"prefix": "header",
"body": [
"'''",
"#FILE: $1",
"Project: $WORKSPACE_NAME",
"-------------------",
"By: Anh Dang",
"Date: $CURRENT_YEAR-$CURRENT_MONTH-$CURRENT_DATE",
@anhdanggit
anhdanggit / SQL-explore.sql
Last active July 16, 2019 10:51
Handy Tricks to explore DB/Athena
-- Search for a columns in DB
SELECT COLUMN_NAME AS "ColumnName",
TABLE_NAME AS "TableName",
TABLE_SCHEMA
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME LIKE 'cashfac_id'
ORDER BY TABLE_NAME,
COLUMN_NAME;
-- List all columns one table
@anhdanggit
anhdanggit / bankstat-parse.R
Last active February 21, 2025 20:23
R Script to parsing bank account and date from bank statement
##-------------------------------------#
## BankStat Prep
## PARSE BANK ACCOUNT AND DOC DATE
## This script to parse the info of bank account iban and the doc date for bank statement
##-------------------------------------#
# --- FUNCTIONS ---------------
@anhdanggit
anhdanggit / R_GetoptLong_toQuery.r
Last active July 18, 2019 09:52
Set-up DB conn and query by GeptoptLong
library(DBI)
library(dplyr)
library(GetoptLong)
library(magrittr)
data_warehouse <- function(){DBI::dbConnect(RPostgres::Postgres()
,dbname="database"
## to hide the credentials (set it up in .Renv)
,host=Sys.getenv("HOST")
,user= Sys.getenv("USERNAME")
@anhdanggit
anhdanggit / json-extract-sql.R
Last active July 16, 2019 12:51
SQL code to extract fields from json file
## Code to extract all fields from json
sphonic_data <- tbl(connection_risk_house(), sql(
"SELECT distinct on (application_id,jkey)
application_id,
jsonb_object_keys(section_payload) as jkey,
jsonb_object_field_text(section_payload, jsonb_object_keys(section_payload)) as jvalue
FROM fraud_detection.response_sections
WHERE sent_at >= '2017-10-01'
AND sent_at < '2017-12-01'
GROUP BY application_id,section_payload
@anhdanggit
anhdanggit / xml-extract-sql.R
Created July 16, 2019 12:52
SQL via R to extract fields from XML files
data_xml <- dbGetQuery(conn = DB_conn,statement = qq('
SELECT decision_id, data,
xpath(\'//application-uuid\',data) as application_uuid,
xpath(\'//brokerage-id\',data) as brokerage_id,
xpath(\'//Transformed\',data) as transformed,
xpath(\'//pd_commercial\',data) as pd_commercial,
xpath(\'//row_pd_comb\',data) as row_pd_comb,
xpath(\'//marginal_group\',data) as marginal_group,
@anhdanggit
anhdanggit / Py-Zip-tricks.py
Last active April 1, 2025 00:40
Zip in Python #basics
## Tuple:
# Capture arbitrarily long list
values = 1,2,3,4,5
a, b, *rest = values
# Discard the rest
a, b, *_ = values
## Zip:
# "pairs" up the elements of list
seq1 = ['foo','bar','baz']
@anhdanggit
anhdanggit / Py-Dict-tricks.py
Last active April 1, 2025 00:40
Dictionaries in Python #algorithms
# Categorizing to dicts
words = ['apple', 'bat', 'bar', 'atom', 'book']
by_letters = {}
for word in words:
letter = word[0]
by_letters[letter] = by_letters.get(letter, 0) + 1
value = by_letters[letter]
print('Letter: {} - Count: {}'.format(letter, value))