Skip to content

Instantly share code, notes, and snippets.

@michelmilezzi
michelmilezzi / aws_glue_avoiding_duplicates.py
Last active August 29, 2023 22:08
AWS Glue script showing how to avoid duplicates during a job execution.
import sys
import pydevd
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from pyspark.sql.functions import col
from awsglue.job import Job
@michelmilezzi
michelmilezzi / DataCopyManager.java
Created November 28, 2017 15:52
A simple java class to get PostgreSQL COPY content from STDOUT to compressed file
package br.com.michelmilezzi.database.data;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
@michelmilezzi
michelmilezzi / parameter_bind.rb
Last active September 18, 2019 17:13
PostgreSQL bind parameters
# Usage: ruby parameter_bind.rb mylogfile.log
# Output: every sql command parsed with its parameters
sourceFileName = ARGV[0]
targetFile = File.open("#{sourceFileName}.sql", 'w')
lines = File.read(sourceFileName, encoding: 'utf-8').scan(/(duration:.*)\sexecute.+:\s(.+)(\n.+parameters:(.+))?/)
lines.each do |line|
if line[2] != nil then
@michelmilezzi
michelmilezzi / idempotent_script.sql
Last active December 6, 2024 16:20
A simple script demonstrating PostgreSQL idempotent capabilities. You can run this script as many times as you wish (it will not give duplicate object error or similar).
--Table
CREATE TABLE IF NOT EXISTS person (
id integer NOT NULL,
person_name character varying(40) NOT NULL,
updated_date date,
CONSTRAINT person_pkey PRIMARY KEY (id)
);
--Index
CREATE INDEX IF NOT EXISTS idx_person_name ON person (person_name);
@michelmilezzi
michelmilezzi / overall_status_of_indexes.sql
Last active June 26, 2017 19:43
PostgreSQL query that calculates overall status of indexes
SELECT
COUNT(*) FILTER (WHERE idx_scan = 0) AS no_hit,
COUNT(*) FILTER (WHERE idx_scan > 0) AS used,
COUNT(*) AS total
FROM
pg_stat_user_indexes s
JOIN pg_index i ON (i.indexrelid = s.indexrelid)
WHERE
NOT i.indisprimary;
@michelmilezzi
michelmilezzi / index_without_hit.sql
Last active October 25, 2022 01:26
PostgreSQL query that finds indexes without hits
SELECT
relid::regclass AS table,
indexrelid::regclass AS index,
pg_size_pretty(pg_relation_size(indexrelid::regclass)) AS index_size,
idx_tup_read,
idx_tup_fetch,
idx_scan
FROM
pg_stat_user_indexes
JOIN pg_index USING (indexrelid)
@michelmilezzi
michelmilezzi / find_contraint_ref_fields.sql
Last active June 26, 2017 19:44
PostgreSQL query that finds columns and referenced columns from a constraint
WITH constraint_ref_fields AS (
SELECT
unnest(con1.conkey) AS local_key,
unnest(con1.confkey) AS ref_key,
con1.conrelid,
con1.confrelid
FROM
pg_class cl
JOIN pg_namespace ns ON cl.relnamespace = ns.oid
JOIN pg_constraint con1 ON con1.conrelid = cl.oid