Gatsby Lee Gatsby-Lee

  • forethought.ai
  • SF Bay Area, United States
@Gatsby-Lee
Gatsby-Lee / gist:fa103bae89eff35cd61e741aa631526c
Created March 3, 2024 06:23
postgresql-15-default-config-in-raspberry-pi-debian-bookworm.txt
name | setting | description
----------------------------------------+-----------------------------------------+------------------------------------------------------------------------------------------------------------------
allow_in_place_tablespaces | off | Allows tablespaces directly inside pg_tblspc, for testing.
allow_system_table_mods | off | Allows modifications of the structure of system tables.
application_name | psql | Sets the application name to be reported in statistics and logs.
archive_cleanup_command | | Sets the shell command that will be executed at every restart point.
archive_command | (
Gatsby-Lee / custom_emr_image.dockerfile
Created November 30, 2023 06:50
Sample Dockerfile to create custom EMR Image
##
# @note: To improve cache reuse during the image build, add the JARs first.
# If the Python dependency installation (step3, step4) went first,
# the JAR layers would be less likely to reuse the intermediate cache image, because step3 and step4 change and miss the cache.
#
# references
# - https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/docker-custom-images-steps.html
# - https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/docker-custom-images-tag.html
# @note as of emr-6.9.0, the public ECR can be used "public.ecr.aws/emr-on-eks/spark/emr-6.9.0:latest"
##
# filename: exp_hudi_export_0_14_0.yaml
name: exp-hudi-export-with-0-14-0
virtualClusterId: <emr-on-eks-virtual-cluster-id>
executionRoleArn: <emr-on-eks-execution-role>
# emr-6.15.0-latest has Hudi 0.14.0
# emr-6.14.0-latest has Hudi 0.13.1
# ref: https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/emr-eks-6.15.0.html
releaseLabel: emr-6.15.0-latest
jobDriver:
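The top-level keys in the YAML above (name, virtualClusterId, executionRoleArn, releaseLabel, jobDriver) match the parameter names of the EMR on EKS StartJobRun API. As a rough sketch only, and not necessarily how the original file is consumed, the same request could be assembled in Python and passed to boto3's `emr-containers` client. The helper names `build_start_job_run_request` and `submit`, and the entry point value, are hypothetical; the original preview is cut off at `jobDriver:`, so the driver contents below are an assumption.

```python
def build_start_job_run_request(
    name,
    virtual_cluster_id,
    execution_role_arn,
    release_label,
    entry_point,
    entry_point_arguments=None,
):
    """Return keyword arguments shaped for the EMR on EKS StartJobRun API.

    The jobDriver contents are an assumption: the original YAML is truncated
    before its jobDriver section.
    """
    return {
        "name": name,
        "virtualClusterId": virtual_cluster_id,
        "executionRoleArn": execution_role_arn,
        "releaseLabel": release_label,
        "jobDriver": {
            "sparkSubmitJobDriver": {
                "entryPoint": entry_point,
                "entryPointArguments": list(entry_point_arguments or []),
            }
        },
    }


def submit(request):
    """Submit the request with boto3 and return the job run id."""
    # Imported lazily so the request builder above has no third-party dependency.
    import boto3

    client = boto3.client("emr-containers")
    return client.start_job_run(**request)["id"]
```

Calling `submit(build_start_job_run_request(...))` needs AWS credentials and real values in place of the `<emr-on-eks-virtual-cluster-id>` and `<emr-on-eks-execution-role>` placeholders.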
-- Table schema with JSON output format
CREATE EXTERNAL TABLE `discover_cluster_tickets_dummy_json_v1`(
`cluster_id` string,
`created_date` timestamp
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION 's3://aws-athena/my_db/dummy_json_v1/';
-- INSERT stmt
-- All of these are EXTERNAL TABLEs
-- default file format: TEXTFILE
CREATE TABLE my_db.default_default (
cluster_id string
)
LOCATION 's3://aws-athena/my_db/default_default'
CREATE TABLE my_db.parquet_default (
-- default and supported compression
-- ref: https://docs.aws.amazon.com/athena/latest/ug/compression-formats.html
-- ---------------------
-- Set ORC's compression
-- ---------------------
CREATE TABLE my_db.orc_snappy (
cluster_id string
)
STORED AS ORC
Gatsby-Lee / slack_app_post_message.py
Created July 28, 2022 06:34
slack_app_post_message.py
import logging
import os
# Import WebClient from Python SDK (github.com/slackapi/python-slack-sdk)
from slack_sdk import WebClient
from slack_sdk.errors import SlackApiError
# WebClient instantiates a client that can call API methods
# When using Bolt, you can use either `app.client` or the `client` passed to listeners.
channel_id = "#general"
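The preview above stops before the client is created and the message is sent. A minimal sketch of how the rest might look, assuming the bot token is stored in an environment variable named `SLACK_BOT_TOKEN` (an assumption, not shown in the original) and using the SDK's `WebClient.chat_postMessage` call. The helper names `post_message` and `main` are hypothetical.

```python
import logging
import os

logger = logging.getLogger(__name__)


def post_message(client, channel_id, text):
    """Post `text` to `channel_id` and return the message timestamp.

    `client` is expected to behave like slack_sdk.WebClient.
    """
    response = client.chat_postMessage(channel=channel_id, text=text)
    return response["ts"]


def main():
    # Imported here so post_message can be exercised without the SDK installed.
    from slack_sdk import WebClient
    from slack_sdk.errors import SlackApiError

    # Assumption: the token lives in the SLACK_BOT_TOKEN environment variable.
    client = WebClient(token=os.environ["SLACK_BOT_TOKEN"])
    try:
        ts = post_message(client, "#general", "Hello world")
        logger.info("Posted message with ts=%s", ts)
    except SlackApiError as e:
        # The API response carries an "error" field describing the failure.
        logger.error("Error posting message: %s", e.response["error"])
```

Running `main()` requires the `slack_sdk` package and a valid token. Passing the client into `post_message` keeps the API call separate from credential handling, so it can be tried with a stand-in client first.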
# https://github.com/hashicorp/terraform-provider-google/blob/main/CHANGELOG.md
terraform {
backend "gcs" {
bucket = "wowbro_terraform_us_central_1"
}
required_providers {
google = {
source = "hashicorp/google"
version = "~> 4.22"
}