GitHub gists by Bruno Gonzalez (bruno-uy)

bruno-uy / redshift_copy_error_message.sql
Created March 15, 2024 13:52
Full message for Redshift COPY error when copying from parquet files
-- stl_load_errors alone often shows a truncated or generic error for parquet COPYs; joining svl_s3log surfaces the full message
select s3l.message, s3l.*, sle.*
from stl_load_errors sle
left join svl_s3log s3l
on sle.query = s3l.query
order by sle.starttime desc
limit 10;
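For context, the kind of parquet COPY this query helps debug looks like the following sketch (the table, bucket, and IAM role names are placeholders, not from the gist):

copy analytics.events
from 's3://my-bucket/events/'
iam_role 'arn:aws:iam::123456789012:role/MyRedshiftRole'
format as parquet;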
bruno-uy / print_variable_name_and_value.py
Created April 26, 2023 14:24
Print variable name and value (Python shortest version)
variable = "This is the value"
print(f"{variable=}")  # the f-string "=" specifier requires Python 3.8+
# >>> variable='This is the value'
bruno-uy / session_cache_off_redshift.sql
Last active March 8, 2023 13:50
Set session cache off for Amazon Redshift
-- This setting disables the results cache, so we can see the full processing runtime each time we run the query
SET enable_result_cache_for_session TO OFF;
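-- The default is ON; once you're done measuring, you can restore it for the session:
SET enable_result_cache_for_session TO ON;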
bruno-uy / print_current_line.py
Created December 23, 2022 12:05
Print current line of the current script
from inspect import currentframe, getframeinfo
print(getframeinfo(currentframe()).lineno)  # prints this statement's line number (2 here)
bruno-uy / add_schema_to_search_path.sql
Created May 27, 2022 12:29
Table not showing up when querying PG_TABLE_DEF
-- Problem: you don't see all the schemas when querying PG_TABLE_DEF
-- Solution:
-- 1. First, check whether the schema you're trying to query is on the search path
show search_path;
-- 2. Add the missing one(s) to the search path (imagine the result was only public and you're missing data_warehouse and matching)
set search_path to '$user', public, data_warehouse, matching; -- Keep '$user' literally, whatever your actual username is
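-- 3. Optionally, confirm the tables now show up (example check using the schema from step 2):
show search_path;
select distinct tablename from pg_table_def where schemaname = 'data_warehouse';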
bruno-uy / git_good_practices.md
Last active October 18, 2022 14:59
Git good practices

Git good practices

  1. Write meaningful and concise commit messages:
    • ❌ "Add new feature"
    • ✅ "Change X and Y because of Z"
  2. Follow a pattern / convention for commit messages. You can check a good reference here; a sketch of such a message follows this list.
  3. Squash commits you did for testing / adding small changes. You can check how to do that here.
  4. Separate your commits into isolated units of "atomic" changes. Examples:
    • Changes in one class / file
    • A refactor done before the actual change you'll be making
    • Changes in one function, if the change is considerable
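
A sketch of a message following points 1 and 2, using the common type(scope): summary convention (the scope and wording here are made up for illustration; the linked reference may recommend a different pattern):

fix(report): exclude refunded orders from daily totals because they were double-counted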
bruno-uy / squash_git_commits.md
Last active March 23, 2023 02:37
Squash git commits into one before pushing to origin

Squash commits

Definition: combining multiple commits into one. It's more about keeping the commit history tidy than about fixing a technical problem.

First, figure out how many commits you have to squash. To check that, you can use:

git log

Say you want to combine the last 3 commits into one. You'll do a soft reset to HEAD minus 3 commits:
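
git reset --soft HEAD~3

The soft reset keeps all the changes from those commits staged; a single new commit then replaces the three (the message below is just a placeholder):

git commit -m "One commit with the combined changes"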

bruno-uy / set_pandas_display_options.py
Last active July 7, 2023 13:10
Set pandas display options
import pandas as pd
# Show up to 500 rows/columns, widen the output, and never truncate cell contents when printing DataFrames
pd.set_option("display.max_rows", 500)
pd.set_option("display.max_columns", 500)
pd.set_option("display.width", 1000)
pd.set_option("display.max_colwidth", None)
bruno-uy / df_to_dict_with_none.py
Created December 31, 2021 11:04
Export pandas DataFrame to a dict with None instead of nan
import pandas as pd
import numpy as np
df = pd.DataFrame({"A": [1, 2, 3], "B": [1.2, np.nan, 3.4]})
# Replacing with lists (rather than value=None) turns NaN into None and the column into object dtype
result = (
    df
    .replace([np.nan], [None], regex=False)
    .to_dict(orient="records")
)
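# result should look like:
# [{'A': 1, 'B': 1.2}, {'A': 2, 'B': None}, {'A': 3, 'B': 3.4}]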
bruno-uy / read_all_csv_gz_current_folder.py
Created December 31, 2021 10:57
Read all csv.gz files from the current folder in a pandas DataFrame
import os
import pandas as pd
df = pd.concat(
    [pd.read_csv(f, compression="gzip") for f in os.listdir() if f.endswith(".gz")],
    ignore_index=True,
)