🏠
Working from home

Pìnei pinei

  • Petrobras
  • Rio de Janeiro, Brazil
@mniehoff
mniehoff / DuckDB_in_Databricks.ipynb
Last active October 13, 2025 12:24
How to use DuckDB in Databricks to process data stored in Databricks Unity Catalog. https://www.codecentric.de/wissens-hub/blog/access-databricks-unitycatalog-from-duckdb
@turicas
turicas / Transcrição de textos em Português com whisper (OpenAI).ipynb
Last active September 10, 2025 18:40
Transcribing Portuguese texts with Whisper (OpenAI)
@machuu
machuu / WSL2_VPN_Workaround_Instructions.md
Last active July 19, 2025 14:04
Workaround for WSL2 network broken on VPN

Overview

Internet connectivity and DNS routing break inside WSL2 instances when certain VPNs are active.

The root cause seems to be that WSL2 and the VPN use the same IP address block, and the VPN routing clobbers WSL2's network routing.
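
The kind of address-block collision described above can be checked with Python's standard `ipaddress` module. This is only a minimal sketch: the two CIDR blocks are made-up placeholders, not values taken from any real WSL2 or VPN configuration, and `blocks_overlap` is a hypothetical helper name.

```python
import ipaddress


def blocks_overlap(cidr_a: str, cidr_b: str) -> bool:
    """Return True if the two CIDR blocks share at least one address."""
    net_a = ipaddress.ip_network(cidr_a, strict=False)
    net_b = ipaddress.ip_network(cidr_b, strict=False)
    return net_a.overlaps(net_b)


# Hypothetical example values (assumptions for illustration only).
wsl_block = "172.20.0.0/20"
vpn_block = "172.16.0.0/12"

if blocks_overlap(wsl_block, vpn_block):
    print("WSL2 and VPN address blocks overlap; routes may conflict")
else:
    print("No overlap between the two address blocks")
```

On a real machine, the two blocks would have to be read from the WSL2 virtual adapter and the VPN adapter; that lookup is left out here.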

This problem is tracked in multiple microsoft/WSL issues including, but not limited to:

@ihor-lev
ihor-lev / sublime_fix_home_end_keys.md
Last active September 26, 2025 12:16
Fix Sublime Text Home and End key usage on Mac OSX

By default, the Home and End keys in Sublime Text jump to the beginning and end of the file.

Fix Home and End keys to move the cursor to the beginning and end of lines.

Preferences > Key Bindings - User

Add the following to the array:

{ "keys": ["home"], "command": "move_to", "args": {"to": "bol"} },
{ "keys": ["end"], "command": "move_to", "args": {"to": "eol"} },
@NigelEarle
NigelEarle / Knex-Migrations-Seeding.md
Last active October 15, 2025 20:30
Migration and seeding instructions using Knex.js!

Migrations & Seeding

What are migrations?

Migrations are a way to make database changes or updates through generated scripts, such as creating or dropping tables or adding new columns with constraints to an existing table. We can generate these scripts from the command line using the knex command-line tool.

To learn more about migrations, check out this article on the different types of database migrations!
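
To make the idea concrete, here is a minimal sketch of a migration as a pair of functions, one that applies a change and one that reverts it. This is not Knex code: it is a Python analogy using the standard `sqlite3` module and an in-memory database, and the `users` table and the `table_names` helper are hypothetical.

```python
import sqlite3


def up(conn: sqlite3.Connection) -> None:
    """Apply the change: create the (hypothetical) users table."""
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )


def down(conn: sqlite3.Connection) -> None:
    """Revert the change: drop the table created by up()."""
    conn.execute("DROP TABLE users")


def table_names(conn: sqlite3.Connection) -> list:
    """List the tables currently present in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
up(conn)
print(table_names(conn))  # the users table now exists
down(conn)
print(table_names(conn))  # the schema is empty again
```

Keeping each change next to its inverse is what lets a migration tool move a database schema forward and roll it back in a repeatable order.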

Creating/Dropping Tables

@dusenberrymw
dusenberrymw / spark_tips_and_tricks.md
Last active January 10, 2025 07:36
Tips and tricks for Apache Spark.

Spark Tips & Tricks

Misc. Tips & Tricks

  • If values are integers in [0, 255], Parquet will automatically compress them to 1-byte unsigned integers, decreasing the size of the saved DataFrame by a factor of 8.
  • Partition DataFrames to have evenly-distributed, ~128MB partition sizes (empirical finding). Always err on the higher side w.r.t. number of partitions.
  • Pay particular attention to the number of partitions when using flatMap, especially if the following operation will result in high memory usage. The flatMap op usually results in a DataFrame with a [much] larger number of rows, yet the number of partitions will remain the same. Thus, if a subsequent op causes a large expansion of memory usage (e.g. converting a DataFrame of indices to a DataFrame of large Vectors), the memory usage per partition may become too high. In this case, it is beneficial to repartition the output of flatMap to a number of partitions that will safely allow for appropriate partition memory sizes, based upon the
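
The ~128MB sizing tip in the list above can be turned into a quick calculation. The sketch below is plain Python with no Spark dependency; `suggested_partitions` is a hypothetical helper, and the 10 GiB dataset size is an assumed figure chosen only for illustration.

```python
import math

# ~128MB per partition, the empirical target quoted in the tips above.
TARGET_PARTITION_BYTES = 128 * 1024 * 1024


def suggested_partitions(total_bytes: int,
                         target_bytes: int = TARGET_PARTITION_BYTES) -> int:
    """Round up so each partition ends up at or below the target size."""
    if total_bytes <= 0:
        return 1
    return max(1, math.ceil(total_bytes / target_bytes))


estimated_size = 10 * 1024**3  # hypothetical 10 GiB dataset
print(suggested_partitions(estimated_size))
```

Rounding up rather than down follows the advice to err on the higher side for the number of partitions. The result could then be passed to something like `df.repartition(n)`, assuming a reasonable estimate of the data size is available.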
@evenv
evenv / Spark Dataframe Cheat Sheet.py
Last active August 3, 2025 19:50
Cheat sheet for Spark Dataframes (using Python)
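# A simple cheat sheet of Spark DataFrame syntax
# Current for Spark 1.6.1
# Import statements
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import *
from pyspark.sql.functions import *
# The pyspark shell predefines sqlContext; create it explicitly when running as a script
sqlContext = SQLContext(SparkContext.getOrCreate())
# Creating DataFrames
df = sqlContext.createDataFrame([(1, 4), (2, 5), (3, 6)], ["A", "B"])  # from manual data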