@mniehoff
mniehoff / databricks_python_library_job_executor.py
Created November 25, 2021 15:57
Example script for running code that lives in a Python package as a Python job on Databricks. Use this script as the "python_file" and pass the job module and job method (plus any job arguments) as "parameters", like this: [ "--job-module=package.subpackage.module", "--job-method=run_job" ]
import importlib
import argparse


def get_installed_packages():
    import pkg_resources
    installed_packages = pkg_resources.working_set
    return sorted(["%s==%s" % (i.key, i.version) for i in installed_packages])
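The preview above stops before the dispatch logic, so the following is only a minimal sketch of how a script like this might resolve `--job-module` and `--job-method` and call the target. The function names `parse_args` and `run`, and the choice to forward unrecognized arguments to the job as positional strings, are assumptions made for illustration and are not taken from the gist.

```python
import argparse
import importlib


def parse_args(argv=None):
    """Parse the two flags named in the gist description (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Run a callable from an installed package")
    parser.add_argument("--job-module", required=True,
                        help="dotted module path, e.g. package.subpackage.module")
    parser.add_argument("--job-method", required=True,
                        help="name of the callable inside the module, e.g. run_job")
    # parse_known_args returns (namespace, leftovers); the leftovers are kept
    # so they can be handed to the job (an assumption, see the lead-in).
    return parser.parse_known_args(argv)


def run(argv=None):
    """Import the requested module, look up the method, and call it."""
    args, job_args = parse_args(argv)
    module = importlib.import_module(args.job_module)
    method = getattr(module, args.job_method)
    return method(*job_args)


if __name__ == "__main__":
    # Demonstration with a standard-library function instead of a real job.
    print(run(["--job-module=os.path", "--job-method=basename", "/tmp/file.txt"]))
```

Resolving the target by name keeps the entry script generic: the same file can serve every job in the package, and only the parameters change per job.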
@zaloogarcia
zaloogarcia / pandas_to_spark.py
Last active April 19, 2022 16:20
Script for converting Pandas DF to Spark's DF
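from pyspark.sql.types import *

# Auxiliary functions
# pandas types -> Spark types
def equivalent_type(f):
    # datetime64[ns] carries a time of day, so map it to TimestampType rather than DateType
    if f == 'datetime64[ns]': return TimestampType()
    elif f == 'int64': return LongType()
    elif f == 'int32': return IntegerType()
    # float64 is double precision; FloatType is only 32-bit
    elif f == 'float64': return DoubleType()
    # Fall back to strings for any other dtype
    else: return StringType()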
@noelbundick
noelbundick / Dockerfile
Last active May 14, 2024 15:20
Consuming packages from a private Azure Pipelines Python artifact feed
# We set an environment variable in this phase so it gets picked up by pip, but we don't want to bake secrets into our container image
FROM python:3.6-alpine AS builder
ARG INDEX_URL
ENV PIP_EXTRA_INDEX_URL=$INDEX_URL
COPY requirements.txt .
RUN pip install -U pip \
    && pip install --user -r requirements.txt
@hellerbarde
hellerbarde / latency.markdown
Created May 31, 2012 13:16 — forked from jboner/latency.txt
Latency numbers every programmer should know

L1 cache reference ......................... 0.5 ns
Branch mispredict ............................ 5 ns
L2 cache reference ........................... 7 ns
Mutex lock/unlock ........................... 25 ns
Main memory reference ...................... 100 ns             
Compress 1K bytes with Zippy ............. 3,000 ns  =   3 µs
Send 2K bytes over 1 Gbps network ....... 20,000 ns  =  20 µs
SSD random read ........................ 150,000 ns  = 150 µs
Read 1 MB sequentially from memory ..... 250,000 ns  = 250 µs
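The unit conversions and relative magnitudes in the table can be checked with a few lines of Python. This is a small sketch that copies the figures listed above into a dictionary; the names `LATENCIES_NS`, `ns_to_us`, and `ratio` are introduced here for illustration only.

```python
# Latency figures copied from the table above, in nanoseconds.
LATENCIES_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 25,
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 3_000,
    "Send 2K bytes over 1 Gbps network": 20_000,
    "SSD random read": 150_000,
    "Read 1 MB sequentially from memory": 250_000,
}


def ns_to_us(ns):
    """Convert nanoseconds to microseconds (1 µs = 1,000 ns)."""
    return ns / 1_000


def ratio(slower, faster):
    """How many times longer the `slower` operation takes than the `faster` one."""
    return LATENCIES_NS[slower] / LATENCIES_NS[faster]


if __name__ == "__main__":
    for name, ns in LATENCIES_NS.items():
        line = f"{name}: {ns:,g} ns"
        if ns >= 1_000:
            line += f" = {ns_to_us(ns):,g} µs"
        print(line)
    # A main memory reference costs 100 ns / 0.5 ns = 200 L1 cache references.
    print(ratio("Main memory reference", "L1 cache reference"))
```

Expressing each entry as a multiple of another tends to be more memorable than the absolute numbers, for example that one SSD random read costs as much as 1,500 main memory references.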