Skip to content

Instantly share code, notes, and snippets.

View 30lm32's full-sized avatar
😶‍🌫️
Out sick

30lm32

😶‍🌫️
Out sick
  • Micrososft
View GitHub Profile
@zaloogarcia
zaloogarcia / pandas_to_spark.py
Last active April 19, 2022 16:20
Script for converting Pandas DF to Spark's DF
from pyspark.sql.types import *
# Auxiliar functions
# Pandas Types -> Sparks Types
def equivalent_type(f):
if f == 'datetime64[ns]': return DateType()
elif f == 'int64': return LongType()
elif f == 'int32': return IntegerType()
elif f == 'float64': return FloatType()
else: return StringType()
@lucasjellema
lucasjellema / docker-compose.yml
Created May 22, 2018 04:57
Docker Compose file for Apache Kafka, the Confluent Platform (4.1.0) - with Kafka Connect, Kafka Manager, Schema Registry and KSQL (1.0) - assuming a Docker Host accessible at 192.168.188.102
version: '2'
services:
zookeeper:
image: "confluentinc/cp-zookeeper:4.1.0"
hostname: zookeeper
ports:
- "2181:2181"
environment:
ZOOKEEPER_CLIENT_PORT: 2181
@dweldon
dweldon / install-docker.sh
Last active September 5, 2025 12:49
Install docker CE on Linux Mint 18.3
#!/usr/bin/env bash
# https://docs.docker.com/install/linux/docker-ce/ubuntu/
sudo apt-get install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable"
sudo apt-get update
sudo apt-get install docker-ce
# https://docs.docker.com/compose/install/
@MaxHalford
MaxHalford / fit.py
Created May 18, 2017 15:35
Keras fit/predict scikit-learn pipeline
import os
from keras import backend as K
from keras import callbacks
from keras import layers
from keras import models
from keras.wrappers.scikit_learn import KerasClassifier
import pandas as pd
import tensorflow as tf
from sklearn import metrics
@yzhong52
yzhong52 / gist:f81e929e5810271292bd08856e2f4512
Last active June 18, 2022 13:53
Create Spark DataFrame From List[Any]
// Spark 2.1
val spark = SparkSession.builder().master("local").getOrCreate()
// Given a list of mixture of strings in integers
val values = List("20030100013280", 1.0)
// Create `Row` from `Seq`
val row = Row.fromSeq(values)
// Create `RDD` from `Row`
@dusenberrymw
dusenberrymw / spark_tips_and_tricks.md
Last active January 10, 2025 07:36
Tips and tricks for Apache Spark.

Spark Tips & Tricks

Misc. Tips & Tricks

  • If values are integers in [0, 255], Parquet will automatically compress to use 1 byte unsigned integers, thus decreasing the size of saved DataFrame by a factor of 8.
  • Partition DataFrames to have evenly-distributed, ~128MB partition sizes (empirical finding). Always err on the higher side w.r.t. number of partitions.
  • Pay particular attention to the number of partitions when using flatMap, especially if the following operation will result in high memory usage. The flatMap op usually results in a DataFrame with a [much] larger number of rows, yet the number of partitions will remain the same. Thus, if a subsequent op causes a large expansion of memory usage (i.e. converting a DataFrame of indices to a DataFrame of large Vectors), the memory usage per partition may become too high. In this case, it is beneficial to repartition the output of flatMap to a number of partitions that will safely allow for appropriate partition memory sizes, based upon the
@codekansas
codekansas / keras_gensim_embeddings.py
Last active July 23, 2018 09:17
Using Word2Vec embeddings in Keras models
from __future__ import print_function
import json
import os
import numpy as np
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess
from keras.engine import Input
from keras.layers import Embedding, merge
@subfuzion
subfuzion / curl.md
Last active October 11, 2025 00:58
curl POST examples

Common Options

-#, --progress-bar Make curl display a simple progress bar instead of the more informational standard meter.

-b, --cookie <name=data> Supply cookie with request. If no =, then specifies the cookie file to use (see -c).

-c, --cookie-jar <file name> File to save response cookies to.

@psayre23
psayre23 / gist:c30a821239f4818b0709
Last active June 21, 2025 17:52
Runtime Complexity of Java Collections
Below are the Big O performance of common functions of different Java Collections.
List | Add | Remove | Get | Contains | Next | Data Structure
---------------------|------|--------|------|----------|------|---------------
ArrayList | O(1) | O(n) | O(1) | O(n) | O(1) | Array
LinkedList | O(1) | O(1) | O(n) | O(n) | O(1) | Linked List
CopyOnWriteArrayList | O(n) | O(n) | O(1) | O(n) | O(1) | Array