
Eduardo Gonzalez (eddienko)

distributed:
  version: 2
  scheduler:
    bandwidth: 1000000000  # 1 GB/s estimated worker-worker bandwidth
  worker:
    memory:
      target: 0.90     # target fraction to stay below
      spill: False     # spilling to disk is disabled
      pause: 0.80      # fraction at which we pause worker threads
      terminate: 0.95  # fraction at which we terminate the worker
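The memory settings above are fractions of each worker's memory limit, and the scheduler's `bandwidth` value is used to estimate how long moving data between workers takes. A minimal Python sketch of that arithmetic, assuming the values above; `memory_thresholds` and `estimated_transfer_seconds` are hypothetical helpers, not part of dask's API:

```python
# Hypothetical helpers mirroring the config values above (not dask API).
BANDWIDTH = 1_000_000_000  # bytes per second, as configured above (1 GB/s)

MEMORY_FRACTIONS = {
    "target": 0.90,
    "spill": False,  # False disables this threshold
    "pause": 0.80,
    "terminate": 0.95,
}


def memory_thresholds(memory_limit_bytes, fractions=MEMORY_FRACTIONS):
    """Convert each fraction into an absolute byte threshold (None if disabled)."""
    return {
        name: None if frac is False else round(memory_limit_bytes * frac)
        for name, frac in fractions.items()
    }


def estimated_transfer_seconds(nbytes, bandwidth=BANDWIDTH):
    """Rough transfer-time estimate: bytes divided by the assumed bandwidth."""
    return nbytes / bandwidth


if __name__ == "__main__":
    limit = 8 * 1024**3  # assume an 8 GiB worker memory limit
    print(memory_thresholds(limit))
    print(estimated_transfer_seconds(100_000_000))  # a 100 MB transfer
```

With these values the pause threshold (0.80) is reached before the target (0.90).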

eddienko / Makefile
Created May 2, 2018 07:45, forked from maartenbreddels/Makefile
Makefile for converting Gaia DR2 CSV files to a single HDF5 file
# Makefile for converting the CSV files from http://cdn.gea.esac.esa.int/Gaia/gdr2/gaia_source/csv/
# into a single (vaex) hdf5 file
# * https://docs.vaex.io
# * https://github.com/maartenbreddels/vaex/
# The conversion runs in multiple stages to avoid opening all 60 000 files at once.
# Strategy:
# * stage1: convert each csv.gz to csv, then to hdf5
#   * this is done via xargs calling make again, since gmake has trouble matching 60 000 rules
# * stage2: create part-<NUMBER>.txt files, each listing at most FILES_PER_PART hdf5 files
# * stage3: combine the hdf5 files listed in each part-<NUMBER>.txt into a single hdf5 file (part-<NUMBER>.hdf5)
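Stage 2 is essentially fixed-size chunking of the list of converted files. A minimal Python sketch of that step, assuming the hdf5 filenames are already known; `split_into_parts` and `write_part_lists` are hypothetical names, and since the Makefile's exact numbering for `part-<NUMBER>.txt` is not shown, plain integers starting at 0 are assumed:

```python
import os


def split_into_parts(files, files_per_part):
    """Split a list of filenames into consecutive chunks of at most files_per_part."""
    if files_per_part < 1:
        raise ValueError("files_per_part must be at least 1")
    return [files[i:i + files_per_part] for i in range(0, len(files), files_per_part)]


def write_part_lists(files, files_per_part, outdir="."):
    """Write part-<N>.txt files (N from 0), one filename per line; return their paths."""
    paths = []
    for number, chunk in enumerate(split_into_parts(files, files_per_part)):
        path = os.path.join(outdir, f"part-{number}.txt")
        with open(path, "w") as fh:
            fh.write("\n".join(chunk) + "\n")
        paths.append(path)
    return paths
```

Each part list can then be handled independently in stage 3, so no single step opens more than FILES_PER_PART hdf5 files.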

eddienko / pcap.py
Created March 12, 2018 15:04, forked from mrocklin/pcap.py
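import pandas as pd


def parse(line):
    # Split a tcpdump-style line, e.g. "<time> IP <src>.<port> > <dst>.<port>: ..."
    words = line.split()
    time = words[0]
    protocol = words[1]
    if protocol == 'IP':
        # Each address carries its port as the last dot-separated field
        src_ip, src_port = words[2].rsplit('.', 1)
        # The destination field ends with ':', which is stripped first
        dst_ip, dst_port = words[4].strip(':').rsplit('.', 1)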
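The `parse` function splits each line into a timestamp, a protocol and source/destination address and port. One way such a parser could return its fields, sketched under the assumption that input is plain `tcpdump` text output and that only `IP` lines with ports matter, is to build a dict per line and skip everything else. `parse_tcpdump_line` and `parse_lines` are hypothetical names, not the gist's code:

```python
def parse_tcpdump_line(line):
    """Return a dict of fields for an 'IP' line with ports, else None."""
    words = line.split()
    # Assumed layout: <time> IP <src>.<port> > <dst>.<port>: ...
    if len(words) < 5 or words[1] != "IP":
        return None
    src_ip, src_port = words[2].rsplit(".", 1)
    dst_ip, dst_port = words[4].strip(":").rsplit(".", 1)
    return {
        "time": words[0],
        "protocol": words[1],
        "src_ip": src_ip,
        "src_port": src_port,
        "dst_ip": dst_ip,
        "dst_port": dst_port,
    }


def parse_lines(lines):
    """Parse many lines, dropping those that are not IP lines."""
    return [rec for rec in map(parse_tcpdump_line, lines) if rec is not None]
```

The resulting list of dicts can be passed to `pd.DataFrame` to get one column per field.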

eddienko / Spark_Jupyter_OS_X.md
Created January 27, 2018 18:15, forked from frank-leap/Spark_Jupyter_OS_X.md
Steps to configure Jupyter (IPython Notebook) with a Python (3.5.1) and Spark (1.6.0) kernel on Mac OS X (El Capitan)

Install Python 3, Scala and Apache Spark via Homebrew (http://brew.sh/)

brew update
brew install python3
brew install scala
brew install apache-spark
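After the `brew install` commands, a quick check that the expected commands are on `PATH` can save debugging later. A small Python sketch using `shutil.which`; the list of command names is an assumption (Homebrew's apache-spark formula normally links `spark-submit` and `pyspark`), and `missing_tools` is a hypothetical helper:

```python
import shutil

# Commands assumed to be on PATH after the Homebrew installs above.
EXPECTED_TOOLS = ("python3", "scala", "spark-submit", "pyspark")


def missing_tools(tools=EXPECTED_TOOLS):
    """Return the subset of tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All expected tools found.")
```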

Set environment variables


eddienko / parsel.sql
Created November 6, 2015 15:19
Parsel: A Simple Function for Parallel Queries in Postgres Using dblink
-- DROP FUNCTION IF EXISTS public.parsel(db text, table_to_chunk text, pkey text, query text, output_table text, table_to_chunk_alias text, num_chunks integer);
CREATE OR REPLACE FUNCTION public.parsel(
    db text,
    table_to_chunk text,
    pkey text,
    query text,
    output_table text,
    table_to_chunk_alias text default '',
    num_chunks integer default 2
)
RETURNS text AS
$BODY$
DECLARE
    sql TEXT;
    min_id integer;
    max_id integer;
    step_size integer;
    lbnd integer;
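The declared variables (`min_id`, `max_id`, `step_size`, `lbnd`) suggest that the primary-key range of `table_to_chunk` is split into `num_chunks` contiguous ranges, each run as a separate query over dblink. Since the function body is not shown here, the following Python sketch only illustrates that kind of range splitting under stated assumptions (inclusive integer bounds, a ceiling step size); `chunk_bounds` and `chunk_predicates` are hypothetical helpers, not part of parsel:

```python
import math


def chunk_bounds(min_id, max_id, num_chunks):
    """Split [min_id, max_id] into at most num_chunks inclusive (lbnd, ubnd) ranges."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    if max_id < min_id:
        return []
    # Ceiling division guarantees no more than num_chunks ranges.
    step_size = math.ceil((max_id - min_id + 1) / num_chunks)
    bounds = []
    lbnd = min_id
    while lbnd <= max_id:
        ubnd = min(lbnd + step_size - 1, max_id)
        bounds.append((lbnd, ubnd))
        lbnd = ubnd + 1
    return bounds


def chunk_predicates(pkey, alias, bounds):
    """Build one inclusive BETWEEN predicate per range, qualified by alias if given."""
    column = f"{alias}.{pkey}" if alias else pkey
    return [f"{column} BETWEEN {lo} AND {hi}" for lo, hi in bounds]
```

Each predicate could then be appended to the per-chunk query so that the chunks cover the key range without overlap.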