Stefan Thoss stefanthoss

@jalkjaer
jalkjaer / run.sh
Last active November 3, 2019 19:55
steps to automatically build k8s spark with latest hadoop 2 version and push to ECR
export HADOOP_VERSION=2.9.1
export SPARK_VERSION=2.3.2
export AWS_ACCOUNT_ID=<your numeric AWS account id>
export ECR_REGION=us-east-1
# Fetch and extract the spark source
curl -L "https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}.tgz" | tar -xzvf -
cd "spark-${SPARK_VERSION}"
# set maven opts according to https://spark.apache.org/docs/latest/building-spark.html
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
import glob
import pandas as pd
# glob.glob('data*.csv') returns the matching file paths as a List[str]
# pd.read_csv(f) reads one file into a pd.DataFrame
# the list comprehension builds a List[pd.DataFrame], one per file
# pd.concat(...) combines them into a single pd.DataFrame
# ignore_index=True renumbers the rows of the result from 0
df = pd.concat([pd.read_csv(f) for f in glob.glob('data*.csv')], ignore_index=True)
@timvisee
timvisee / falsehoods-programming-time-list.md
Last active May 6, 2024 20:05
Falsehoods programmers believe about time, in a single list

Falsehoods programmers believe about time

This is a compiled list of falsehoods programmers tend to believe about working with time.

Don't re-invent a date time library yourself. If you think you understand everything about time, you're probably doing it wrong.

Falsehoods

  • There are always 24 hours in a day.
  • February is always 28 days long.
  • Any 24-hour period will always begin and end in the same day (or week, or month).
@atais
atais / handbrake.sh
Created July 10, 2017 20:21
Batch convert DVD videos (with VIDEO_TS folder) to MKV using HandBrake CLI
#!/bin/bash
###########
#
# Batch convert DVD Videos with HandBrake CLI
# The script will recursively look for "VIDEO_TS" folders and parse them
#
# Read this to understand:
# https://mattgadient.com/2013/06/12/a-best-settings-guide-for-handbrake-0-9-9/
# http://www.thewebernets.com/2015/02/28/easiest-best-optimal-settings-for-handbrake-dvd-video-conversion-on-mac-windows-and-linux/
#
@ryanpersaud
ryanpersaud / HDF 2.1 Error
Last active January 23, 2019 21:25
Error encountered when attempting to launch Storm topology with HDF 2.1
Darrell Kienzle noted the following:
I downloaded HDF 2.1 and tweaked my vagrant rig to load all the new stuff from HDF instead of HDP. There was one less storm dependency
on HDF – no need for the “atlas_metadata” which HDP includes & requires. Anyway, when I tried to run the parser, I got the exception
below.
Long story short, it _looks_ like the submission is a two-step process now. First, it tries to do some fixups in
“ClientJarTransformerRunner” and output a tmp jar in /tmp. This is per:
• STORM-1202: Migrate APIs to org.apache.storm, but try to provide some form of backwards compatability
@whophil
whophil / jupyter.service
Last active October 30, 2023 16:33 — forked from doowon/jupyter_systemd
A systemd script for running a Jupyter notebook server.
# As of Ubuntu 16.04, systemd is the default init system.
# It is simpler than https://gist.github.com/Doowon/38910829898a6624ce4ed554f082c4dd
[Unit]
Description=Jupyter Notebook
[Service]
Type=simple
PIDFile=/run/jupyter.pid
ExecStart=/home/phil/Enthought/Canopy_64bit/User/bin/jupyter-notebook --config=/home/phil/.jupyter/jupyter_notebook_config.py
@tasdikrahman
tasdikrahman / python_tests_dir_structure.md
Last active May 5, 2024 06:06
Typical Directory structure for python tests

A Typical directory structure for running tests using unittest

Ref : stackoverflow

The best solution, in my opinion, is to use the unittest [command line interface][1], which adds the directory to sys.path so you don't have to (this is done in the TestLoader class).
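The same discovery mechanism can be driven from Python, which makes the sys.path behaviour easy to see. The sketch below is illustrative only: `demo_calc`, `test_demo_calc`, and both helper functions are hypothetical names, and the project is written to a temporary directory so the example is self-contained.

```python
import tempfile
import textwrap
import unittest
from pathlib import Path


def write_demo_project(root: Path) -> None:
    """Create a tiny project: one module and one test module beside it."""
    (root / "demo_calc.py").write_text("def add(a, b):\n    return a + b\n")
    (root / "test_demo_calc.py").write_text(textwrap.dedent("""\
        import unittest
        import demo_calc  # found because discovery put the start dir on sys.path

        class AddTest(unittest.TestCase):
            def test_add(self):
                self.assertEqual(demo_calc.add(2, 3), 5)
        """))


def run_discovery(project_dir: Path) -> unittest.TestResult:
    """Discover and run every test_*.py module under project_dir."""
    suite = unittest.TestLoader().discover(str(project_dir), pattern="test_*.py")
    return unittest.TextTestRunner(verbosity=0).run(suite)


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    write_demo_project(project)
    result = run_discovery(project)

print(result.testsRun, result.wasSuccessful())
```

On the command line, running `python -m unittest discover` from the project directory performs the equivalent discovery without any manual sys.path edits.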

For example, for a directory structure like this:

new_project

├── antigravity.py

@seebk
seebk / README.md
Last active April 24, 2024 07:17
Extract embedded certificates and keys from OpenVPN config files

This python script is intended to automate the extraction of embedded certificates and keys from OpenVPN config files.

Unfortunately the GNOME Network-Manager is not able to automatically import OpenVPN config files with embedded certificates and keys. A workaround is to manually extract these and store them in separate files (e.g. see https://naveensnayak.wordpress.com/2013/03/04/ubuntu-openvpn-with-ovpn-file/).
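The extraction step can be sketched as follows. This is not the gist's actual script, only a minimal illustration of the idea, assuming the embedded material sits in OpenVPN's inline blocks such as `<ca>...</ca>`, `<cert>...</cert>`, `<key>...</key>`, and `<tls-auth>...</tls-auth>`. The function name, the tag list, and the sample config are hypothetical.

```python
import re

# Assumption: these are the inline blocks of interest in the config file.
INLINE_TAGS = ("ca", "cert", "key", "tls-auth")


def extract_inline_blocks(config_text, tags=INLINE_TAGS):
    """Return {tag: content} for each inline block found in config_text."""
    blocks = {}
    for tag in tags:
        name = re.escape(tag)
        pattern = re.compile(
            r"<" + name + r">\s*\n(.*?)\n\s*</" + name + r">", re.DOTALL
        )
        match = pattern.search(config_text)
        if match:
            # Keep the block body only, ending with a single newline.
            blocks[tag] = match.group(1).strip() + "\n"
    return blocks


sample_config = """client
remote vpn.example.com 1194
<ca>
-----BEGIN CERTIFICATE-----
AAAA
-----END CERTIFICATE-----
</ca>
<key>
-----BEGIN PRIVATE KEY-----
BBBB
-----END PRIVATE KEY-----
</key>
"""

blocks = extract_inline_blocks(sample_config)
for tag, content in blocks.items():
    print(tag, len(content.splitlines()), "lines")
```

Each extracted block could then be written to its own file and referenced from the config by path, which is the manual workaround the paragraph above describes.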

Instructions:

  • Make sure all the required packages are installed. For example on Ubuntu and Debian run:

    $ sudo apt-get install python3 network-manager-openvpn-gnome

@masih
masih / fish_shell_local_install.sh
Last active November 7, 2022 10:19
Installs Fish Shell without root access
#!/bin/bash
# Script for installing Fish Shell on systems without root access.
# Fish Shell will be installed in $HOME/local/bin.
# It's assumed that wget and a C/C++ compiler are installed.
# exit on error
set -e
FISH_SHELL_VERSION=2.1.1
@debasishg
debasishg / gist:8172796
Last active May 7, 2024 22:18
A collection of links for streaming algorithms and data structures

General Background and Overview

  1. Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
  2. Models and Issues in Data Stream Systems
  3. Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that visits some of the historical perspectives and how it all began with Flajolet
  4. Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
  5. [Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&amp;rep=rep1&amp;t