Nicolas PHUNG nsphung

@n1snt
n1snt / Oh my ZSH with zsh-autosuggestions zsh-syntax-highlighting zsh-fast-syntax-highlighting and zsh-autocomplete.md
Last active June 11, 2024 02:57
Oh my ZSH with zsh-autosuggestions zsh-syntax-highlighting zsh-fast-syntax-highlighting and zsh-autocomplete.md

Oh my zsh.

Oh My Zsh

Install ZSH.

sudo apt install zsh-autosuggestions zsh-syntax-highlighting zsh

Install Oh my ZSH.

@fworks
fworks / install-zsh-windows-git-bash.md
Last active May 21, 2024 05:32
Zsh / Oh-my-zsh on Windows Git Bash
@gvenzl
gvenzl / One Liner to download the latest release from your GitHub repo.md
Last active January 14, 2024 22:18
One Liner to download the latest release from your GitHub repo
LOCATION=$(curl -s https://api.github.com/repos/<YOUR ORGANIZATION>/<YOUR REPO>/releases/latest \
| grep "zipball_url" \
| awk '{ print $2 }' \
| sed 's/,$//'       \
| sed 's/"//g' )     \
; curl -L -o <OUTPUT FILE NAME> "$LOCATION"
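
The one-liner above pulls `zipball_url` out of the API response with `grep`, `awk`, and `sed`, which depends on how the JSON happens to be laid out across lines. As a rough sketch of the same lookup done with a JSON parser: the function names `extract_zipball_url` and `latest_zipball_url` are hypothetical and not part of the original gist, and the `owner` and `repo` arguments stand in for the placeholders in the command.

```python
import json
import urllib.request


def extract_zipball_url(release_json: str) -> str:
    """Return the zipball_url field from a releases/latest JSON document."""
    release = json.loads(release_json)
    return release["zipball_url"]


def latest_zipball_url(owner: str, repo: str) -> str:
    """Fetch the latest release metadata and return its zipball URL.

    Hypothetical helper: owner and repo replace the placeholders
    in the shell one-liner.
    """
    url = f"https://api.github.com/repos/{owner}/{repo}/releases/latest"
    with urllib.request.urlopen(url) as response:
        return extract_zipball_url(response.read().decode("utf-8"))
```

Parsing the response as JSON avoids the trailing-comma and quote stripping that the `sed` stages perform by hand.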

for example:


Generating Flame Graphs for Apache Spark

Flame graphs are a nifty debugging tool for determining where CPU time is being spent. Using the Java Flight Recorder, you can do this for Java processes without adding significant runtime overhead.

When are flame graphs useful?

Shivaram Venkataraman and I have found these flame recordings to be useful for diagnosing coarse-grained performance problems. We started using them at the suggestion of Josh Rosen, who quickly made one for the Spark scheduler when we were talking to him about why the scheduler caps out at a throughput of a few thousand tasks per second. Josh generated a graph similar to the one below, which illustrates that a significant amount of time is spent in serialization (if you click in the top right hand corner and search for "serialize", you can see that 78.6% of the sampled CPU time was spent in serialization). We used this insight to spee

@fdv
fdv / es-cheat-sheet.md
Last active February 2, 2021 14:44
An ElasticSearch management cheat sheet

These are the snippets I use most of the time when administering my ES cluster.

To be updated

Settings to change before you do something

Before restarting a data node

curl -XPUT 'http://escluster:9200/_cluster/settings' -d '{
import org.apache.spark.ml.feature.{CountVectorizer, RegexTokenizer, StopWordsRemover}
import org.apache.spark.mllib.clustering.{LDA, OnlineLDAOptimizer}
import org.apache.spark.mllib.linalg.Vector
import sqlContext.implicits._
val numTopics: Int = 100
val maxIterations: Int = 100
val vocabSize: Int = 10000
@gccpacman
gccpacman / docker-https.md
Last active May 6, 2022 05:50
Protecting the Docker daemon Socket with HTTPS

# Protecting the Docker daemon Socket with HTTPS

HOST-IP:172.17.42.1 VM-IP:172.17.0.2

openssl genrsa -aes256 -out ca-key.pem 2048
openssl req -new -x509 -days 365 -key ca-key.pem -sha256 -out ca.pem   # Common Name: 172.17.42.1
openssl genrsa -out server-key.pem 2048
openssl req -subj "/CN=172.17.42.1" -new -key server-key.pem -out server.csr

echo subjectAltName = IP:172.17.42.1, IP:172.17.0.2, IP:127.0.0.1 > extfile.cnf
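
The `echo` line above writes a single `subjectAltName` entry listing every IP address the server certificate should cover. As a small sketch of building that line for a different set of addresses: the helper name `subject_alt_name_line` is hypothetical and not part of the original gist, and it only formats the text; it does not call `openssl`.

```python
from ipaddress import ip_address


def subject_alt_name_line(ips):
    """Build the subjectAltName line for extfile.cnf from a list of IPs.

    ip_address() raises ValueError for malformed input, so a typo is
    caught before it ends up in the extension file.
    """
    entries = [f"IP:{ip_address(ip)}" for ip in ips]
    return "subjectAltName = " + ", ".join(entries)


if __name__ == "__main__":
    # Same addresses as the echo command: host IP, VM IP, and loopback.
    print(subject_alt_name_line(["172.17.42.1", "172.17.0.2", "127.0.0.1"]))
```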

/**
The Play (2.3) JSON combinator library is arguably the best in the Scala world. However, it doesn't
work with case classes that have more than 22 fields.
The following gist leverages the shapeless 'Automatic Typeclass Derivation' facility to work around this
limitation. Simply stick it in a common location in your code base, and use like so:
Note: ** Requires Play 2.3 and shapeless 2.1.0
import SWrites._
import SReads._
@solarce
solarce / kafka_prod1.yaml
Last active December 13, 2017 04:58
An example YAML file to use with https://github.com/jmxtrans/jmxtrans/wiki/YAMLConfig for getting Kafka metrics and putting them into graphite. You must use v246 of jmxtrans though. You can grab the .jar I built from https://github.com/solarce/jmxtrans/releases/tag/v246. See https://github.com/solarce/chef-jmxtrans/releases/tag/v1.0.5 and https:…
# kafka_prod1.yaml
#
# The production kafka nodes for prod1
graphite_host: <%= node[:jmxtrans][:graphite][:host] %>
graphite_port: <%= node[:jmxtrans][:graphite][:port] %>
# Defines the port that the hosts in this config listen for JMX on
# ** THIS PORT HAS TO BE THE SAME FOR ALL HOSTS **
query_port: 9999