💀
This.isDevOps(powah_code=36)

Thuan Duong thuandt

  • Ho Chi Minh, Vietnam
View quick-tips-optimizing-jvm.md

Quick Tips for Fast Code on the JVM

I was talking to a coworker recently about general techniques that almost always form the core of any effort to write very fast, down-to-the-metal hot path code on the JVM, and they pointed out that there really isn't a particularly good place to go for this information. It occurred to me that, really, I had more or less picked up all of it by word of mouth and experience, and there just aren't any good reference sources on the topic. So… here's my word of mouth.

This is by no means a comprehensive gist, and the techniques I outline here are not absolute either. Performance on the JVM is an incredibly complicated subject, and while there are rules that almost always hold true, the "almost" remains very salient. For many or even most applications, other techniques that I don't mention will have a greater impact. JMH, Java Flight Recorder, and a good profiler are your very best friends! Mea

thuandt / deployment-tool-ansible-puppet-chef-salt.md
Choosing a deployment tool - ansible vs puppet vs chef vs salt

Requirements

  • no upfront installation of agents on remote/slave machines; SSH should be enough
  • application components should be able to use third-party software (e.g. HDFS, a Spark cluster) that is deployed separately
  • configuration templating
  • environment requirements/assertions, e.g. a JVM of a given version must be present before deployment starts
  • the deployment process runs from Jenkins

Solution

View useful_pandas_snippets.py
# List unique values in a DataFrame column
# h/t @makmanalp for the updated syntax!
df['Column Name'].unique()
# Convert Series datatype to numeric (will error if column has non-numeric values)
# h/t @makmanalp
pd.to_numeric(df['Column Name'])
# Convert Series datatype to numeric, changing non-numeric values to NaN
# h/t @makmanalp for the updated syntax!
View types.markdown

Types

A type is a collection of possible values. An integer can have values 0, 1, 2, 3, etc.; a boolean can have values true and false. We can imagine any type we like: for example, a HighFive type that allows the values "hi" or 5, but nothing else. It's not a string and it's not an integer; it's its own, separate type.

Statically typed languages constrain variables' types: the programming language might know, for example, that x is an Integer. In that case, the programmer isn't allowed to say x = true; that would be an invalid program. The compiler will refuse to compile it, so we can't even run it.
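
The HighFive idea above can be sketched in Python. This is only an illustration under some assumptions: the alias `HighFive` and the helper `is_high_five` are hypothetical names, and Python itself does not enforce annotations, so the static rejection described in the text would come from an external checker such as mypy, while the helper performs the same test at runtime.

```python
from typing import Literal, get_args

# Hypothetical alias for the HighFive type from the text: only "hi" or 5.
HighFive = Literal["hi", 5]


def is_high_five(value: object) -> bool:
    """Return True only if value is exactly the str "hi" or exactly the int 5."""
    # Compare the type as well as the value, so that 5.0, True, or "5" are rejected.
    return any(
        type(value) is type(allowed) and value == allowed
        for allowed in get_args(HighFive)
    )


x: HighFive = "hi"  # a static checker accepts this assignment
# x = True          # a static checker would flag this line as a type error

print(is_high_five("hi"), is_high_five(5), is_high_five("5"))
```

The runtime helper is there because a `Literal` annotation alone changes nothing when the program runs; it only gives a type checker something to verify before the program runs.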

View latency.markdown

Latency numbers every programmer should know

L1 cache reference ......................... 0.5 ns
Branch mispredict ............................ 5 ns
L2 cache reference ........................... 7 ns
Mutex lock/unlock ........................... 25 ns
Main memory reference ...................... 100 ns             
Compress 1K bytes with Zippy ............. 3,000 ns  =   3 µs
Send 2K bytes over 1 Gbps network ....... 20,000 ns  =  20 µs
SSD random read ........................ 150,000 ns  = 150 µs
Read 1 MB sequentially from memory ..... 250,000 ns  = 250 µs
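
To make the ratios between these numbers easier to see, here is a small sketch that stores the figures listed above in a dictionary and expresses each one as a multiple of an L1 cache reference. The names `LATENCY_NS` and `relative_to_l1` are made up for this illustration; the values are copied from the list and are not measurements.

```python
# Latencies in nanoseconds, copied from the list above.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 25,
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 3_000,
    "Send 2K bytes over 1 Gbps network": 20_000,
    "SSD random read": 150_000,
    "Read 1 MB sequentially from memory": 250_000,
}


def relative_to_l1(name: str) -> float:
    """Return how many L1 cache references fit into the named operation."""
    return LATENCY_NS[name] / LATENCY_NS["L1 cache reference"]


for name, ns in LATENCY_NS.items():
    print(f"{name:<36} {ns:>11,.1f} ns  {relative_to_l1(name):>9,.0f}x L1")
```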
thuandt / unique_digit_prime.py
Created Dec 8, 2012 — forked from lewtds/unique_digit_prime.py
Find the largest 9-digit prime whose digits are pairwise distinct
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Filename: prime.py
#
# Description: Find the largest prime with 9 distinct digits
#
# Created: 12/09/2012 06:32:33 AM
# Last Modified: 12/09/2012 06:40:49 AM
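
The body of the script is not shown in this preview, so the following is not the author's code. It is one possible way to approach the problem stated in the description, assuming a brute-force search is acceptable: walk through the 9-digit arrangements of distinct digits from largest to smallest and return the first one that passes a trial-division primality test. The function names are hypothetical.

```python
from itertools import permutations
from math import isqrt


def is_prime(n: int) -> bool:
    """Trial division; adequate for numbers below 10**10."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for divisor in range(3, isqrt(n) + 1, 2):
        if n % divisor == 0:
            return False
    return True


def largest_unique_digit_prime(length: int = 9) -> int:
    """Return the largest prime with `length` pairwise distinct digits."""
    # permutations() follows the order of its input, so feeding the digits
    # in descending order yields candidates from largest to smallest.
    for digits in permutations("9876543210", length):
        if digits[0] == "0":
            continue  # a leading zero would make the number shorter
        candidate = int("".join(digits))
        if is_prime(candidate):
            return candidate
    raise ValueError(f"no prime with {length} distinct digits")


if __name__ == "__main__":
    print(largest_unique_digit_prime())
```

Because the candidates are generated in descending order, the first prime found is also the largest, so the search stops early instead of testing every arrangement.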