Jazz Yao-Tsung Wang jazzwang

@jazzwang
jazzwang / json-to-ndjson.md
Created February 16, 2023 12:59 — forked from bzerangue/json-to-ndjson.md
JSON to NDJSON

NDJSON is a convenient format for storing or streaming structured data that may be processed one record at a time.

  • Each line is a valid JSON value
  • Line separator is ‘\n’

1. Convert JSON to NDJSON?

jq -c '.[]' test.json > testNDJSON.json
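
For environments without jq, the same conversion can be sketched in Python with only the standard library. This is a minimal sketch, not part of the original gist: the function name `json_array_to_ndjson` is hypothetical, and it assumes the input is a top-level JSON array (jq's `.[]` would also iterate over the values of an object, which this sketch rejects).

```python
import json


def json_array_to_ndjson(text):
    """Turn a JSON array into NDJSON: one compact JSON value per line."""
    records = json.loads(text)
    if not isinstance(records, list):
        # Assumption of this sketch: only top-level arrays are supported.
        raise ValueError("expected a top-level JSON array")
    # separators=(",", ":") drops optional whitespace, similar to jq -c.
    return "".join(
        json.dumps(record, separators=(",", ":"), ensure_ascii=False) + "\n"
        for record in records
    )


if __name__ == "__main__":
    sample = '[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]'
    print(json_array_to_ndjson(sample), end="")
```

Each output line is a complete JSON value terminated by `\n`, which matches the two NDJSON rules listed above.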
jazzwang / jq-profilejsonschema.md
Created October 4, 2022 00:37 — forked from mikehwang/jq-profilejsonschema.md
Use jq to profile the schema of a given JSON object or an array of JSON objects

Profile JSON schema

jq is great for examining JSON objects, and you can extend its functionality with custom methods. The following gives a high-level view of the structure of arbitrary JSON documents, which is useful when trying to understand new data sources.

Requires jq version 1.5.
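
To illustrate what "profiling the schema" can mean, here is a rough Python sketch of the general idea. It is not the jq method from this gist: the names `profile` and `type_name` and the path notation (`.key`, `[]`) are assumptions chosen for illustration. It walks a parsed JSON value and records which JSON types occur at each path.

```python
import json
from collections import defaultdict


def type_name(value):
    """Map a parsed Python value back to its JSON type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    return "string"


def profile(value, path=".", out=None):
    """Collect the set of JSON types seen at every path in the value."""
    if out is None:
        out = defaultdict(set)
    if isinstance(value, dict):
        out[path].add("object")
        for key, child in value.items():
            profile(child, f"{path.rstrip('.')}.{key}", out)
    elif isinstance(value, list):
        out[path].add("array")
        for child in value:
            # All array elements share one path, so mixed types show up together.
            profile(child, path + "[]", out)
    else:
        out[path].add(type_name(value))
    return out


if __name__ == "__main__":
    doc = json.loads('[{"id": 1, "tags": ["x"]}, {"id": "2"}]')
    for p, types in sorted(profile(doc).items()):
        print(p, sorted(types))
```

A path that reports more than one type (here `.[].id`) is usually the first thing worth checking in an unfamiliar data source.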

Profile an object

Add the following method to your ~/.jq:

Generating Flame Graphs for Apache Spark

Flame graphs are a nifty debugging tool for determining where CPU time is being spent. Using the Java Flight Recorder, you can generate them for Java processes without adding significant runtime overhead.

When are flame graphs useful?

Shivaram Venkataraman and I have found these flame recordings to be useful for diagnosing coarse-grained performance problems. We started using them at the suggestion of Josh Rosen, who quickly made one for the Spark scheduler when we were talking to him about why the scheduler caps out at a throughput of a few thousand tasks per second. Josh generated a graph similar to the one below, which illustrates that a significant amount of time is spent in serialization (if you click in the top right hand corner and search for "serialize", you can see that 78.6% of the sampled CPU time was spent in serialization). We used this insight to spee

System-wide daemons provided by Mac OS X are described by launchd preference files, which can be listed with the command:
$ sudo ls -all /System/Library/LaunchDaemons/
System-wide third-party daemons provided by the administrator are described by preference files, which can be listed with the command:
$ sudo ls -all /Library/LaunchDaemons/
Launch Agents are per-user processes, usually loaded when the user logs in. Those provided by Mac OS X can be found with:
$ sudo ls -all /System/Library/LaunchAgents/
Launch Agents provided by the administrator are also per-user and usually loaded when the user logs in. They can be found with:
jazzwang / PomToSbt.scala
Created November 12, 2020 04:34 — forked from mslinn/PomToSbt.scala
Convert pom.xml to build.sbt
import scala.xml._
// To convert a Maven pom.xml to build.sbt:
// 1) Place this code into a file called PomToSbt.scala next to pom.xml
// 2) Type scala PomToSbt.scala > build.sbt
// The dependencies from pom.xml will be extracted and placed into a complete build.sbt file
// Because most pom.xml files only reference non-Scala dependencies, I did not use %%
val lines = (XML.load("pom.xml") \\ "dependencies") \ "dependency" map { dependency =>
  val groupId = (dependency \ "groupId").text
  val artifactId = (dependency \ "artifactId").text
This file has been truncated, but you can view the full file.
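
Since the Scala listing is cut off, the following Python sketch shows the same general idea end to end. It is not the truncated remainder of the script: the helper names (`local`, `child_text`, `pom_dependencies_to_sbt`) and the exact output line format are assumptions. Like the Scala code, it looks at `dependency` elements directly under any `dependencies` element and uses a single `%` between group and artifact. It does not resolve Maven property placeholders or scopes.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def child_text(elem, name):
    """Text of the first direct child called `name`, or '' if absent."""
    for child in elem:
        if local(child.tag) == name:
            return (child.text or "").strip()
    return ""


def pom_dependencies_to_sbt(pom_xml):
    """Return one sbt libraryDependencies line per Maven dependency."""
    root = ET.fromstring(pom_xml)
    lines = []
    for elem in root.iter():
        if local(elem.tag) != "dependencies":
            continue
        for dep in elem:
            if local(dep.tag) != "dependency":
                continue
            group = child_text(dep, "groupId")
            artifact = child_text(dep, "artifactId")
            version = child_text(dep, "version")  # copied as-is, not resolved
            lines.append(
                f'libraryDependencies += "{group}" % "{artifact}" % "{version}"'
            )
    return lines


if __name__ == "__main__":
    sample = (
        '<project xmlns="http://maven.apache.org/POM/4.0.0"><dependencies>'
        "<dependency><groupId>junit</groupId><artifactId>junit</artifactId>"
        "<version>4.13.2</version></dependency></dependencies></project>"
    )
    print("\n".join(pom_dependencies_to_sbt(sample)))
```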
jazzwang / latency.txt
Created January 12, 2017 05:17 — forked from jboner/latency.txt
Latency Numbers Every Programmer Should Know
Latency Comparison Numbers
--------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us          ~1GB/sec SSD
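
The multipliers in the right-hand column are plain ratios of the listed values, and they can be rechecked with a few lines of Python. This is an illustrative sketch: the dictionary `LATENCY_NS` and the helpers `ratio` and `to_us` are hypothetical names, and the values are copied from the rows above. Computed this way, main memory versus L2 comes out near 14x (100 / 7), so the "20x L2 cache" annotation reads best as a rough figure.

```python
# Latencies in nanoseconds, copied from the table rows above.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 25,
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 3_000,
    "Send 1K bytes over 1 Gbps network": 10_000,
    "Read 4K randomly from SSD": 150_000,
}


def ratio(slower, faster):
    """How many times longer `slower` takes than `faster`."""
    return LATENCY_NS[slower] / LATENCY_NS[faster]


def to_us(ns):
    """Convert nanoseconds to microseconds."""
    return ns / 1_000


if __name__ == "__main__":
    print(ratio("L2 cache reference", "L1 cache reference"))
    print(ratio("Main memory reference", "L1 cache reference"))
    print(round(ratio("Main memory reference", "L2 cache reference"), 1))
    print(to_us(LATENCY_NS["Read 4K randomly from SSD"]))
```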
jazzwang / .bash_profile
Created April 25, 2016 10:40
azure-cli bash completion on Mac OS X
# ~/.profile: executed by the command interpreter for login shells.
# This file is not read by bash(1), if ~/.bash_profile or ~/.bash_login
# exists.
# see /usr/share/doc/bash/examples/startup-files for examples.
# the files are located in the bash-doc package.
# the default umask is set in /etc/profile; for setting the umask
# for ssh logins, install and configure the libpam-umask package.
#umask 022
:wrapper
:init
BUILD SUCCESSFUL
Total time: 2.419 secs
This build could be faster, please consider using the Gradle Daemon: https://docs.gradle.org/2.12/userguide/gradle_daemon.html