Manu Zhang (manuzhang)
@manuzhang
manuzhang / spark-monitor.py
Created March 5, 2019 07:03
Monitor CPU and memory usage from Spark master UI
# coding: utf-8
from bs4 import BeautifulSoup
import requests

# Fetch and parse the Spark master web UI.
page = requests.get("http://spark-master.com").content
soup = BeautifulSoup(page, 'html.parser')

# The text node right after the "Cores in use:" label holds the values.
cores_text = soup.find('strong', string='Cores in use:').next_sibling
cores_parts = cores_text.strip().split('\n')
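The preview stops after splitting the "Cores in use" text; a minimal sketch of how the remaining fields might be parsed, assuming the master UI renders lines such as "Cores in use: 8 Total, 4 Used" and "Memory in use: 6.0 GB Total, 2.0 GB Used" (the labels and layout are assumptions, not taken from the gist):

# Sketch only: parse "X Total, Y Used" pairs from the Spark master UI.
# The label strings below are assumptions about the page layout.
import re

import requests
from bs4 import BeautifulSoup

def fetch_usage(master_url):
    soup = BeautifulSoup(requests.get(master_url).content, 'html.parser')

    def totals(label):
        # e.g. "8 Total, 4 Used" -> (8.0, 4.0); memory lines also carry units.
        text = soup.find('strong', string=label).next_sibling.strip()
        total, used = [float(x) for x in re.findall(r'[\d.]+', text)[:2]]
        return total, used

    return {
        'cores': totals('Cores in use:'),
        'memory': totals('Memory in use:'),   # assumed label
    }

if __name__ == '__main__':
    print(fetch_usage('http://spark-master.com'))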
package com.manuzhang.leetcode;

import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class BinaryTreeLevelOrderTraversal {
    // Collect node values level by level, top to bottom (BFS).
    public List<List<Integer>> levelOrder(TreeNode root) {
        List<List<Integer>> lo = new LinkedList<List<Integer>>();
        if (root == null) {
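The Java preview is cut off above; for reference, a minimal Python sketch of the same breadth-first level-order idea (an illustration, not the gist's code):

# Sketch of binary-tree level-order traversal (BFS), not the gist's Java code.
from collections import deque

class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def level_order(root):
    if root is None:
        return []
    levels, queue = [], deque([root])
    while queue:
        level = []
        for _ in range(len(queue)):          # one pass per tree level
            node = queue.popleft()
            level.append(node.val)
            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)
        levels.append(level)
    return levels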
import scala.collection.mutable

class LRUCache(_capacity: Int) {
  // Keys at the two ends of the recency-ordered chain, plus the entry count.
  var head: Option[Int] = None
  var tail: Option[Int] = None
  var count = 0
  // Key -> node lookup for O(1) access.
  val cache = mutable.Map.empty[Int, Node]

  def get(key: Int): Int = {
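The Scala class above tracks recency with head/tail keys and a key-to-Node map; a compact Python sketch of the same get/put contract using OrderedDict, for illustration only:

# Sketch of the LRU contract using OrderedDict; the gist builds the
# recency chain by hand with head/tail pointers and a mutable Map.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()

    def get(self, key):
        if key not in self.cache:
            return -1
        self.cache.move_to_end(key)           # mark as most recently used
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict least recently used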
Rule                                                                         Time (ns)
org.apache.spark.sql.catalyst.analysis.Analyzer$FixNullability 489262230
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics 243030776
org.apache.spark.sql.catalyst.analysis.TypeCoercion$PropagateTypes 143141555
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer 97690381
org.apache.spark.sql.catalyst.analysis.ResolveCreateNamedStruct 87845664
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowFrame 85098172
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowOrder 83967566
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions 63928074
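The rows above are per-rule analyzer timings in nanoseconds; a small sketch that reads such a two-column dump and prints the slowest rules in milliseconds (rule_times.txt is a hypothetical file name):

# Sketch: summarize a "<rule> <nanoseconds>" dump like the table above.
# "rule_times.txt" is a hypothetical file holding those two columns.
def top_rules(path, n=5):
    rows = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 2 or not parts[1].isdigit():
                continue                       # skip the header line
            rows.append((parts[0], int(parts[1])))
    rows.sort(key=lambda r: r[1], reverse=True)
    for rule, nanos in rows[:n]:
        print(f"{rule.rsplit('.', 1)[-1]:45s} {nanos / 1e6:8.1f} ms")

top_rules("rule_times.txt")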
@manuzhang
manuzhang / MurmurHashEncode.scala
Last active September 15, 2017 02:35
Encode with Murmur Hash
import java.nio.{ByteBuffer, ByteOrder}
// "com.google.guava" % "guava" % "16.0.1"
import com.google.common.hash.Hashing

object MurmurHash {
  // Fixed seed so the encoded values are stable across runs.
  private val seed = 0x3c074a61

  def encode(prefix: Int, value: Long): Long = {
    // Pack the prefix as 4 little-endian bytes before hashing.
    val pb = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(prefix).array()
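The Scala object packs the prefix little-endian before hashing with Guava's Murmur3; a rough Python equivalent using struct and the third-party mmh3 package (an assumption, and its output is not guaranteed to match Guava's hash bit for bit):

# Rough sketch of the same idea: pack prefix + value little-endian, then
# Murmur-hash the bytes. Requires the third-party mmh3 package; not
# guaranteed to be bit-compatible with Guava's Murmur3.
import struct
import mmh3

SEED = 0x3c074a61  # same seed as the Scala gist

def encode(prefix, value):
    data = struct.pack('<iq', prefix, value)   # 4-byte int + 8-byte long, little-endian
    return mmh3.hash64(data, SEED)[0]          # keep the low 64 bits of the 128-bit hash

print(encode(1, 42))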
import json
import subprocess
import sys

# Example:
# python demo.py "bin/ydl-tf" "launch" "examples/between-graph/mnist_feed.py"
def main():
    args = sys.argv[1:]
    # Run the launcher with its sub-command and capture what it prints.
    output = subprocess.check_output([args[0], args[1]])
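The preview stops right after capturing the launcher's output; a hedged guess at the remaining shape, since the gist imports json (the JSON parsing below is an assumption, not taken from the gist):

# Hypothetical continuation: the gist imports json, so presumably the
# launcher's stdout is parsed as JSON. None of this is taken from the gist.
import json
import subprocess
import sys

def main():
    args = sys.argv[1:]                        # e.g. bin/ydl-tf launch examples/...
    output = subprocess.check_output([args[0], args[1]])
    info = json.loads(output)                  # assumed: the launcher prints JSON
    print(json.dumps(info, indent=2))

if __name__ == "__main__":
    main()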
@manuzhang
manuzhang / simple_barrier.py
Last active March 27, 2020 07:12 — forked from yaroslavvb/simple_barrier.py
TensorFlow in-graph replication example
"""
This example is adapted from https://gist.github.com/yaroslavvb/ef407a599f0f549f62d91c3a00dcfb6c
Example of a barrier implementation using TensorFlow shared variables.
All workers synchronize on a barrier, copy the global parameters to their local
versions, and increment the global parameter variable asynchronously. You should
see something like this:
python simple_barrier.py --wk "node13-1:21393,node13-1:21395"
Creating session
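The gist implements the barrier with TensorFlow shared variables across distributed workers; the same synchronize-then-proceed idea, sketched below with Python's in-process threading primitives for illustration only:

# Illustration of the barrier idea with threads in one process; the gist
# does this across distributed TensorFlow workers with shared variables.
import threading

NUM_WORKERS = 4
barrier = threading.Barrier(NUM_WORKERS)
global_step = 0
lock = threading.Lock()

def worker(i):
    global global_step
    barrier.wait()                 # every worker blocks here until all arrive
    local_step = global_step       # copy the shared value to a local version
    with lock:
        global_step += 1           # then update the shared value asynchronously
    print(f"worker {i}: local={local_step}")

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()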
@manuzhang
manuzhang / add_window_metrics.patch
Created February 24, 2017 02:16
Patch that adds metrics to WindowOperator
Index: flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/operators/windowing/WindowOperator.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/operators/windowing/WindowOperator.java (date 1487603477000)
+++ flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/operators/windowing/WindowOperator.java (revision )
@@ -38,6 +38,8 @@
import org.apache.flink.core.fs.FSDataInputStream;
import org.apache.flink.core.memory.DataInputView;
time(s),total_slots,used_slots,workers,tasks,executors,transferred (messages),throughput (messages/s),throughput (MB/s),spout_throughput (MB/s),spout_executors,spout_transferred (messages),spout_acked (messages),spout_throughput (messages/s),spout_avg_complete_latency(ms),spout_max_complete_latency(ms)
60,16,4,4,68,68,12581640,209669,20.0,20.000,32,12581640,12568820,209669,26.4,28.1
120,16,4,4,68,68,15556400,258811,25.0,25.000,32,15556400,28122640,258811,25.6,27.4
180,16,4,4,68,68,15141200,252046,25.0,25.000,32,15141200,43243080,252046,25.3,27.3
240,16,4,4,68,68,15685280,261168,26.0,26.000,32,15685280,58923020,261168,25.2,26.8
300,16,4,4,68,68,15406060,256588,25.0,25.000,32,15406060,74317520,256588,25.1,26.6
360,16,4,4,68,68,14788500,246306,24.0,24.000,32,14788500,89099380,246306,25.2,26.6
420,16,4,4,68,68,14489620,241356,24.0,24.000,32,14489620,103584100,241356,25.4,26.6
480,16,4,4,68,68,14862080,247569,24.0,24.000,32,14862080,118429540,247569,25.5,26.8
540,16,4,4,68,68,14920480,248521,24.0,24.000,32,1492048
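A small sketch for loading and summarizing a report like the one above with pandas (report.csv is a hypothetical path):

# Sketch: load the benchmark report above and summarize throughput.
# "report.csv" is a hypothetical path to a file with those columns.
import pandas as pd

df = pd.read_csv("report.csv")
print(df["throughput (messages/s)"].describe())
print("mean complete latency (ms):", df["spout_avg_complete_latency(ms)"].mean())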
# metrics configurations
metrics.enabled: true
metrics.poll: 60000 # 60 secs
metrics.time: 900000 # 15 mins
metrics.path: "reports"
# topology configurations
topology.workers: 4
topology.acker.executors: 4
topology.max.spout.pending: 200
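For reference, a small sketch that loads such a config with PyYAML and derives how many metric polls fit in one run (900000 ms / 60000 ms = 15), assuming a hypothetical config.yaml holding the settings above:

# Sketch: read the config above and derive the number of metric polls per run.
import yaml

with open("config.yaml") as f:       # hypothetical path to the config above
    conf = yaml.safe_load(f)

polls = conf["metrics.time"] // conf["metrics.poll"]   # 900000 / 60000 = 15
print(f"{polls} polls of {conf['metrics.poll'] // 1000}s "
      f"over {conf['metrics.time'] // 60000} minutes")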