Manu Zhang manuzhang

@manuzhang
manuzhang / FilterObject.java
Last active December 26, 2015 08:09
Filter out objects whose value 'a' appears more than once among all the objects
package me.ifthiskills.map;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;
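
The preview above stops after the imports, so the gist's actual Java logic is not visible here. As a minimal sketch of the idea in the description only (written in Python, not the author's code), the filter can be done in two passes: count each value of `a`, then keep the objects whose value was seen once. The function name `filter_repeated`, the dict-based objects, and the choice to drop every object that shares a repeated value (rather than keeping the first) are assumptions.

```python
from collections import Counter


def filter_repeated(objects, key="a"):
    """Return the objects whose `key` value occurs exactly once in the input.

    Assumption: every object sharing a repeated value is dropped,
    including the first one seen.
    """
    # First pass: count how often each value of `key` appears.
    counts = Counter(obj[key] for obj in objects)
    # Second pass: keep only objects whose value is unique.
    return [obj for obj in objects if counts[obj[key]] == 1]


if __name__ == "__main__":
    items = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}, {"a": 1, "b": "z"}]
    print(filter_repeated(items))  # [{'a': 2, 'b': 'y'}]
```

The two-pass shape keeps the input order and runs in linear time, at the cost of holding one counter entry per distinct value.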
@manuzhang
manuzhang / cal_diff.sh
Last active December 30, 2015 11:09
calculate insertions and deletions from diff files
#!/bin/sh -
# calculate insertions, skipping added lines that start with / or * (comments)
grep "^[+][^+]" "$1" | grep -Ev "(^[+][[:space:]]*[/]|^[+][[:space:]]*[*])" | wc -l
# calculate deletions, skipping removed lines that start with / or * (comments)
grep "^[-][^-]" "$1" | grep -Ev "(^[-][[:space:]]*[/]|^[-][[:space:]]*[*])" | wc -l
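
For readers who want the same count outside the shell, here is a rough Python equivalent of the pipeline above. It is a sketch under the same assumptions as the script: a line counts only if it starts with a single `+` or `-` followed by another character, and lines whose content begins with `/` or `*` are treated as comment lines and skipped. The function name `count_changes` is hypothetical.

```python
import re

# A leading +/- followed by optional whitespace and then / or *
# is treated as a C-style comment line and ignored.
COMMENT = re.compile(r"^[+-]\s*[/*]")
INSERTION = re.compile(r"^\+[^+]")  # single leading '+', so '+++' headers are skipped
DELETION = re.compile(r"^-[^-]")    # single leading '-', so '---' headers are skipped


def count_changes(diff_text):
    """Return (insertions, deletions) for a unified diff given as a string."""
    insertions = deletions = 0
    for line in diff_text.splitlines():
        if COMMENT.match(line):
            continue
        if INSERTION.match(line):
            insertions += 1
        elif DELETION.match(line):
            deletions += 1
    return insertions, deletions


if __name__ == "__main__":
    sample = "\n".join([
        "--- a/Foo.java",
        "+++ b/Foo.java",
        "@@ -1,3 +1,4 @@",
        "-int x = 1;",
        "+int x = 2;",
        "+// a comment",
        "+ * javadoc line",
        " unchanged",
    ])
    print(count_changes(sample))  # (1, 1)
```

Like the shell version, this does not count empty added or removed lines, because a bare `+` or `-` has no second character to match.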
@manuzhang
manuzhang / extract_log.sh
Created December 12, 2013 03:45
extract data from log using awk and tr
#!/bin/sh -
JOB_DIR=$1
NUM=$2
UNCOMP=0
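for TASK in `ls $JOB_DIR | grep 'attempt.*_m_.*'`
do
  # keep only the digits of field 28 on the mid_spill log lines
  ucp=`awk '/.*[\[]MapOutputCollector::mid_spill[\]].*/ { print $28 }' $JOB_DIR/$TASK/stderr | tr -cd '[:digit:]'`
  # POSIX arithmetic instead of the bash-only `let`; treat a missing value as 0
  UNCOMP=$((UNCOMP + ${ucp:-0}))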
@manuzhang
manuzhang / hadoop_exception.md
Last active August 29, 2015 14:04
Exception when creating Hadoop FileSystem due to incorrect native hadoop library

The following exception can be thrown when creating a Hadoop FileSystem if LD_LIBRARY_PATH points to an incorrect Hadoop native library:

java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
    at org.apache.hadoop.security.Groups.<init>(Groups.java:55)
    at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:182)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:235)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:214)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:669)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:571)
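
Since the note attributes the failure to LD_LIBRARY_PATH, one quick check is to see which native library files that path actually resolves to. The sketch below is not part of the original note: the helper name `find_native_libs` is hypothetical, and it assumes the Hadoop native library file name starts with `libhadoop`. It only lists candidates and does not verify that a library matches the Hadoop version in use.

```python
import os


def find_native_libs(ld_library_path, name_prefix="libhadoop"):
    """List files starting with name_prefix in each entry of a path string.

    Entries are searched in order, mirroring how the first match on
    LD_LIBRARY_PATH is the one that gets loaded.
    """
    hits = []
    for directory in ld_library_path.split(os.pathsep):
        # Skip empty entries and paths that are not directories.
        if not directory or not os.path.isdir(directory):
            continue
        for entry in sorted(os.listdir(directory)):
            if entry.startswith(name_prefix):
                hits.append(os.path.join(directory, entry))
    return hits


if __name__ == "__main__":
    for path in find_native_libs(os.environ.get("LD_LIBRARY_PATH", "")):
        print(path)
```

If more than one directory shows up in the output, the earliest one on the path is the likely source of the mismatch.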

@manuzhang
manuzhang / data_cleanse.py
Last active August 29, 2015 14:05
cleanse stock tweets
ticker_dict = {}
company_list = []
exception_list = []
print("read ticker symbol")
with open("./ticker_symbol.tsv") as ticker_symbol:
    for line in ticker_symbol:
        words = line.split('\t')
        # key: lower-cased first column without surrounding quotes; value: second column
        ticker_dict[words[0].strip('"').lower()] = words[1]
Producer
Setup
bin/kafka-topics.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --create --topic test-rep-one --partitions 6 --replication-factor 1
bin/kafka-topics.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --create --topic test --partitions 6 --replication-factor 3
Single thread, no replication
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196
@manuzhang
manuzhang / kill_jps_by_name.sh
Created January 29, 2015 00:47
kill all java processes with the given name
jps | grep "${process_name}" | cut -d ' ' -f 1 | xargs kill -9
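
The one-liner above can also be written as a small Python script, which makes the matching step easy to test without killing anything. This is a sketch, not the author's method: the function names are hypothetical, it assumes `jps` is on the PATH and prints one `<pid> <name>` pair per line, and unlike the `grep` version it matches only against the name column, so a PID that happens to contain the search string is not selected.

```python
import os
import signal
import subprocess


def matching_pids(jps_output, process_name):
    """Return PIDs from `jps` output lines whose name contains process_name."""
    pids = []
    for line in jps_output.splitlines():
        parts = line.split(None, 1)  # split into pid and the rest of the line
        if len(parts) == 2 and parts[0].isdigit() and process_name in parts[1]:
            pids.append(int(parts[0]))
    return pids


def kill_by_name(process_name):
    """Send SIGKILL (the equivalent of kill -9) to every matching Java process."""
    output = subprocess.run(
        ["jps"], capture_output=True, text=True, check=True
    ).stdout
    for pid in matching_pids(output, process_name):
        os.kill(pid, signal.SIGKILL)
```

Calling `matching_pids` on captured `jps` output first is a cheap way to confirm which processes would be affected before running `kill_by_name`.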
@manuzhang
manuzhang / kafka_source_benchmark_commands
Created February 10, 2015 04:46
kafka_source_benchmark_commands
Setup
bin/kafka-topics.sh --create --topic consumer --zookeeper 192.168.1.73:2181/kafka --replication-factor 3 --partitions 6
Produce data
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance consumer 500000000 100 -1 acks=1 bootstrap.servers=192.168.1.73:9092 buffer.memory=67108864 batch.size=8196
Single KafkaSource
bin/gear app -jar examples/gearpump-examples-assembly-0.2.4-SNAPSHOT.jar org.apache.gearpump.streaming.examples.kafka.consumer.KafkaConsumerPerf -master 192.168.1.71:3000 -kafka_stream_producer 1 -runseconds 360
KafkaSource + StreamProcessor
@manuzhang
manuzhang / sbt_china_mirror
Last active March 21, 2020 06:46
sbt China mirror
[repositories]
local
oschina: http://maven.oschina.net/content/groups/public/
oschina-ivy: http://maven.oschina.net/content/groups/public/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext]
typesafe: http://repo.typesafe.com/typesafe/ivy-releases/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext], bootOnly
#sonatype-oss-releases
#maven-central
#sonatype-oss-snapshots
# metrics configurations
metrics.enabled: false
metrics.poll: 60000 # 60 secs
metrics.time: 900000 # 15 mins
metrics.path: "reports"
# topology configurations
topology.workers: 4
topology.acker.executors: 0
topology.max.spout.pending: 200