Shrijeet shrijeet

  • Redwood City, CA
@shrijeet
shrijeet / TabDelimToProtoMessage.java
Created April 11, 2012 00:45
Protobuf message from a tab delimited record
package com.example;
import java.io.*;
import java.util.List;
import java.util.regex.Pattern;
import com.google.protobuf.Descriptors.FieldDescriptor;
import com.google.protobuf.Descriptors.FieldDescriptor.JavaType;
import com.example.generated.LogFileProtos.LogFile;
shrijeet / slow_scan
Created April 24, 2012 23:49
Operation too slow
{
"processingtimems": 25255,
"client": "172.22.4.5:56750",
"timeRange ": [
0,
9223372036854775807
],
"starttimems": 1335310563560,
"responsesize": 0,
"class": "HRegionServer",
shrijeet / rs_jstack_slow_scan.java
Created April 25, 2012 00:06
Slow scan RS jstack
2012-04-24 20:00:02
Full thread dump Java HotSpot(TM) 64-Bit Server VM (14.2-b01 mixed mode):
"Attach Listener" daemon prio=10 tid=0x00002aaaf680f800 nid=0x2f2e waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"IPC Client (47) connection to inw-5.rfiserve.net/172.22.4.5:60000 from hbase" daemon prio=10 tid=0x00002aaaf847e000 nid=0x2ee3 in Object.wait() [0x000000005960b000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaaae8e5018> (a org.apache.hadoop.hbase.ipc.HBaseClient$Connection)
shrijeet / slow_scan_saga
Created April 25, 2012 00:18
Slow scan saga
St^Ack: Sorry I was away. It is just spinning
shrijeet: I had written a nice sequence of events, but then pastie.org decided to go down
shrijeet: anyways, here is the sequence 1) Launch a scanner which is slow, for example a scan that hits once in a billion 2) see an entry in the rs active calls list of type 'next'
shrijeet: 3) as soon as that next call hits the 60 second mark, one more entry appears in the rpc call list; this new rpc call is 'close'
shrijeet: 4) client does not see the exception until close hits the 60 second mark
shrijeet: 5) client dies once it sees exception (after 2 minutes) but next and close continue remotely
shrijeet: 6) they hang out in the active call list for >30 minutes, then they go away; region server spits out some warnings in between
shrijeet: The warnings being org.apache.hadoop.hbase.UnknownScannerException, org.apache.hadoop.ipc.HBaseServer (output error) etc.
shrijeet: Here is a JSON snapshot of the state I described in (6) http://pastie.org/3842252
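The client-visible part of the sequence above can be laid out as a small timeline. This is only a sketch of the arithmetic: it assumes a 60-second RPC timeout (the "60 second mark" in the chat) and that 'close' is issued the moment 'next' times out. The names `RpcEvent` and `slow_scan_timeline` are hypothetical, and nothing here talks to HBase.

```python
from dataclasses import dataclass


@dataclass
class RpcEvent:
    at_seconds: int   # seconds since the client issued 'next'
    description: str


def slow_scan_timeline(rpc_timeout_s=60):
    """Return the client-visible events of the slow-scan sequence.

    Assumption: both 'next' and 'close' are subject to the same timeout.
    """
    next_started = 0
    # Step 3: 'next' hits the timeout and a 'close' call shows up.
    close_started = next_started + rpc_timeout_s
    # Step 4: the client only sees the exception once 'close' times out too.
    client_sees_exception = close_started + rpc_timeout_s
    return [
        RpcEvent(next_started, "client issues 'next'"),
        RpcEvent(close_started, "'next' times out, 'close' is issued"),
        RpcEvent(client_sees_exception, "'close' times out, client sees exception"),
    ]


if __name__ == "__main__":
    for event in slow_scan_timeline():
        print(f"t={event.at_seconds:>3}s  {event.description}")
```

With the assumed 60-second timeout the last event lands at the two-minute mark, which matches step 5. The server-side behaviour in step 6 (calls lingering for more than 30 minutes) is not modelled.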
shrijeet / gist:2485198
Created April 25, 2012 01:12
Debug messages on client side scan execution
12/04/24 21:08:07 INFO client.ScannerCallable: sending close on rs for : {"timeRange":[0,9223372036854775807],"batch":-1,"startRow":"userprofile,,00000000000000","stopRow":"","totalColumns":1,"cacheBlocks":true,"families":{"info":["ALL"]},"maxVersions":1,"caching":-1}
Openning to file /user/shrijeet/afile to write results
Starting scanning table userprofile ...
12/04/24 21:09:12 INFO client.ScannerCallable: sending close on rs for : {"timeRange":[0,9223372036854775807],"batch":-1,"startRow":"userprofile,,00000000000000","stopRow":"","totalColumns":1,"cacheBlocks":true,"families":{"info":["ALL"]},"maxVersions":1,"caching":-1}
12/04/24 21:09:12 INFO client.HTable$ClientScanner: calling close on scanner for callable : {"timeRange":[0,9223372036854775807],"batch":-1,"startRow":"","stopRow":"","totalColumns":0,"cacheBlocks":true,"families":{},"maxVersions":1,"caching":50}
12/04/24 21:09:12 INFO client.ScannerCallable: sending close on rs for : {"timeRange":[0,9223372036854775807],"batch":-1,"startRow":"","stopRow"
shrijeet / gist:2514272
Created April 27, 2012 23:30
Thrift + Kryo (courtesy Nathan Marz)
public class ThriftSerialization extends com.esotericsoftware.kryo.Serializer {
Map<Class, TBase> prototypes;
TSerializer ser;
TDeserializer des;
public ThriftSerialization() {
prototypes = new HashMap<Class, TBase>();
ser = new TSerializer(new TCompactProtocol.Factory());
des = new TDeserializer(new TCompactProtocol.Factory());
shrijeet / PBSerialize.java
Created April 29, 2012 01:00
A Protobuf implementation of com.esotericsoftware.kryo.Serializer
public abstract class PBSerialize<T extends Message> extends
Serializer {
protected abstract T parseForm(CodedInputStream in);
private final ThreadLocal<byte[]> thread_local_buffer = new ThreadLocal<byte[]>() {
@Override
protected byte[] initialValue() {
return new byte[1024 * 10]; // 10 KB
}
shrijeet / rack_finder.py
Created June 7, 2012 00:48
Hadoop rack awareness helper script
#!/usr/bin/env python
import sys
import os
"""
Modify this config section based on needs
1) hostrack_data : a file containing lines, a line represents one host
2) field_sep: separator used for fields in one line
3) log: disable/enable logging
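The preview is cut off before the script body, so the following is only a sketch of how a lookup driven by the config items above might work, not the gist's actual code. It assumes each line of the host/rack data holds a host followed by its rack, separated by `field_sep`, and that unknown hosts fall back to `/default-rack`. The names `load_rack_map`, `resolve_racks`, and `DEFAULT_RACK`, and the sample hosts, are hypothetical.

```python
DEFAULT_RACK = "/default-rack"  # assumed fallback for hosts not in the data


def load_rack_map(lines, field_sep=" "):
    """Build a host -> rack dict from lines shaped like 'host<sep>rack'."""
    rack_by_host = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = [f for f in line.split(field_sep) if f]
        if len(fields) < 2:
            continue  # malformed line: no rack column
        rack_by_host[fields[0]] = fields[1]
    return rack_by_host


def resolve_racks(hosts, rack_by_host):
    """Return one rack per requested host, in the same order."""
    return [rack_by_host.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Inline sample data stands in for the hostrack_data file.
    sample = ["10.0.0.1 /rack1", "10.0.0.2 /rack2", "# comment", ""]
    racks = resolve_racks(["10.0.0.2", "10.0.0.9"], load_rack_map(sample))
    # A Hadoop topology script prints one rack per host argument.
    print(" ".join(racks))
```

In a real topology script the host list would come from `sys.argv[1:]` and the lines from the file named by `hostrack_data`.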
From 2fd88425c5848059fbefc7f85ce14858bbbe7775 Mon Sep 17 00:00:00 2001
From: Shrijeet Paliwal <shrijeet@rocketfuel.com>
Date: Mon, 18 Jun 2012 14:25:21 -0700
Subject: [PATCH] Support client RPC operation level timeout
---
Makefile | 1 +
src/GetRequest.java | 24 ++++++++++++++++++++++--
src/HBaseClient.java | 16 ++++++++++++++++
src/HBaseRpc.java | 24 ++++++++++++++++++++++++
shrijeet / gist:3093276
Created July 11, 2012 20:46 — forked from anonymous/gist:3093240
JobTracker fails to start: "java.lang.IllegalArgumentException: Does not contain a valid host:port authority: local"
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>MASTER:8021</value>
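The error in this gist's title is the kind raised when the JobTracker address cannot be split into a host and a port, which typically happens when `mapred.job.tracker` still resolves to `local` rather than a value such as `MASTER:8021`. The sketch below approximates that check in Python for illustration only; `parse_host_port` is a hypothetical helper, not Hadoop's code.

```python
def parse_host_port(authority):
    """Split 'host:port' into (host, port) or raise ValueError.

    Approximation for illustration; the real Hadoop check may differ.
    """
    host, sep, port = authority.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(
            "Does not contain a valid host:port authority: " + authority
        )
    return host, int(port)


if __name__ == "__main__":
    print(parse_host_port("MASTER:8021"))
    try:
        parse_host_port("local")
    except ValueError as err:
        print(err)
```

If the JobTracker reports this error even though the file sets `MASTER:8021`, one thing worth checking is whether the daemon is actually reading this configuration file.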