View chef-bach-test-cluster-create.md

Chef-bach can be used to create a Hadoop test cluster using virtual machines on a hypervisor host with enough resources. The resulting cluster has four nodes: one bootstrap node, which hosts a Chef server, and three Hadoop nodes. Two of the three Hadoop nodes are master nodes and one is the worker node. The following steps create the test cluster. They have been tested on hypervisor hosts running Mac OS and Ubuntu.

  • Install curl on the hypervisor host
  • Install VirtualBox on the hypervisor host
  • Install Vagrant on the hypervisor host
  • Delete the default DHCP server built into VirtualBox
  • Run sudo pkill -f VBox on the hypervisor host
  • Clone the chef-bach repository onto the hypervisor host: git clone https://github.com/bloomberg/chef-bach.git
  • Rename the chef-bach directory to chef-bcpc on the hypervisor host
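
The host preparation steps listed here could be scripted roughly as follows. This is only a sketch under stated assumptions, not the project's own tooling: the helper names (run, missing_tools, prepare_host) and the DRY_RUN and DHCP_NET variables are hypothetical, and the DHCP server name HostInterfaceNetworking-vboxnet0 is the name VirtualBox commonly gives the default host-only DHCP server, which may differ on a given host (VBoxManage list dhcpservers shows the actual names). Installing curl, VirtualBox and Vagrant is platform specific, so the sketch only reports which tools are missing. By default it prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch of the hypervisor host preparation; prints commands unless DRY_RUN=0.
set -u

DRY_RUN="${DRY_RUN:-1}"
# Assumption: common name of VirtualBox's default host-only DHCP server.
DHCP_NET="${DHCP_NET:-HostInterfaceNetworking-vboxnet0}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Echo the name of every given tool that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# The four non-install steps, in the order the list gives them.
prepare_host() {
  run VBoxManage dhcpserver remove --netname "$DHCP_NET"
  run sudo pkill -f VBox
  run git clone https://github.com/bloomberg/chef-bach.git
  run mv chef-bach chef-bcpc
}

missing="$(missing_tools curl VBoxManage vagrant git)"
if [ -n "$missing" ] && [ "$DRY_RUN" != "1" ]; then
  echo "Install these tools first:" $missing >&2
  exit 1
fi
prepare_host
```

Running the script as-is only lists the commands; setting DRY_RUN=0 executes them, so review the DHCP server name before doing that.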
View HBase Region Locality
require 'set'
include Java
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.HColumnDescriptor
import org.apache.hadoop.hbase.HConstants
import org.apache.hadoop.hbase.HTableDescriptor
import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.io.Text
@cbaenziger
cbaenziger / hbase_backup.md
Last active Aug 1, 2016 — forked from mlongob/hbase_backup.md
Hbase backup solutions
View hbase_backup.md

Introduction

This is a proposed procedure for HBase table backups in a secure HBase cluster. Requirements:

  • Live backups (cannot disable the table or take HBase offline)
  • Self-service (non-HBase users can back up and restore their own data)
  • Automatable procedure (Oozie controlled)
  • Works on a secure cluster (a cluster whose /hbase folder is not world-readable)
  • Supports off-cluster backups
      • Backup location might not have an installed instance of HBase, just HDFS
      • Backup location does not have credentials for the hbase user
View gist:fd7f953feac1cf679ced6bdb82232022
ubuntu@dob2-bach-r4an07:~/chef-bach$ git show 8ce3830c8624a179662d71d1abf74ed320fddd9f
commit 8ce3830c8624a179662d71d1abf74ed320fddd9f
Author: Clay Baenziger <cbaenziger@bloomberg.net>
Date: Sun Jul 16 18:44:01 2017 -0400
Check if bach_repo Gemfile.lock is committed
diff --git a/cookbooks/bach_repository/recipes/gems.rb b/cookbooks/bach_reposito
index e1ca4b6..7ebbcc2 100644
--- a/cookbooks/bach_repository/recipes/gems.rb
View HDFS DU DataFlow.graphml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:java="http://www.yworks.com/xml/yfiles-common/1.0/java" xmlns:sys="http://www.yworks.com/xml/yfiles-common/markup/primitives/2.0" xmlns:x="http://www.yworks.com/xml/yfiles-common/markup/2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:y="http://www.yworks.com/xml/graphml" xmlns:yed="http://www.yworks.com/xml/yed/3" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://www.yworks.com/xml/schema/graphml/1.1/ygraphml.xsd">
<!--Created by yEd 3.17.2-->
<key attr.name="Description" attr.type="string" for="graph" id="d0"/>
<key for="port" id="d1" yfiles.type="portgraphics"/>
<key for="port" id="d2" yfiles.type="portgeometry"/>
<key for="port" id="d3" yfiles.type="portuserdata"/>
<key attr.name="url" attr.type="string" for="node" id="d4"/>
<key attr.name="description" attr.type="string" for="node" id="d5"/>
<key for="node" id="d6" yfiles.type="nodegraphics"/>
cbaenziger / hbase_shell_perf_test.rb
Last active Dec 7, 2017
HBase Shell Perf. Test
View hbase_shell_perf_test.rb
require 'benchmark'
require 'jruby/profiler'
include Java
import java.nio.ByteBuffer
import org.apache.hadoop.hbase.CellUtil
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.HColumnDescriptor
import org.apache.hadoop.hbase.HConstants
import org.apache.hadoop.hbase.HTableDescriptor
import org.apache.hadoop.hbase.TableName
cbaenziger / delete_data.rb
Last active Aug 30, 2018
JRuby File Deletion Script
View delete_data.rb
include Java
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path
import java.io.FileNotFoundException
import java.util.NoSuchElementException
# hdfs file system handle
fs = FileSystem.newInstance(Configuration.new)
cbaenziger / job.properties
Last active May 18, 2019
HDFS Balancer Oozie Workflow
View job.properties
nameNode=hdfs://<cluster>
jobTracker=<cluster>
queueName=default
workflowRoot=${nameNode}/user/hdfs/hdfs_balancer
oozie.wf.application.path=${workflowRoot}/workflow.xml