
Chef-bach can be used to create a Hadoop test cluster using virtual machines on a hypervisor host with sufficient resources. The resulting cluster is a 4-node cluster: one node acts as the bootstrap node and hosts a Chef server, while the other three are Hadoop nodes, two of which are master nodes and one a worker node. The following steps describe how to create the test cluster. This has been tested on hypervisor hosts running macOS and Ubuntu.

  • Install curl on the hypervisor host
  • Install VirtualBox on the hypervisor host
  • Install Vagrant on the hypervisor host
  • Delete the default DHCP server built into VirtualBox
  • Run sudo pkill -f VBox on the hypervisor host
  • Clone the chef-bach repository onto the hypervisor host: git clone https://github.com/bloomberg/chef-bach.git
  • Rename the chef-bach directory to chef-bcpc on the hypervisor host
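The steps above can be sketched as a shell session on the hypervisor host. This assumes curl, VirtualBox, and Vagrant are already installed; the DHCP server's network name is an assumption and may differ by VirtualBox version:

```shell
# Remove VirtualBox's built-in default DHCP server
# (network name is an assumption; list yours with: VBoxManage list dhcpservers)
VBoxManage dhcpserver remove --netname HostInterfaceNetworking-vboxnet0 || true

# Stop any lingering VirtualBox processes
sudo pkill -f VBox

# Clone chef-bach and rename the directory to chef-bcpc
git clone https://github.com/bloomberg/chef-bach.git
mv chef-bach chef-bcpc
```

These commands are bound to the hypervisor host's environment, so run them there rather than on a cluster node.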
require 'set'
include Java
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.HColumnDescriptor
import org.apache.hadoop.hbase.HConstants
import org.apache.hadoop.hbase.HTableDescriptor
import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.io.Text
cbaenziger / hbase_backup.md
Last active August 1, 2016 14:37 — forked from mlongob/hbase_backup.md
HBase backup solutions

Introduction

This is a proposed procedure for HBase table backups on a secure HBase cluster. Requirements:

  • Live backups (cannot disable tables or take HBase offline)
  • Self-service (a non-HBase user can back up and restore their own data)
  • Automatable procedure (Oozie-controlled)
  • On a secure cluster (a cluster with a world non-readable /hbase folder)
  • Supports off-cluster backups
      • The backup location might not have an installed instance of HBase, just HDFS
      • The backup location does not have credentials for the hbase user
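As one illustration of a live, off-cluster-capable backup (not necessarily the full procedure proposed here), HBase snapshots combined with ExportSnapshot copy table data to a destination that only needs HDFS; the table name, snapshot name, and backup cluster URI below are placeholders:

```shell
# Take a live snapshot; the table stays online and enabled
echo "snapshot 'my_table', 'my_table-backup-20160801'" | hbase shell -n

# Export the snapshot to an off-cluster HDFS location that need not run HBase
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot my_table-backup-20160801 \
  -copy-to hdfs://backup-cluster/hbase-backups \
  -mappers 4
```

Note that snapshot creation normally requires HBase-level permissions on the table, so the self-service and credential requirements above still need a separate mechanism.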
ubuntu@dob2-bach-r4an07:~/chef-bach$ git show 8ce3830c8624a179662d71d1abf74ed320fddd9f
commit 8ce3830c8624a179662d71d1abf74ed320fddd9f
Author: Clay Baenziger <cbaenziger@bloomberg.net>
Date: Sun Jul 16 18:44:01 2017 -0400
Check if bach_repo Gemfile.lock is committed
diff --git a/cookbooks/bach_repository/recipes/gems.rb b/cookbooks/bach_reposito
index e1ca4b6..7ebbcc2 100644
--- a/cookbooks/bach_repository/recipes/gems.rb
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:java="http://www.yworks.com/xml/yfiles-common/1.0/java" xmlns:sys="http://www.yworks.com/xml/yfiles-common/markup/primitives/2.0" xmlns:x="http://www.yworks.com/xml/yfiles-common/markup/2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:y="http://www.yworks.com/xml/graphml" xmlns:yed="http://www.yworks.com/xml/yed/3" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://www.yworks.com/xml/schema/graphml/1.1/ygraphml.xsd">
<!--Created by yEd 3.17.2-->
<key attr.name="Description" attr.type="string" for="graph" id="d0"/>
<key for="port" id="d1" yfiles.type="portgraphics"/>
<key for="port" id="d2" yfiles.type="portgeometry"/>
<key for="port" id="d3" yfiles.type="portuserdata"/>
<key attr.name="url" attr.type="string" for="node" id="d4"/>
<key attr.name="description" attr.type="string" for="node" id="d5"/>
<key for="node" id="d6" yfiles.type="nodegraphics"/>
cbaenziger / hbase_shell_perf_test.rb
Last active December 7, 2017 14:15
HBase Shell Perf. Test
require 'benchmark'
require 'jruby/profiler'
include Java
import java.nio.ByteBuffer
import org.apache.hadoop.hbase.CellUtil
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.HColumnDescriptor
import org.apache.hadoop.hbase.HConstants
import org.apache.hadoop.hbase.HTableDescriptor
import org.apache.hadoop.hbase.TableName
cbaenziger / delete_data.rb
Last active August 30, 2018 00:58
JRuby File Deletion Script
include Java
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path
import java.io.FileNotFoundException
import java.util.NoSuchElementException
# hdfs file system handle
fs = FileSystem.newInstance(Configuration.new)
cbaenziger / job.properties
Last active May 18, 2019 00:34
HDFS Balancer Oozie Workflow
nameNode=hdfs://<cluster>
jobTracker=<cluster>
queueName=default
workflowRoot=${nameNode}/user/hdfs/hdfs_balancer
oozie.wf.application.path=${workflowRoot}/workflow.xml
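The job.properties above points Oozie at ${workflowRoot}/workflow.xml. A minimal sketch of such a workflow, assuming the balancer is run as an Oozie shell action (the action name and the -threshold value are assumptions, not taken from the gist):

```
<workflow-app xmlns="uri:oozie:workflow:0.4" name="hdfs-balancer-wf">
  <start to="run-balancer"/>
  <action name="run-balancer">
    <shell xmlns="uri:oozie:shell-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.job.queue.name</name>
          <value>${queueName}</value>
        </property>
      </configuration>
      <exec>hdfs</exec>
      <argument>balancer</argument>
      <argument>-threshold</argument>
      <argument>10</argument>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Balancer failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

The shell action must run as a user with HDFS superuser rights (typically hdfs) for the balancer to move blocks.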