Venky Shankar vshankar

@vshankar
vshankar / compliance-infrastructure.md
Last active August 29, 2015 14:10
Data Compliance Infrastructure

Data Compliance Infrastructure


Introduction

This document explains the infrastructural changes required in GlusterFS to support various data compliance features. Data management is a generic term that covers filesystem data handling and management activities such as locality-aware data placement, data tiering, BitRot detection, and the like. The operational mechanisms of these features are more or less similar with respect to the input operation set being worked on. Additionally, and more importantly, the ordering of operations (or traces) tends to be much more relaxed, unlike replication, which relies on strict ordering of operations for correctness.

This document is split into two parts. The first part elaborates on the infrastructure design required for the correct functioning of various data classification mechanisms. Requirements for each sub-feature are presented briefly, and correctness is proven as part of the design. Thereafter, the nature of changes for each component is listed and links to ap

Introduction

This document goes through the new design of distributed geo-replication, its features, and the nature of the changes involved. First, we list some of the important features.

  • Distributed asynchronous replication
  • Fast and versatile change detection
  • Replica failover
  • Hardlink synchronization
  • Effective handling of deletes and renames

libgfchangelog: "GlusterFS changelog" consumer library

This document describes the need for a GlusterFS changelog consumer library (a.k.a. libgfchangelog) for consuming changelogs produced by the Changelog translator. It also covers the proposed design and the API the library exposes. A brief explanation of the changelog translator can also be found in a commit message in the upstream source tree (commit: 11f6c56f83b977a08f9d74563249cef59e22a05d).

The initial consumer of changelogs would be Geo-Replication (release 3.5). Possible future consumers include backup utilities, GlusterFS self-heal, bit-rot detection, and AV scanners. All these utilities have one thing in common: they need a list of changed entities (created/modified/deleted) in the file system. Therefore, the need arises to provide such functionality in the form of a shared library that applications can link against and query for changes (see API section). There is no plan as of now to pr

@vshankar
vshankar / quickparsefuse.go
Created October 3, 2012 06:42 — forked from csabahenk/quickparsefuse.go
Quick printer (for glusterfs and github.com/csabahenk/strace-fusedump produced) FUSE dumps
package main
import (
"fmt"
"io"
"log"
"os"
"unsafe"
)
@vshankar
vshankar / cdc.md
Created April 3, 2012 19:00
Compression/DeCompression Translator

Compression/De-Compression Translator

Usefulness

This translator minimizes the data transferred over the wire by compressing (deflate) it before it is written to the network. The compressed data is decompressed (inflate) on the client side. Hence, this translator must be loaded on the client as well as the server, with inverse operation modes.

Compression and decompression are referred to as deflate and inflate, respectively, in the rest of this document.
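The deflate/inflate pairing described above can be illustrated with a minimal round trip. This is only a sketch: it uses Python's standard `zlib` module as a stand-in for the translator, and the function names `deflate` and `inflate`, the compression level, and the sample payload are assumptions made for illustration, not part of the translator's actual interface.

```python
import zlib


def deflate(payload: bytes, level: int = 6) -> bytes:
    """Compress a payload before it is written to the network (sending side)."""
    return zlib.compress(payload, level)


def inflate(wire_data: bytes) -> bytes:
    """Decompress data read from the network (receiving side)."""
    return zlib.decompress(wire_data)


# Hypothetical payload: repetitive data compresses well.
payload = b"GlusterFS " * 1024

wire_data = deflate(payload)   # what would travel over the wire
restored = inflate(wire_data)  # what the receiving side hands back

print(f"original: {len(payload)} bytes, on the wire: {len(wire_data)} bytes")
```

The two ends must run inverse operations: whatever one side deflates, the other side has to inflate, which is why the translator is loaded on both client and server.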

@vshankar
vshankar / geo-rep-recovery.md
Created March 28, 2012 06:29
Geo-rep failover-failback plan

Geo-rep Recovery plan

Use Case

This document covers the steps involved in recovering the geo-rep master from the slave when the master suffers a partial failure (one of the GlusterFS brick processes is not functioning) or a full failure (the master node shuts down).

Notation used in this document

@vshankar
vshankar / patch.diff
Created August 30, 2011 10:17
Hadoop Mountbroker Patch
diff --git a/libglusterfs/src/common-utils.h b/libglusterfs/src/common-utils.h
index 51d9d88..c7d784c 100644
--- a/libglusterfs/src/common-utils.h
+++ b/libglusterfs/src/common-utils.h
@@ -69,6 +69,7 @@ void trap (void);
#define GF_UNIT_PB_STRING "PB"
#define GEOREP "geo-replication"
+#define GHADOOP "glusterfs-hadoop"
@vshankar
vshankar / HMBI.md
Created August 30, 2011 08:44
Hadoop Mountbroker Integration

Current working of Hadoop with GlusterFS

Currently, to use Hadoop with GlusterFS, the Hadoop Map/Reduce daemons, viz. TaskTracker and JobTracker, need to run as super-user. This is required in order to mount/unmount the GlusterFS volume and access/modify data in it. In contrast, using Hadoop with HDFS has no such limitation: the daemons can run as any user and have full permission on the FS.

Mountbroker

The problem described above is solved by using Mountbroker. A detailed explanation of how it works is available here: https://gist.github.com/71ff8faa041425662185