Anagha Khanolkar airawat

  • Microsoft
set -x
# create the input file based on size (you can get size pattern by running fdisk -l as root)
# Be sure to exclude the Root disk if it is part of your config. You must edit this file to do so
fdisk -l|grep $size|awk '{print $2}'|sed -e"s/\:$//g" > foo
View Oozie-SSHConfig-Azure
B5b. Configure Oozie SSH action
Sometimes, you may need to execute jobs on a specific node - instead of any cluster node.
For this, the Oozie service user must be able to connect to the node of your choice as your workflow user.
# The following documentation details configuring an application ID to execute an SSH action
# In the illustration-
# edge node=cdh-en01
# oozie server=cdh-mn01
# application ID=akhanolk
View CompactParsedLogs
package com.khanolkar.bda.util
* @author Anagha Khanolkar
import org.apache.spark.sql.SparkSession
import org.apache.hadoop.fs.{ FileSystem, Path }
import org.apache.hadoop.conf.Configuration
import org.apache.spark.sql._
import com.databricks.spark.avro._
View CompactRawLogs
spark-submit --class com.khanolkar.bda.util.CompactRawLogs \
MyJar-1.0.jar \
"/user/akhanolk/data/raw/streaming/to-be-compacted/" \
"/user/akhanolk/data/raw/compacted/" \
"2" "128" "oozie-124"
View DotNet-StreamingAnalyticsEventPublisher
using System;
using System.Text;
using Microsoft.ServiceBus.Messaging;
using System.Net;
using System.IO;
namespace StreamingAnalyticsEventPublisher
class MeetupRSVPEventSender
View Security-GlossaryOfTerms
Kerberos is a network authentication protocol. It is designed to provide strong authentication for client/server applications by using secret-key cryptography.
Kerberos Principals
A user in Kerberos is called a principal, which is made up of three distinct components: the primary, instance, and realm.
A Kerberos principal is used in a Kerberos-secured system to represent a unique identity.
The first component of the principal is called the primary, or sometimes the user component.
The primary component is an arbitrary string and may be the operating system username of the user or the name of a service.
The primary component is followed by an optional section called the instance, which is used, for example, to create principals for users in special roles or to define the host on which a service runs.
An instance, if it exists, is separated from the primary by a slash, and its content is used to disambiguate multiple principals for a single user or service.
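The three components described above can be illustrated with a minimal Python sketch that splits a principal string into primary, instance, and realm. It assumes the common `primary/instance@REALM` text form, where the realm follows an `@` sign; the function name `parse_principal`, the `Principal` type, and the sample host and realm are hypothetical, and escaped separator characters are not handled.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    """Components of a Kerberos principal (hypothetical helper type)."""
    primary: str
    instance: Optional[str]
    realm: Optional[str]


def parse_principal(principal: str) -> Principal:
    """Split 'primary[/instance][@REALM]' into its components.

    Assumption: no escaped '/' or '@' characters appear in the string.
    """
    # The realm, if present, follows the last '@'.
    name, at_sign, realm = principal.rpartition("@")
    if not at_sign:
        # No '@' found: the whole string is the name and the realm is unknown.
        name, realm = principal, None
    # The optional instance follows the first '/' in the name.
    primary, slash, instance = name.partition("/")
    return Principal(primary, instance if slash else None, realm)


# A service principal: the instance names the host the service runs on.
service = parse_principal("hdfs/cdh-mn01.example.com@EXAMPLE.COM")
# A plain user principal: no instance component.
user = parse_principal("akhanolk@EXAMPLE.COM")
```

Here `service.instance` holds the host name, while `user.instance` is `None` because the user principal has no instance section.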
airawat / 00-OozieConfigSSHAction
Last active Jan 8, 2020
Oozie configuration for SSH action
View 00-OozieConfigSSHAction
# The following documentation details configuring an application ID to execute an SSH action
# In the illustration-
# edge node=cdh-sn03
# oozie server=cdh-mn01
# application ID=akhanolk
# ==========================================
# 1. On edge node, as application ID
airawat / cascading.accumulo.examples
Last active Jan 2, 2016
cascading.accumulo sample programs
View cascading.accumulo.examples
The sample programs for Cascading (2.5.1) with Accumulo (1.5.0) are on GitHub -
The source code for the extensions is at -
airawat / 00-LogParserCascading
Last active Jan 1, 2016
View 00-LogParserCascading
About this gist:
This gist is part of a series of log parsers in Java MapReduce, Pig, Hive, Python...
This one covers a log parser in Cascading.
It reads syslogs in HDFS -
a) Parses them based on a regex pattern & writes parsed files to HDFS
b) Writes records that don't match the pattern to HDFS
c) Writes a report to HDFS that contains the count of distinct processes logged.
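The three steps above can be sketched in plain Python. This is not the Cascading implementation from the gist: it works on an in-memory list of lines instead of HDFS, the regex assumes a conventional `Mon DD HH:MM:SS host process[pid]: message` syslog layout, and the name `parse_syslog` and the sample host are hypothetical.

```python
import re
from collections import Counter

# Assumed syslog layout: "May  3 11:52:54 host process[pid]: message".
# The "[pid]" part is optional.
SYSLOG_PATTERN = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<process>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?:\s*(?P<message>.*)$"
)


def parse_syslog(lines):
    """Return (parsed records, rejected lines, per-process counts)."""
    parsed, rejected = [], []
    for line in lines:
        match = SYSLOG_PATTERN.match(line.rstrip("\n"))
        if match:
            # a) keep the fields of lines that match the pattern
            parsed.append(match.groupdict())
        else:
            # b) keep lines that do not match, unchanged
            rejected.append(line)
    # c) count how many records each distinct process logged
    process_counts = Counter(record["process"] for record in parsed)
    return parsed, rejected, process_counts


sample = [
    "May  3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process ended",
    "May  3 11:53:31 cdh-dn03 sshd[1234]: Accepted publickey for akhanolk",
    "May  3 11:53:40 cdh-dn03 sshd[1240]: Connection closed",
    "not a syslog line",
]
parsed, rejected, process_counts = parse_syslog(sample)
```

With this sample, `parsed` holds three records, `rejected` holds the one malformed line, and `len(process_counts)` gives the number of distinct processes.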
Other gists/blogs:
airawat / 00-RegexFilterInAccumuloC#ProxyClient
Last active Dec 30, 2015
Using regex filter in Accumulo Proxy C# client
View 00-RegexFilterInAccumuloC#ProxyClient
List<String> artifactList = new List<String> ();
var scanOpts = new ScanOptions();
String rowRegex = rowID + ".*";
IteratorSetting iterSttng = new IteratorSetting();
iterSttng.Priority = 15;
iterSttng.Name = "rowIDRegexFilter";