#!/bin/sh
set -x
# create the input file based on size (you can get size pattern by running fdisk -l as root)
# Be sure to exclude the Root disk if it is part of your config. You must edit this file to do so
size=$1
shift
# Quote the pattern so sizes containing spaces or glob characters are passed to grep intact
fdisk -l | grep "$size" | awk '{print $2}' | sed -e 's/:$//' > foo
B5b. Configure Oozie SSH action
Sometimes you need to execute a job on a specific node rather than on any cluster node.
For this, the Oozie service user must be able to connect over SSH to the node of choice as your workflow user.
# The following documentation details configuring an application ID to execute an SSH action
# In this illustration:
# edge node=cdh-en01
# oozie server=cdh-mn01
# application ID=akhanolk
package com.khanolkar.bda.util
/**
* @author Anagha Khanolkar
*/
import org.apache.spark.sql.SparkSession
import org.apache.hadoop.fs.{ FileSystem, Path }
import org.apache.hadoop.conf.Configuration
import org.apache.spark.sql._
import com.databricks.spark.avro._
spark-submit --class com.khanolkar.bda.util.CompactRawLogs \
............
MyJar-1.0.jar \
"/user/akhanolk/data/raw/streaming/to-be-compacted/" \
"/user/akhanolk/data/raw/compacted/" \
"2" "128" "oozie-124"
using System;
using System.Text;
using Microsoft.ServiceBus.Messaging;
using System.Net;
using System.IO;
namespace StreamingAnalyticsEventPublisher
{
class MeetupRSVPEventSender
{
Kerberos
Kerberos is a network authentication protocol. It is designed to provide strong authentication for client/server applications by using secret-key cryptography.
Kerberos Principals
A user in Kerberos is called a principal, which is made up of three distinct components: the primary, instance, and realm.
A Kerberos principal is used in a Kerberos-secured system to represent a unique identity.
The first component of the principal is called the primary, or sometimes the user component.
The primary component is an arbitrary string and may be the operating system username of the user or the name of a service.
The primary component is followed by an optional section called the instance, which is used, for example, to create principals for users in special roles or to define the host on which a service runs.
An instance, if it exists, is separated from the primary by a slash, and its content disambiguates multiple principals for a single user or service.
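The structure described above can be sketched as a small parser. This is a minimal illustration, not part of the original notes: it assumes the conventional written form primary[/instance]@REALM, in which the realm follows an "@" sign, and it ignores escaped separators. The realm EXAMPLE.COM and the host name cdh-mn01.example.com are hypothetical.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str
    instance: Optional[str]
    realm: Optional[str]


def parse_principal(name: str) -> Principal:
    """Split 'primary[/instance][@REALM]' into its three components."""
    # The realm, when present, follows the last '@'.
    head, at, realm = name.rpartition("@")
    if not at:
        head, realm = name, None
    # The instance, when present, follows the first '/'.
    primary, slash, instance = head.partition("/")
    return Principal(primary, instance if slash else None, realm)


# A user principal without an instance (hypothetical realm).
print(parse_principal("akhanolk@EXAMPLE.COM"))
# A service principal whose instance names the host it runs on (hypothetical host).
print(parse_principal("hdfs/cdh-mn01.example.com@EXAMPLE.COM"))
```

The instance is returned as None when no slash is present, which keeps "no instance" distinct from an empty one.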
@airawat
airawat / 00-OozieConfigSSHAction
Last active January 8, 2020 02:41
Oozie configuration for SSH action
# The following documentation details configuring an application ID to execute an SSH action
# In this illustration:
# edge node=cdh-sn03
# oozie server=cdh-mn01
# application ID=akhanolk
# ==========================================
# 1. On edge node, as application ID
airawat / cascading.accumulo.examples
Last active January 2, 2016 08:09
cascading.accumulo
Sample programs
The sample programs for Cascading (2.5.1) with Accumulo (1.5.0) are on GitHub:
https://github.com/airawat/cascading.accumulo.examples
The source code for the extensions is at:
https://github.com/airawat/cascading.accumulo
airawat / 00-LogParserCascading
Last active January 1, 2016 13:29
LogParserInCascading
About this gist:
================
This gist is part of a series of log parsers in Java MapReduce, Pig, Hive, Python...
This one covers a log parser in Cascading.
It reads syslogs in HDFS and:
a) Parses them based on a regex pattern and writes the parsed files to HDFS
b) Writes records that don't match the pattern to HDFS
c) Writes a report to HDFS that contains the count of distinct processes logged.
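The three steps above can be sketched in plain Python. This is not the Cascading flow from the gist: the regex, the sample lines and the in-memory lists standing in for HDFS outputs are all assumptions made for illustration, and the report is interpreted here as a record count per process name.

```python
import re
from collections import Counter

# Assumed syslog layout: "Mon DD HH:MM:SS host process[pid]: message"
SYSLOG = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s+(\S+)\s+"
    r"([^\s\[:]+)(?:\[(\d+)\])?:\s+(.*)$"
)


def parse_logs(lines):
    """Return (parsed records, rejected lines, record count per process)."""
    parsed, rejected, counts = [], [], Counter()
    for line in lines:
        m = SYSLOG.match(line)
        if m is None:
            rejected.append(line)      # step b: keep non-matching records
            continue
        month, day, time, host, process, pid, message = m.groups()
        parsed.append((month, day, time, host, process, pid, message))  # step a
        counts[process] += 1           # step c: tally per process
    return parsed, rejected, counts


# Hypothetical sample input.
sample = [
    "Dec  3 10:15:01 cdh-dn01 sshd[1234]: Accepted publickey for akhanolk",
    "Dec  3 10:15:02 cdh-dn01 kernel: eth0 link up",
    "Dec  3 10:16:09 cdh-dn01 sshd[1301]: Connection closed",
    "garbled line with no header",
]
parsed, rejected, counts = parse_logs(sample)
print(len(parsed), len(rejected), dict(counts))
```

Keeping the rejected lines in their own output, rather than dropping them, makes it possible to check later whether the pattern is too strict.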
Other gists/blogs:
airawat / 00-RegexFilterInAccumuloC#ProxyClient
Last active December 30, 2015 16:09
Using regex filter in Accumulo Proxy C# client
......
List<String> artifactList = new List<String> ();
var scanOpts = new ScanOptions();
String rowRegex = rowID + ".*";
IteratorSetting iterSttng = new IteratorSetting();
iterSttng.Priority = 15;
iterSttng.Name = "rowIDRegexFilter";
iterSttng.IteratorClass = "org.apache.accumulo.core.iterators.user.RegExFilter";
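The row pattern built above (rowID + ".*") is a prefix match. The sketch below illustrates that idea with Python's re module only; it does not call Accumulo, and Java's regex dialect used by the filter may differ in details. The row IDs are hypothetical, whole-string matching is an assumption, and escaping the row ID is an addition of this sketch, shown because an ID containing characters such as "." would otherwise be read as regex syntax.

```python
import re


def row_regex(row_id: str) -> str:
    """Build a prefix pattern: the literal row ID followed by anything."""
    # re.escape is an addition of this sketch; the snippet above concatenates the raw ID.
    return re.escape(row_id) + ".*"


# Hypothetical row IDs.
rows = ["artifact.10", "artifact.100", "artifact.2", "artifactX10"]

escaped = re.compile(row_regex("artifact.10"))
unescaped = re.compile("artifact.10" + ".*")

# Whole-string matching is assumed here for illustration.
print([r for r in rows if escaped.fullmatch(r)])
print([r for r in rows if unescaped.fullmatch(r)])
```

With the unescaped pattern, "artifactX10" also matches because "." stands for any character.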