A very common request from customers is to index text in image files, for example text in scanned PNG files. In this tutorial we are going to walk through how to do this with Solr.

###Prerequisite

Step-by-step guide

  • **Install dependencies** - this will provide support for processing PNGs, JPEGs, and TIFFs
@saptak
saptak / nifi.md
Last active September 20, 2015 20:42

###Introduction to Apache NiFi

A very common scenario in many large organizations is to define, operationalize and manage complex dataflow between myriad distributed systems that often speak different protocols and understand different data formats. Messaging-based solutions are a popular answer these days, but they don’t address many of the fundamental challenges of enterprise dataflow.

###Data Workflow scenario

Let's dive deeper into the dataflow requirement. On one end we have systems that acquire data, whether they are sensors, businesses, or organizations gathering data for your business. The collected information needs to be sent to processing and analytics systems such as Hadoop, Storm, and Spark, and then ultimately needs to be persisted into a backing store where business users can apply analytics on the data at rest to derive business value.

Let's consider the scenario of IoT or Remote Sensor Delivery. As the data gets collected by remote sensors on factory floors, oil rigs or travelling

@saptak
saptak / jreport.md
Last active September 2, 2015 22:22

Using JReport to visualize data with the Hortonworks Data Platform

###Introduction

JReport is an embedded BI reporting tool that can easily extract and visualize data from the Hortonworks Data Platform 2.3 using the Apache Hive JDBC driver. You can then create reports, dashboards, and data analyses, which can be embedded into your own applications.

In this tutorial we are going to walk through the following steps to demonstrate Apache Hive with JReport:

  1. Install the Apache Hive JDBC driver with JReport.
  2. Create a new JReport Catalog to manage the Hive connection.

Introduction

Apache Ranger delivers a comprehensive approach to security for a Hadoop cluster. It provides central security policy administration across the core enterprise security requirements of authorization, accounting and data protection.

Apache Ranger already extends baseline features for coordinated enforcement across Hadoop workloads, from batch and interactive SQL to real-time.

In this tutorial, we cover using Apache Ranger for HDP 2.3 to secure your Hadoop environment. We will walk through the following topics:

  1. Support for Knox authorization and audit
  2. Command line policies in Hive
@saptak
saptak / 2015-09-25-processing-real-time-event-stream-with-apache-storm.md
Last active December 1, 2019 09:53
Processing realtime event stream with Apache Storm

Introduction

In this tutorial, we will explore Apache Storm and use it with Apache Kafka to develop a multi-stage event processing pipeline.

In an event processing pipeline, each stage is a purpose-built step that performs some real-time processing against upstream event streams for downstream analysis. This produces increasingly richer event streams, as data flows through the pipeline:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.lang.reflect.Array;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;
public class MP1 {
Random generator;
@saptak
saptak / gist:5f1de4173f9d6e85e122
Created April 22, 2015 18:06
find location of the port number in a URL
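var str1 = "http://127.0.0.1:5000"; // URL with an explicit port
var str2 = "http://blah.blah.com";  // URL without a port
// search() returns the index of the first ":" followed by a digit, or -1
var i1 = str1.search(/:\d/);
console.log(i1); // 16
i1 = str2.search(/:\d/);
console.log(i1); // -1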
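
The snippet above relies on `search()` returning the index of the first colon that is immediately followed by a digit, or -1 when the URL has no explicit port. A minimal sketch of the same idea wrapped in reusable functions might look like the following. The names `findPortIndex` and `extractPort` are hypothetical and not part of the original gist, and the sketch assumes URLs without embedded credentials or IPv6 literals, since either can place a colon before a digit ahead of the real port.

```javascript
// Hypothetical helper (not in the original gist):
// index of the ":" that introduces the port, or -1 if there is none.
function findPortIndex(url) {
  return url.search(/:\d/);
}

// Hypothetical helper (not in the original gist):
// the port digits as a string, or null if the URL has no explicit port.
function extractPort(url) {
  var match = url.match(/:(\d+)/);
  return match ? match[1] : null;
}

console.log(findPortIndex("http://127.0.0.1:5000")); // 16
console.log(findPortIndex("http://blah.blah.com"));  // -1
console.log(extractPort("http://127.0.0.1:5000"));   // 5000
console.log(extractPort("http://blah.blah.com"));    // null
```

Returning `null` rather than an empty string for a missing port keeps the "no port" case easy to distinguish from a malformed one.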
@saptak
saptak / 0_reuse_code.js
Last active August 29, 2015 14:15
Here are some things you can do with Gists in GistBox.
// Use Gists to store code you would like to remember later on
console.log(window); // log the "window" object to the console