
@saptak
saptak / 0_reuse_code.js
Last active August 29, 2015 14:15
Here are some things you can do with Gists in GistBox.
// Use Gists to store code you would like to remember later on
console.log(window); // log the "window" object to the console
@saptak
saptak / gist:5f1de4173f9d6e85e122
Created April 22, 2015 18:06
find location of the port number in a URL
var str1 = "http://127.0.0.1:5000";
var str2 = "http://blah.blah.com";
var i1 = str1.search(/:\d/);
console.log(i1); // 16: index of the ":" that precedes the port number
i1 = str2.search(/:\d/);
console.log(i1); // -1: no port number in this URL
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.lang.reflect.Array;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;
public class MP1 {
    Random generator;
@saptak
saptak / jreport.md
Last active September 2, 2015 22:22

Using JReport to visualize data with the Hortonworks Data Platform

### Introduction

JReport is an embedded BI reporting tool that can easily extract and visualize data from the Hortonworks Data Platform 2.3 using the Apache Hive JDBC driver. You can then create reports, dashboards, and data analyses, which can be embedded into your own applications.

In this tutorial we are going to walk through the following steps to demonstrate Apache Hive with JReport:

  1. Install the Apache Hive JDBC driver with JReport.
  2. Create a new JReport Catalog to manage the Hive connection.
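As a sketch of step 2, the Hive connection that a JReport catalog manages boils down to a JDBC URL for HiveServer2. The helper below only assembles such a URL; the function name and the sandbox hostname are illustrative, not part of JReport:

```python
# Sketch: assemble a HiveServer2 JDBC URL of the kind a JReport catalog
# would store. Host and database names here are illustrative.
def hive_jdbc_url(host, port=10000, database="default"):
    # 10000 is the default HiveServer2 port.
    return "jdbc:hive2://{}:{}/{}".format(host, port, database)

url = hive_jdbc_url("sandbox.hortonworks.com")
print(url)  # jdbc:hive2://sandbox.hortonworks.com:10000/default
```

JReport would pair this URL with the Hive JDBC driver class when you register the data source in the catalog.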

### Introduction

Apache Ranger delivers a comprehensive approach to security for a Hadoop cluster. It provides central security policy administration across the core enterprise security requirements of authorization, auditing, and data protection.

Apache Ranger already extends baseline features for coordinated enforcement across Hadoop workloads, from batch and interactive SQL to real-time processing in Hadoop.

In this tutorial, we cover using Apache Ranger for HDP 2.3 to secure your Hadoop environment. We will walk through the following topics:

  1. Support for Knox authorization and audit
  2. Command line policies in Hive
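To make the policy idea concrete: a Ranger authorization policy for Hive pairs a resource (database, table, column) with the users or groups allowed to act on it. The dictionary and check below are a minimal hypothetical sketch of that shape, not the exact Ranger REST payload:

```python
# Hypothetical sketch of a Ranger-style Hive authorization policy:
# a resource plus the users and access types it grants.
policy = {
    "service": "hive",
    "name": "customer_table_select",
    "resources": {"database": "default", "table": "customers", "column": "*"},
    "policyItems": [
        {"users": ["maria_dev"], "accesses": ["select"], "audit": True}
    ],
}

def is_allowed(policy, user, access):
    # Grant only if some policy item names both the user and the access type.
    return any(
        user in item["users"] and access in item["accesses"]
        for item in policy["policyItems"]
    )

print(is_allowed(policy, "maria_dev", "select"))  # True
print(is_allowed(policy, "maria_dev", "drop"))    # False
```

Ranger's central administration UI manages policies of roughly this shape and pushes them to each plugin (Hive, Knox, HDFS) for enforcement and auditing.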
@saptak
saptak / nifi.md
Last active September 20, 2015 20:42

### Introduction to Apache NiFi

A very common scenario in many large organizations is the need to define, operationalize, and manage complex dataflows between myriad distributed systems that often speak different protocols and understand different data formats. Messaging-based solutions are a popular answer these days, but they don't address many of the fundamental challenges of enterprise dataflow.

### Data Workflow Scenario

Let's dive deeper into the dataflow requirement. On one end we have systems that acquire data, whether they are sensors, business applications, or organizations gathering data for your business. The information collected needs to be sent to processing and analytics systems such as Hadoop, Storm, and Spark, and then ultimately persisted into a backing store where business users can apply analytics to the data at rest to derive business value.

Let's consider the scenario of IoT or remote sensor delivery. As the data gets collected by remote sensors on factory floors, oil rigs or travelling

A very common request from many customers is to be able to index text in image files, for example, text in scanned PNG files. In this tutorial we are going to walk through how to do this with Solr.

### Prerequisite

Step-by-step guide

  • **Install dependencies** - this will provide support for processing PNGs, JPEGs, and TIFFs
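Before handing scanned files to Solr for extraction, it helps to select only the formats the installed dependencies can process. The helper below is an illustrative sketch; the extension list simply mirrors the formats named above:

```python
import os

# Keep only files in formats the image-processing dependencies can handle.
SUPPORTED = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}

def ocr_candidates(filenames):
    # Compare extensions case-insensitively so "scan1.PNG" also matches.
    return [f for f in filenames if os.path.splitext(f)[1].lower() in SUPPORTED]

print(ocr_candidates(["scan1.PNG", "notes.txt", "fax.tiff"]))
# ['scan1.PNG', 'fax.tiff']
```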
@saptak
saptak / indexing-documents-with-apache-sol.md
Last active October 1, 2015 16:01
Indexing documents with Apache Solr

In this tutorial, we will learn to:

  • Configure Solr to store indexes in HDFS
  • Create a Solr cluster of two Solr instances running on ports 8983 and 8984
  • Index documents in HDFS using the Hadoop connectors
  • Use Solr to search documents
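As a sketch of the last step, searching Solr is an HTTP GET against a collection's select handler. The snippet below only builds the query URL for one of the two instances described above rather than contacting a live cluster; the collection name and field are illustrative:

```python
from urllib.parse import urlencode

# Build a Solr select URL; collection and field names are hypothetical.
def solr_query_url(host, port, collection, query, rows=10):
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return "http://{}:{}/solr/{}/select?{}".format(host, port, collection, params)

url = solr_query_url("localhost", 8983, "hdfs_docs", "text:hadoop")
print(url)
# http://localhost:8983/solr/hdfs_docs/select?q=text%3Ahadoop&rows=10&wt=json
```

The same URL shape works against the second instance by swapping in port 8984.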

Prerequisite

@saptak
saptak / frequency.py
Last active October 5, 2015 16:37
Twitter Sentiment Python
import sys
import json

def main():
    tweet_file = open(sys.argv[1])
    terms_freq = {}
    totterm = 0.0
    for line in tweet_file:
        tweet = json.loads(line)

Data processing with Hive

Hive is a component of the Hortonworks Data Platform (HDP). Hive provides a SQL-like interface to data stored in HDP. In the previous tutorial we used Pig, which is a scripting language with a focus on dataflows. Hive provides a database query interface to Apache Hadoop.

People often ask why Pig and Hive both exist when they seem to do much of the same thing. Because of its SQL-like query language, Hive is often used as the interface to an Apache Hadoop based data warehouse, and it is considered friendlier and more familiar to users who are used to querying data with SQL. Pig fits in through its dataflow strengths: it takes on the tasks of bringing data into Apache Hadoop and working it into the form needed for querying. A good overview of how this works is Alan Gates's post on the Yahoo Developer blog, titled Pig and Hive at Yahoo! From a technical point of view, both Pig and Hive are feature complete.