
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>Test</groupId>
  <artifactId>Test</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <build>
    <sourceDirectory>src</sourceDirectory>
    <plugins>
      <plugin>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.chetan</groupId>
  <artifactId>dropsample</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>dropsample</name>
● Understand the Business
● Understand the Data
● Cleanse the Data
● Analyze the Data
● Predict from the Data
● Visualize the Data
● Build Insights that Help Grow Business Revenue
● Explain to Executives (CxO)
● Make Decisions
● Increase Revenue
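The cleanse-and-analyze steps above can be sketched in a few lines of Python. The sample data, column names, and the mean-revenue aggregate are all hypothetical stand-ins, just to show the shape of the pipeline:

```python
import csv
import io
import statistics

# Hypothetical sample data standing in for "the data" in the steps above.
raw = """region,revenue
north,1200
south,
east,950
west,1400
"""

# Cleanse: drop rows with a missing revenue value.
rows = [r for r in csv.DictReader(io.StringIO(raw)) if r["revenue"]]

# Analyze: a simple aggregate as a stand-in for real analytics.
revenues = [float(r["revenue"]) for r in rows]
mean_revenue = statistics.mean(revenues)

print(mean_revenue)
```

In a real pipeline each step would be a separate stage (e.g. pandas for cleansing, a model for prediction, a dashboard for visualization), but the flow is the same.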
import nltk
import re

# First run only: download the Punkt sentence tokenizer models.
# nltk.download('punkt')

with open('/home/chetan/Documents/sample-certificate.txt', 'r') as file:
    text = file.read()
# print(text)

# Split the raw text into sentences, then tokenize each sentence into words.
sentences = nltk.sent_tokenize(text)
tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
# print(tokenized_sentences)

sign_date = {}
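The snippet imports `re` and ends with an empty `sign_date` dictionary, so presumably the next step maps sentences to the dates found in them. A minimal sketch of that step, using only `re` — the sentence text and the `dd/mm/yy` date format are assumptions:

```python
import re

# Hypothetical certificate sentences; in the original script these would
# come from nltk.sent_tokenize(text).
sentences = [
    "This certificate is awarded to Chetan.",
    "Signed on 24/1/17 by the issuing authority.",
]

# Assumed date shape: day/month/year with 1-2 digit day/month and 2-4 digit year.
date_pattern = re.compile(r"\b(\d{1,2}/\d{1,2}/\d{2,4})\b")

# Map each sentence containing a date to the list of dates found in it.
sign_date = {s: date_pattern.findall(s) for s in sentences if date_pattern.search(s)}

print(sign_date)
```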
producerConfig:
  buffer.memory: default
  batch.size: "327680"
  linger.ms: "5"
  compression.type: lz4
  retries: default
  send.buffer.bytes: default
  connections.max.idle.ms: default
patchConfig:
  waitTime: "300000"
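For reference, here is how the non-default keys in `producerConfig` would map onto keyword arguments of kafka-python's `KafkaProducer`. The broker address in the comment is an assumption, and the keys marked `default` above are simply omitted so the client defaults apply:

```python
# producerConfig keys translated to kafka-python KafkaProducer kwargs.
# Creating a real producer needs a running broker, e.g.:
#   from kafka import KafkaProducer
#   producer = KafkaProducer(bootstrap_servers="localhost:9092", **producer_kwargs)
producer_kwargs = {
    "batch_size": 327680,        # batch.size
    "linger_ms": 5,              # linger.ms
    "compression_type": "lz4",   # compression.type
}
print(producer_kwargs)
```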
bin/spark-shell --driver-class-path /home/chetan/Documents/hortonworks-shc/shc/core/target/s-1.0.2-2.0-s_2.11-SNAPSHOT.jar
import org.apache.spark.sql.{SQLContext, _}
import org.apache.spark.sql.execution.datasources.hbase._
import org.apache.spark.{SparkConf, SparkContext}
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
def catalog = s"""{
    |"table":{"namespace":"default", "name":"shcExampleTable"},
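The catalog definition above is cut off after the `"table"` entry. To illustrate the overall JSON shape the shc connector expects, here is a hypothetical completed catalog built in Python — the `rowkey` key name, the column names (`col0`, `col1`), and the column family `cf1` are all assumptions, not taken from the original:

```python
import json

# Hypothetical completion of the truncated shc catalog: only the
# "table" entry comes from the original; everything else is illustrative.
catalog = {
    "table": {"namespace": "default", "name": "shcExampleTable"},
    "rowkey": "key",
    "columns": {
        "col0": {"cf": "rowkey", "col": "key", "type": "string"},
        "col1": {"cf": "cf1", "col": "col1", "type": "string"},
    },
}
catalog_json = json.dumps(catalog)
print(catalog_json)
```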
/usr/local/hive/
$SPARK_HOME/bin/spark-shell --packages it.nerdammer.bigdata:spark-hbase-connector_2.10:1.0.3 --conf spark.hbase.host=127.0.0.1
/*
@Author: Chetan Khatri
Description: This Scala script was written for the HBase-to-Hive module; it reads a table from HBase and dumps it to Hive.
*/
import it.nerdammer.spark.hbase._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.types.StructField
package com.chetan.poc.hbase
/**
* Created by chetan on 24/1/17.
*/
import org.apache.spark._
import org.apache.hadoop.hbase.{CellUtil, HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.client._
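The imports above set up a scan over an HBase table through `TableInputFormat`. The Hadoop configuration it reads is a plain key/value map; the ZooKeeper address and table name below are assumptions, shown here only to illustrate which keys the input format consumes:

```python
# Minimal Hadoop configuration for reading an HBase table via
# TableInputFormat. Values are assumptions for a local setup.
hbase_conf = {
    "hbase.zookeeper.quorum": "localhost",             # ZooKeeper ensemble
    "hbase.zookeeper.property.clientPort": "2181",     # ZooKeeper client port
    "hbase.mapreduce.inputtable": "shcExampleTable",   # TableInputFormat.INPUT_TABLE
}
print(hbase_conf)
```

In the Scala script this map would be set on an `HBaseConfiguration` instance before calling `sc.newAPIHadoopRDD` with `classOf[TableInputFormat]`.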
1. Hands-on session: Introduction to Linux
2. Hands-on session: AngularJS 2
3. Introduction to the Git Protocol
4. Continuous Integration with Jenkins
5. An Introduction to Containers with Docker and Kubernetes
6. Linux Kernel Programming
7. Introduction to Free and Open-Source Software (FOSS)
8. Introduction to Scrum / Agile Methodology with JIRA (for example, using any Apache project)
9. Think Stats with Python
10. Think Bayes with Python
http://blog.contus.com/how-whatsapp-works-technically-and-how-to-build-an-app-similar-to-it/
http://erlang.org/doc/reference_manual/data_types.html
http://www.erlang-factory.com/conference/SFBay2012/speakers/RickReed
https://www.vocal.com/cryptography/rc4-encryption-algoritm/
http://www.contus.com/messaging-solutions.php
http://erlang.org/doc/man/mnesia.html
https://www.ejabberd.im/
http://www.erlang-factory.com/upload/presentations/558/efsf2012-whatsapp-scaling.pdf
http://stackoverflow.com/questions/2904669/how-easy-is-it-to-get-a-custom-xmpp-server-app-running
http://www.erlang-factory.com/upload/presentations/708/HitchhikersTouroftheBEAM.pdf