Milind Jagre milindjagre
milindjagre / pom.xml
Last active Jan 7, 2016
This is the pom.xml file I use to build my MapReduce jobs on Hadoop 2.5.1.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.milind</groupId>
<artifactId>GitHub</artifactId>
<version>1.0</version>
<packaging>jar</packaging>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.7</maven.compiler.source>
milindjagre / WordDriver.java
Created Apr 11, 2016
This is the driver class used to read a Microsoft Word document through the MapReduce API.
/*
* To change this license header, choose License Headers in Project Properties.
* To change this template file, choose Tools | Templates
* and open the template in the editor.
*/
package com.milind.mr.worddoc;
/**
*
* @author milind
milindjagre / WordInputFormat.java
Created Apr 11, 2016
This is the custom input format class used to read a Microsoft Word document through the MapReduce API.
/*
* To change this license header, choose License Headers in Project Properties.
* To change this template file, choose Tools | Templates
* and open the template in the editor.
*/
package com.milind.mr.worddoc;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
milindjagre / WordMapper.class
Created Apr 11, 2016
This is the mapper class used to read a Microsoft Word document through the MapReduce API.
/*
* To change this license header, choose License Headers in Project Properties.
* To change this template file, choose Tools | Templates
* and open the template in the editor.
*/
package com.milind.mr.worddoc;
/**
*
* @author milind
milindjagre / WordParser.java
Last active Apr 11, 2016
This Java class parses Microsoft Word document data word by word. It accepts an InputStream for the file and emits the same data, appending a newline each time it reaches the end of a line.
/*
* To change this license header, choose License Headers in Project Properties.
* To change this template file, choose Tools | Templates
* and open the template in the editor.
*/
package com.milind.mr.worddoc;
/**
*
* @author milind
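The word-by-word pass described for WordParser.java can be sketched with the standard library alone. This is a minimal illustration, not the author's code: the class name `WordStreamSketch` and the method `parse` are hypothetical, and it assumes the stream already carries plain UTF-8 text, whereas the real parser would first have to extract text from the Word format.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for WordParser: handles plain UTF-8 text only,
// not the bytes of a .doc or .docx file.
final class WordStreamSketch {

    // Reads the stream line by line, walks each line word by word, and
    // appends '\n' every time the end of an input line is reached.
    static String parse(InputStream in) {
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                boolean first = true;
                for (String word : line.trim().split("\\s+")) {
                    if (word.isEmpty()) {
                        continue; // a blank line splits into one empty token
                    }
                    if (!first) {
                        out.append(' ');
                    }
                    out.append(word);
                    first = false;
                }
                out.append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] sample = "Hello   Word\nfrom MapReduce".getBytes(StandardCharsets.UTF_8);
        System.out.print(parse(new ByteArrayInputStream(sample)));
    }
}
```

Runs of whitespace inside a line collapse to a single space, so the output keeps the words and the line structure but not the original spacing.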
milindjagre / WordRecordReader.java
Created Apr 11, 2016
This class calls the WordParser.java class to read a Microsoft Word document.
/*
* To change this license header, choose License Headers in Project Properties.
* To change this template file, choose Tools | Templates
* and open the template in the editor.
*/
package com.milind.mr.worddoc;
/**
*
* @author milind
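To show how parsed text might be handed to a mapper as records, here is a small sketch that mirrors the method names of Hadoop's `RecordReader` contract (`nextKeyValue`, `getCurrentKey`, `getCurrentValue`, `getProgress`) without depending on Hadoop. The class `ParsedTextRecords` is hypothetical, and using the line index as the key is an assumption; the actual WordRecordReader.java may choose keys and values differently.

```java
// Hypothetical, Hadoop-free imitation of a record reader over parsed text.
// Key = zero-based line index (an assumption), value = the line itself.
final class ParsedTextRecords {
    private final String[] lines;
    private int index = -1; // no record is current until nextKeyValue() succeeds

    ParsedTextRecords(String parsedText) {
        // "".split("\n") would yield one empty element, so treat empty input as no records.
        this.lines = parsedText.isEmpty() ? new String[0] : parsedText.split("\n");
    }

    // Advances to the next record; returns false once the input is exhausted.
    boolean nextKeyValue() {
        if (index + 1 >= lines.length) {
            return false;
        }
        index++;
        return true;
    }

    long getCurrentKey() {
        return index;
    }

    String getCurrentValue() {
        return lines[index];
    }

    // Fraction of records consumed so far, between 0.0 and 1.0.
    float getProgress() {
        return lines.length == 0 ? 1.0f : (index + 1) / (float) lines.length;
    }

    public static void main(String[] args) {
        ParsedTextRecords records = new ParsedTextRecords("Hello Word\nfrom MapReduce\n");
        while (records.nextKeyValue()) {
            System.out.println(records.getCurrentKey() + "\t" + records.getCurrentValue());
        }
    }
}
```

In a real job the same loop is driven by the framework, which calls the reader once per record and passes each key/value pair to the mapper.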
milindjagre / MicrosoftWordDocReader.java
Created Apr 11, 2016
This is standalone Java code for reading Microsoft Office Word files.
/*
* To change this license header, choose License Headers in Project Properties.
* To change this template file, choose Tools | Templates
* and open the template in the editor.
*/
package com.milind.mr.doc.test;
/**
*
* @author milind
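For readers without a Word library on the classpath, the following sketch reads text from a .docx file using only the JDK. It relies on the fact that a .docx file is a ZIP archive whose main part is `word/document.xml`, with text held in `w:t` elements inside `w:p` paragraphs. The class `DocxTextSketch` and both of its methods are hypothetical; this is not the approach confirmed for MicrosoftWordDocReader.java, and it does not handle the older binary .doc format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical JDK-only reader for .docx text; one output line per paragraph.
final class DocxTextSketch {
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static String extractText(InputStream docx) {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"word/document.xml".equals(entry.getName())) {
                    continue;
                }
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                Document xml = factory.newDocumentBuilder().parse(zip);
                StringBuilder text = new StringBuilder();
                NodeList paragraphs = xml.getElementsByTagNameNS(W_NS, "p");
                for (int i = 0; i < paragraphs.getLength(); i++) {
                    NodeList runs =
                        ((Element) paragraphs.item(i)).getElementsByTagNameNS(W_NS, "t");
                    for (int j = 0; j < runs.getLength(); j++) {
                        text.append(runs.item(j).getTextContent());
                    }
                    text.append('\n');
                }
                return text.toString();
            }
            throw new IllegalArgumentException("word/document.xml not found");
        } catch (IOException | ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("could not read document", e);
        }
    }

    // Demo helper: builds an archive holding only word/document.xml. Enough for
    // extractText, but not a complete file that Word itself would open.
    // Paragraph text is not XML-escaped, so pass plain text only.
    static byte[] tinyDocx(String... paragraphs) {
        StringBuilder xml =
            new StringBuilder("<w:document xmlns:w=\"" + W_NS + "\"><w:body>");
        for (String p : paragraphs) {
            xml.append("<w:p><w:r><w:t>").append(p).append("</w:t></w:r></w:p>");
        }
        xml.append("</w:body></w:document>");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.toString().getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] docx = tinyDocx("Hello", "Hadoop 2.5.1");
        System.out.print(extractText(new ByteArrayInputStream(docx)));
    }
}
```

Paragraphs nested inside other paragraphs' containers (for example text boxes) may be reported more than once by this simple traversal; a dedicated Word library handles such cases more carefully.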
milindjagre / WordToPdf.java
Created Apr 13, 2016
This Java file converts a Word file into a PDF file using a Java API.
/*
* To change this license header, choose License Headers in Project Properties.
* To change this template file, choose Tools | Templates
* and open the template in the editor.
*/
package com.milind.word.to.pdf;
import com.itextpdf.text.Chunk;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
milindjagre / pom.xml
Created Apr 13, 2016
This is the pom.xml for the standalone Java code that reads Microsoft Word files.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.milind</groupId>
<artifactId>mr-doc</artifactId>
<version>1.0</version>
<packaging>jar</packaging>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.7</maven.compiler.source>
milindjagre / WritePdf.java
Last active Apr 13, 2016
This standalone Java code writes PDF files using a Java API.
/*
* To change this license header, choose License Headers in Project Properties.
* To change this template file, choose Tools | Templates
* and open the template in the editor.
*/
package com.milind.word.to.pdf;
import com.itextpdf.text.BadElementException;
import com.itextpdf.text.BaseColor;
import com.itextpdf.text.Chunk;
import com.itextpdf.text.Document;