Milind Jagre milindjagre
❤️ DATA ❤️
@milindjagre
milindjagre / pom.xml
Last active January 7, 2016 07:08
This is the pom.xml file I use to build my MapReduce jobs on Hadoop 2.5.1.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.milind</groupId>
<artifactId>GitHub</artifactId>
<version>1.0</version>
<packaging>jar</packaging>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.7</maven.compiler.source>
@milindjagre
milindjagre / WordDriver.java
Created April 11, 2016 10:47
This is the driver class used to read a Microsoft Word document through the MapReduce API.
/*
* To change this license header, choose License Headers in Project Properties.
* To change this template file, choose Tools | Templates
* and open the template in the editor.
*/
package com.milind.mr.worddoc;
/**
*
* @author milind
@milindjagre
milindjagre / WordInputFormat.java
Created April 11, 2016 10:48
This is the custom InputFormat class used to read a Microsoft Word document through the MapReduce API.
/*
* To change this license header, choose License Headers in Project Properties.
* To change this template file, choose Tools | Templates
* and open the template in the editor.
*/
package com.milind.mr.worddoc;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
@milindjagre
milindjagre / WordMapper.class
Created April 11, 2016 10:49
This is the Mapper class used to read a Microsoft Word document through the MapReduce API.
/*
* To change this license header, choose License Headers in Project Properties.
* To change this template file, choose Tools | Templates
* and open the template in the editor.
*/
package com.milind.mr.worddoc;
/**
*
* @author milind
@milindjagre
milindjagre / WordRecordReader.java
Created April 11, 2016 10:52
This class calls the WordParser.java class to read a Microsoft Word document.
/*
* To change this license header, choose License Headers in Project Properties.
* To change this template file, choose Tools | Templates
* and open the template in the editor.
*/
package com.milind.mr.worddoc;
/**
*
* @author milind
@milindjagre
milindjagre / MicrosoftWordDocReader.java
Created April 11, 2016 10:56
This is standalone Java code for reading Microsoft Office Word files.
/*
* To change this license header, choose License Headers in Project Properties.
* To change this template file, choose Tools | Templates
* and open the template in the editor.
*/
package com.milind.mr.doc.test;
/**
*
* @author milind
@milindjagre
milindjagre / WordParser.java
Last active April 11, 2016 10:59
This Java class parses Microsoft Word document data word by word. It accepts an InputStream object for the file path and emits the same data, appending a newline whenever it reaches a new line.
/*
* To change this license header, choose License Headers in Project Properties.
* To change this template file, choose Tools | Templates
* and open the template in the editor.
*/
package com.milind.mr.worddoc;
/**
*
* @author milind
@milindjagre
milindjagre / pom.xml
Created April 13, 2016 12:03
This is the pom.xml for the standalone Java code that reads Microsoft Word files.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.milind</groupId>
<artifactId>mr-doc</artifactId>
<version>1.0</version>
<packaging>jar</packaging>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.7</maven.compiler.source>
@milindjagre
milindjagre / pom.xml
Created April 13, 2016 12:17
This pom.xml file declares the third-party JAR files needed to write PDF files using the Java API.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.milind</groupId>
<artifactId>word-to-pdf</artifactId>
<version>1.0</version>
<packaging>jar</packaging>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.7</maven.compiler.source>
@milindjagre
milindjagre / WritePdf.java
Last active April 13, 2016 12:18
This standalone Java code writes PDF files using the Java API.
/*
* To change this license header, choose License Headers in Project Properties.
* To change this template file, choose Tools | Templates
* and open the template in the editor.
*/
package com.milind.word.to.pdf;
import com.itextpdf.text.BadElementException;
import com.itextpdf.text.BaseColor;
import com.itextpdf.text.Chunk;
import com.itextpdf.text.Document;