@stdatalabs
Last active October 24, 2016 08:03
A CustomTextInputFormat class that extends TextInputFormat and returns a customRecordReader for each input split. More at stdatalabs.blogspot.com
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;

import com.stdatalabs.hive.customInputFormat.customRecordReader;

/**
 * A custom input format that skips records of a specific length.
 * The skipping itself is done by customRecordReader; this class only
 * hands each input split to that reader.
 *
 * More discussion at stdatalabs.blogspot.com
 *
 * @author Sachin Thirumala
 */
public class CustomTextInputFormat extends TextInputFormat {

    @Override
    public RecordReader<LongWritable, Text> getRecordReader(InputSplit inputSplit, JobConf job, Reporter reporter)
            throws IOException {
        // FileInputFormat.getSplits() returns FileSplit instances,
        // so the cast is expected to succeed for splits from this format.
        return new customRecordReader((FileSplit) inputSplit, job);
    }
}
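The customRecordReader source is not part of this gist. To illustrate the kind of filtering loop such a reader might run, here is a Hadoop-free Java sketch. It assumes that "skipping records with specific length" means dropping every line whose character length equals one configured value; `SkippingLineReader`, `skipLength` and `readAll` are hypothetical names for illustration, not the author's API. A real Hadoop implementation would more likely wrap `LineRecordReader` and fill `LongWritable`/`Text` key-value pairs instead of plain `long`/`String`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class SkipByLengthSketch {

    /** Hypothetical stand-in for a line record reader that drops lines of one length. */
    static final class SkippingLineReader {
        private final BufferedReader in;
        private final int skipLength;  // hypothetical: records of exactly this length are skipped
        private long nextOffset = 0;   // approximate offset: counts chars and assumes "\n" endings
        private long key = -1;         // offset of the most recently returned record
        private String value;          // text of the most recently returned record

        SkippingLineReader(BufferedReader in, int skipLength) {
            this.in = in;
            this.skipLength = skipLength;
        }

        /** Advances to the next record whose length differs from skipLength; false at end of input. */
        boolean next() throws IOException {
            String line;
            while ((line = in.readLine()) != null) {
                long offset = nextOffset;
                nextOffset += line.length() + 1;
                if (line.length() == skipLength) {
                    continue;  // skipped record: never surfaces to the caller
                }
                key = offset;
                value = line;
                return true;
            }
            return false;
        }

        long key() { return key; }
        String value() { return value; }
    }

    /** Collects every record that survives the length filter. */
    public static List<String> readAll(String text, int skipLength) {
        SkippingLineReader reader =
            new SkippingLineReader(new BufferedReader(new StringReader(text)), skipLength);
        List<String> kept = new ArrayList<>();
        try {
            while (reader.next()) {
                kept.add(reader.value());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(readAll("abc\nde\nxyz\nf\n", 3));  // prints [de, f]
    }
}
```

The `while` loop inside `next()` is the key idea: a skipped line is consumed and discarded before `next()` returns, so the caller (MapReduce or Hive) simply sees fewer rows. Offsets here are character counts, not byte offsets, which is a simplification of what `LineRecordReader` reports.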