MrTonyDuarte / Readme.txt
Last active March 20, 2017 22:57
Code for Python Evaluator
CDAP Pipelines are saved and restored in JSON format. This file can be imported as a new Pipeline using Studio.
Tony Duarte
Cask Training Specialist
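For reference, an exported pipeline JSON generally has an artifact section plus a config section holding the stages and the connections between them. A minimal hedged sketch of that shape (the pipeline name, versions, and plugin names below are placeholders, not taken from this gist):

{
  "name": "MyPipeline",
  "artifact": {
    "name": "cdap-data-pipeline",
    "version": "4.1.1",
    "scope": "SYSTEM"
  },
  "config": {
    "stages": [
      { "name": "File", "plugin": { "name": "File", "type": "batchsource" } },
      { "name": "PythonEvaluator", "plugin": { "name": "PythonEvaluator", "type": "transform" } },
      { "name": "TPFSAvro", "plugin": { "name": "TPFSAvro", "type": "batchsink" } }
    ],
    "connections": [
      { "from": "File", "to": "PythonEvaluator" },
      { "from": "PythonEvaluator", "to": "TPFSAvro" }
    ]
  }
}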
package org;

import co.cask.cdap.api.app.AbstractApplication;
import co.cask.cdap.api.mapreduce.AbstractMapReduce;
import co.cask.cdap.api.spark.JavaSparkExecutionContext;
import co.cask.cdap.api.spark.JavaSparkMain;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
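These imports outline a CDAP Spark program. A minimal sketch of a main class using them, assuming a hypothetical class name MySparkMain and a hypothetical input path (in CDAP, such a class is registered through an AbstractSpark subclass like the MySparkRunr in the workflow gist below):

public class MySparkMain implements JavaSparkMain {
  @Override
  public void run(JavaSparkExecutionContext sec) throws Exception {
    // CDAP launches this program; a plain JavaSparkContext attaches to the
    // Spark runtime that CDAP sets up for the run.
    JavaSparkContext jsc = new JavaSparkContext();
    JavaRDD<String> lines = jsc.textFile("/tmp/input.txt"); // hypothetical path
    System.out.println("Line count: " + lines.count());
  }
}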
MrTonyDuarte / PythonPluginCode.py
Last active July 7, 2017 23:24
PythonForBigDataProgramming
def transform(record, emitter, context):  # (1)(2)
    import sys  # (3)
    # Debug write of record components to the cdap.log file (4)
    sys.stdout.write("XXXoffset: %i\n" % record['offset'])  # (5)
    sys.stdout.write("XXXbody: %s\n" % record['body'])
    # now write the unmodified record object to the next pipeline stage
    emitter.emit(record)  # (6)
MrTonyDuarte / CdapWorkflowApp.java
Last active July 7, 2017 23:27
Combining Hadoop and Spark in a Data Processing Pipeline
public class CdapWorkflowApp extends AbstractApplication {
  @Override
  public void configure() {
    addSpark(new MySparkRunr());  // (1)
    addMapReduce(new MyMR());
    addWorkflow(new MyWF());      // (2)
    scheduleWorkflow(Schedules.builder("every5Min").createTimeSchedule("*/5 * * * *"), "MyWF"); // (3)
  }

  public static class MyWF extends AbstractWorkflow { // (4)
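    // The gist is truncated here. A hedged sketch of how MyWF might continue,
    // assuming CDAP's AbstractWorkflow API, which chains programs by name;
    // running MyMR before MySparkRunr is an assumption, not from the source.
    @Override
    public void configure() {
      addMapReduce("MyMR");     // run the MapReduce program first
      addSpark("MySparkRunr");  // then the Spark program
    }
  }
}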