@ceteri
Last active December 11, 2015 10:39
Pattern machine learning library for Cascading
bash-3.2$ pwd
/Users/ceteri/src/concur/pattern
bash-3.2$ java -version
java version "1.6.0_43"
Java(TM) SE Runtime Environment (build 1.6.0_43-b01-447-11M4203)
Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01-447, mixed mode)
bash-3.2$ hadoop version
Warning: $HADOOP_HOME is deprecated.
Hadoop 1.0.3
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192
Compiled by hortonfo on Tue May 8 20:31:25 UTC 2012
From source with checksum e6b0c1e23dcf76907c5fecb4b832f3be
bash-3.2$ gradle -version
------------------------------------------------------------
Gradle 1.4
------------------------------------------------------------
Gradle build time: Monday, January 28, 2013 3:42:46 AM UTC
Groovy: 1.8.6
Ant: Apache Ant(TM) version 1.8.4 compiled on May 22 2012
Ivy: 2.2.0
JVM: 1.6.0_43 (Apple Inc. 20.14-b01-447)
OS: Mac OS X 10.7.5 x86_64
bash-3.2$ gradle --info --stacktrace clean test
Starting Build
Settings evaluated using empty settings script.
Projects loaded. Root project using build file '/Users/ceteri/src/concur/pattern/build.gradle'.
Included projects: [root project 'pattern']
Evaluating root project 'pattern' using build file '/Users/ceteri/src/concur/pattern/build.gradle'.
All projects evaluated.
Selected primary tasks 'clean', 'test'
Tasks to be executed: [task ':clean', task ':compileJava', task ':processResources', task ':classes', task ':compileTestJava', task ':processTestResources', task ':testClasses', task ':test']
:clean
Task ':clean' has not declared any outputs, assuming that it is out-of-date.
:compileJava
Executing task ':compileJava' due to:
Output file /Users/ceteri/src/concur/pattern/build/classes/main for task ':compileJava' has changed.
Output file /Users/ceteri/src/concur/pattern/build/dependency-cache for task ':compileJava' has changed.
Output file /Users/ceteri/src/concur/pattern/build/classes/main/pattern/ClassifierFunction$Context.class has been removed for task ':compileJava'.
Output file /Users/ceteri/src/concur/pattern/build/classes/main/pattern/model/glm/PCell.class has been removed for task ':compileJava'.
Output file /Users/ceteri/src/concur/pattern/build/classes/main/pattern/model/tree/Vertex.class has been removed for task ':compileJava'.
Output file /Users/ceteri/src/concur/pattern/build/classes/main/pattern/model/glm/LinkFunction$5.class has been removed for task ':compileJava'.
Output file /Users/ceteri/src/concur/pattern/build/classes/main/pattern/datafield/CategoricalDataField.class has been removed for task ':compileJava'.
Output file /Users/ceteri/src/concur/pattern/build/classes/main/pattern/model/lm/RegressionModel.class has been removed for task ':compileJava'.
Output file /Users/ceteri/src/concur/pattern/build/classes/main/pattern/predictor/PredictorFactory.class has been removed for task ':compileJava'.
Output file /Users/ceteri/src/concur/pattern/build/classes/main/pattern/model/Model.class has been removed for task ':compileJava'.
33 more ...
Compiling with JDK 6 Java compiler API.
Note: /Users/ceteri/src/concur/pattern/src/main/java/pattern/Classifier.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
:processResources
Skipping task ':processResources' as it has no source files.
:processResources UP-TO-DATE
:classes
Skipping task ':classes' as it has no actions.
:compileTestJava
Cached resource is up-to-date (lastModified: Tue Oct 05 01:51:27 PDT 2010). [HTTP: http://repo1.maven.org/maven2/junit/junit/4.8.2/junit-4.8.2.pom]
Resource missing. [HTTP GET: http://conjars.org/repo/junit/junit/maven-metadata.xml]
Resource missing. [HTTP GET: http://conjars.org/repo/junit/junit/]
Resource missing. [HTTP GET: http://conjars.org/repo/junit/junit/maven-metadata.xml]
Resource missing. [HTTP GET: http://conjars.org/repo/junit/junit/]
:: loading settings :: url = jar:file:/Users/ceteri/opt/gradle-1.4/lib/ivy-2.2.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
Executing task ':compileTestJava' due to:
Output file /Users/ceteri/src/concur/pattern/build/dependency-cache for task ':compileTestJava' has changed.
Output file /Users/ceteri/src/concur/pattern/build/classes/test for task ':compileTestJava' has changed.
Output file /Users/ceteri/src/concur/pattern/build/classes/test/pattern/model/KMeansTest.class has been removed for task ':compileTestJava'.
Output file /Users/ceteri/src/concur/pattern/build/classes/test/pattern/model/ModelTest.class has been removed for task ':compileTestJava'.
Output file /Users/ceteri/src/concur/pattern/build/classes/test/pattern/model/RandomForestTest.class has been removed for task ':compileTestJava'.
Compiling with JDK 6 Java compiler API.
:processTestResources
Skipping task ':processTestResources' as it has no source files.
:processTestResources UP-TO-DATE
:testClasses
Skipping task ':testClasses' as it has no actions.
:test
Executing task ':test' due to:
Output file /Users/ceteri/src/concur/pattern/build/reports/tests for task ':test' has changed.
Output file /Users/ceteri/src/concur/pattern/build/test-results/binary/test for task ':test' has changed.
Output file /Users/ceteri/src/concur/pattern/build/test-results for task ':test' has changed.
Output file /Users/ceteri/src/concur/pattern/build/test-results/binary/test/pattern.model.RandomForestTest.stderr has been removed for task ':test'.
Output file /Users/ceteri/src/concur/pattern/build/reports/tests/css3-pie-1.0beta3.htc has been removed for task ':test'.
Output file /Users/ceteri/src/concur/pattern/build/test-results/binary/test/results.bin has been removed for task ':test'.
Output file /Users/ceteri/src/concur/pattern/build/reports/tests/base-style.css has been removed for task ':test'.
Output file /Users/ceteri/src/concur/pattern/build/reports/tests/report.js has been removed for task ':test'.
Output file /Users/ceteri/src/concur/pattern/build/reports/tests/index.html has been removed for task ':test'.
Output file /Users/ceteri/src/concur/pattern/build/test-results/TEST-pattern.model.KMeansTest.xml has been removed for task ':test'.
6 more ...
Starting process 'Gradle Worker 1'. Working directory: /Users/ceteri/src/concur/pattern Command: /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java -Djava.security.manager=jarjar.org.gradle.process.internal.child.BootstrapSecurityManager -Dfile.encoding=MacRoman -ea -cp /Users/ceteri/.gradle/caches/1.4/workerMain/gradle-worker.jar jarjar.org.gradle.process.internal.launcher.GradleWorkerMain
An attempt to initialize for well behaving parent process finished.
Successfully started process 'Gradle Worker 1'
Gradle Worker 1 executing tests.
Running test: test testMain(pattern.model.KMeansTest)
2013-03-16 18:20:31.940 java[1558:cc03] Unable to load realm info from SCDynamicStore
Test: test testMain(pattern.model.KMeansTest) produced standard out/err: 13/03/16 18:20:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
pattern.model.KMeansTest > testMain STANDARD_ERROR
13/03/16 18:20:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Test: test testMain(pattern.model.KMeansTest) produced standard out/err: 13/03/16 18:20:32 WARN snappy.LoadSnappy: Snappy native library not loaded
13/03/16 18:20:32 WARN snappy.LoadSnappy: Snappy native library not loaded
Test: test testMain(pattern.model.KMeansTest) produced standard out/err: 13/03/16 18:20:32 INFO mapred.FileInputFormat: Total input paths to process : 1
13/03/16 18:20:32 INFO mapred.FileInputFormat: Total input paths to process : 1
Running test: test testMain(pattern.model.RandomForestTest)
Test: test testMain(pattern.model.RandomForestTest) produced standard out/err: 13/03/16 18:20:32 INFO mapred.FileInputFormat: Total input paths to process : 1
pattern.model.RandomForestTest > testMain STANDARD_ERROR
13/03/16 18:20:32 INFO mapred.FileInputFormat: Total input paths to process : 1
Gradle Worker 1 finished executing tests.
Process 'Gradle Worker 1' finished with exit value 0 (state: SUCCEEDED)
Finished generating test XML results (0.018 secs)
Generating HTML test report...
Finished generating test html results (0.151 secs)
BUILD SUCCESSFUL
Total time: 9.658 secs
bash-3.2$
bash-3.2$ gradle clean jar
:clean
:compileJava
Note: /Users/ceteri/src/concur/pattern/src/main/java/pattern/Classifier.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
:processResources UP-TO-DATE
:classes
:jar
BUILD SUCCESSFUL
Total time: 4.637 secs
bash-3.2$ rm -rf out
bash-3.2$ hadoop jar build/libs/pattern.jar \
> data/iris.rf.tsv out/classify out/trap \
> --pmml data/iris.rf.xml \
> --assert \
> --measure out/measure --label species
Warning: $HADOOP_HOME is deprecated.
2013-03-16 18:23:55.204 java[1595:1903] Unable to load realm info from SCDynamicStore
13/03/16 18:23:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/03/16 18:23:55 WARN snappy.LoadSnappy: Snappy native library not loaded
13/03/16 18:23:55 INFO mapred.FileInputFormat: Total input paths to process : 1
13/03/16 18:23:55 INFO util.HadoopUtil: resolving application jar from found main method on: pattern.Main
13/03/16 18:23:55 INFO planner.HadoopPlanner: using application jar: /Users/ceteri/src/concur/pattern/build/libs/pattern.jar
13/03/16 18:23:55 INFO property.AppProps: using app.id: E9440CEDDC4748B43A3B5B0F8ED10A1F
13/03/16 18:23:55 INFO mapred.FileInputFormat: Total input paths to process : 1
13/03/16 18:23:56 INFO util.Version: Concurrent, Inc - Cascading 2.1.3
13/03/16 18:23:56 INFO flow.Flow: [classify] starting
13/03/16 18:23:56 INFO flow.Flow: [classify] source: Hfs["TextDelimited[['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species', 'predict']->[ALL]]"]["data/iris.rf.tsv"]
13/03/16 18:23:56 INFO flow.Flow: [classify] sink: Hfs["TextDelimited[[UNKNOWN]->['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species', 'predict', 'score']]"]["out/classify"]
13/03/16 18:23:56 INFO flow.Flow: [classify] sink: Hfs["TextDelimited[[UNKNOWN]->['species', 'score', 'count']]"]["out/measure"]
13/03/16 18:23:56 INFO flow.Flow: [classify] parallel execution is enabled: false
13/03/16 18:23:56 INFO flow.Flow: [classify] starting jobs: 2
13/03/16 18:23:56 INFO flow.Flow: [classify] allocating threads: 1
13/03/16 18:23:56 INFO flow.FlowStep: [classify] starting step: (1/2) out/classify
13/03/16 18:23:56 INFO mapred.FileInputFormat: Total input paths to process : 1
13/03/16 18:23:56 INFO flow.FlowStep: [classify] submitted hadoop job: job_local_0001
13/03/16 18:23:56 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/03/16 18:23:56 INFO io.MultiInputSplit: current split input path: file:/Users/ceteri/src/concur/pattern/data/iris.rf.tsv
13/03/16 18:23:56 INFO mapred.MapTask: numReduceTasks: 0
13/03/16 18:23:56 INFO hadoop.FlowMapper: cascading version: Concurrent, Inc - Cascading 2.1.3
13/03/16 18:23:56 INFO hadoop.FlowMapper: child jvm opts: -Xmx200m
13/03/16 18:23:56 INFO hadoop.FlowMapper: sourcing from: Hfs["TextDelimited[['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species', 'predict']->[ALL]]"]["data/iris.rf.tsv"]
13/03/16 18:23:56 INFO hadoop.FlowMapper: sinking to: Hfs["TextDelimited[[UNKNOWN]->['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species', 'predict', 'score']]"]["out/classify"]
13/03/16 18:23:56 INFO hadoop.FlowMapper: trapping to: Hfs["TextDelimited[[UNKNOWN]->[ALL]]"]["out/trap"]
13/03/16 18:23:57 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
13/03/16 18:23:57 INFO mapred.LocalJobRunner:
13/03/16 18:23:57 INFO mapred.Task: Task attempt_local_0001_m_000000_0 is allowed to commit now
13/03/16 18:23:57 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_m_000000_0' to file:/Users/ceteri/src/concur/pattern/out/classify
13/03/16 18:23:59 INFO mapred.LocalJobRunner: file:/Users/ceteri/src/concur/pattern/data/iris.rf.tsv:0+5123
13/03/16 18:23:59 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
13/03/16 18:24:01 INFO flow.FlowStep: [classify] starting step: (2/2) out/measure
13/03/16 18:24:01 INFO mapred.FileInputFormat: Total input paths to process : 1
13/03/16 18:24:01 INFO flow.FlowStep: [classify] submitted hadoop job: job_local_0002
13/03/16 18:24:01 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/03/16 18:24:01 INFO io.MultiInputSplit: current split input path: file:/Users/ceteri/src/concur/pattern/data/iris.rf.tsv
13/03/16 18:24:01 INFO mapred.MapTask: numReduceTasks: 1
13/03/16 18:24:01 INFO mapred.MapTask: io.sort.mb = 100
13/03/16 18:24:01 INFO mapred.MapTask: data buffer = 79691776/99614720
13/03/16 18:24:01 INFO mapred.MapTask: record buffer = 262144/327680
13/03/16 18:24:01 INFO hadoop.FlowMapper: cascading version: Concurrent, Inc - Cascading 2.1.3
13/03/16 18:24:01 INFO hadoop.FlowMapper: child jvm opts: -Xmx200m
13/03/16 18:24:01 INFO hadoop.FlowMapper: sourcing from: Hfs["TextDelimited[['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species', 'predict']->[ALL]]"]["data/iris.rf.tsv"]
13/03/16 18:24:01 INFO hadoop.FlowMapper: sinking to: GroupBy(measure)[by:[{2}:'species', 'score']]
13/03/16 18:24:01 INFO hadoop.FlowMapper: trapping to: Hfs["TextDelimited[[UNKNOWN]->[ALL]]"]["out/trap"]
13/03/16 18:24:01 INFO hadoop.FlowMapper: trapping to: Hfs["TextDelimited[[UNKNOWN]->[ALL]]"]["out/trap"]
13/03/16 18:24:01 INFO mapred.MapTask: Starting flush of map output
13/03/16 18:24:01 INFO mapred.MapTask: Finished spill 0
13/03/16 18:24:01 INFO mapred.Task: Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
13/03/16 18:24:04 INFO mapred.LocalJobRunner: file:/Users/ceteri/src/concur/pattern/data/iris.rf.tsv:0+5123
13/03/16 18:24:04 INFO mapred.Task: Task 'attempt_local_0002_m_000000_0' done.
13/03/16 18:24:04 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/03/16 18:24:04 INFO mapred.LocalJobRunner:
13/03/16 18:24:04 INFO mapred.Merger: Merging 1 sorted segments
13/03/16 18:24:04 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 11858 bytes
13/03/16 18:24:04 INFO mapred.LocalJobRunner:
13/03/16 18:24:04 INFO hadoop.FlowReducer: cascading version: Concurrent, Inc - Cascading 2.1.3
13/03/16 18:24:04 INFO hadoop.FlowReducer: child jvm opts: -Xmx200m
13/03/16 18:24:04 INFO hadoop.FlowReducer: sourcing from: GroupBy(measure)[by:[{2}:'species', 'score']]
13/03/16 18:24:04 INFO hadoop.FlowReducer: sinking to: Hfs["TextDelimited[[UNKNOWN]->['species', 'score', 'count']]"]["out/measure"]
13/03/16 18:24:04 INFO mapred.Task: Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
13/03/16 18:24:04 INFO mapred.LocalJobRunner:
13/03/16 18:24:04 INFO mapred.Task: Task attempt_local_0002_r_000000_0 is allowed to commit now
13/03/16 18:24:04 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0002_r_000000_0' to file:/Users/ceteri/src/concur/pattern/out/measure
13/03/16 18:24:07 INFO mapred.LocalJobRunner: reduce > reduce
13/03/16 18:24:07 INFO mapred.Task: Task 'attempt_local_0002_r_000000_0' done.
13/03/16 18:24:11 INFO util.Hadoop18TapUtil: deleting temp path out/classify/_temporary
13/03/16 18:24:11 INFO util.Hadoop18TapUtil: deleting temp path out/measure/_temporary
bash-3.2$ head out/classify/part-00000
sepal_length sepal_width petal_length petal_width species predict score
5.1 3.5 1.4 0.2 setosa setosa setosa
4.9 3 1.4 0.2 setosa setosa setosa
4.7 3.2 1.3 0.2 setosa setosa setosa
4.6 3.1 1.5 0.2 setosa setosa setosa
5 3.6 1.4 0.2 setosa setosa setosa
5.4 3.9 1.7 0.4 setosa setosa setosa
4.6 3.4 1.4 0.3 setosa setosa setosa
5 3.4 1.5 0.2 setosa setosa setosa
4.4 2.9 1.4 0.2 setosa setosa setosa
bash-3.2$ head out/measure/part-00000
species score count
setosa setosa 50
versicolor versicolor 48
versicolor virginica 2
virginica versicolor 1
virginica virginica 49
bash-3.2$
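
The out/measure table above can be read as a small confusion matrix. The sketch below is not part of the Pattern library; it assumes that `species` is the labelled class (as passed via `--label species`) and that `score` is the class the model predicted. Under that assumption it computes overall accuracy and per-class recall. The rows are copied by hand from the output above, and the function names are hypothetical.

```python
from collections import defaultdict

# Rows copied from out/measure/part-00000 above: (species, score, count).
# Assumption: 'species' is the true label and 'score' is the model's prediction.
ROWS = [
    ("setosa", "setosa", 50),
    ("versicolor", "versicolor", 48),
    ("versicolor", "virginica", 2),
    ("virginica", "versicolor", 1),
    ("virginica", "virginica", 49),
]


def accuracy(rows):
    """Fraction of all rows where the predicted class equals the label."""
    total = sum(count for _, _, count in rows)
    correct = sum(count for label, predicted, count in rows if label == predicted)
    return correct / total


def per_class_recall(rows):
    """For each label, the fraction of its rows that were predicted correctly."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for label, predicted, count in rows:
        totals[label] += count
        if label == predicted:
            hits[label] += count
    return {label: hits[label] / totals[label] for label in totals}


if __name__ == "__main__":
    print("accuracy:", accuracy(ROWS))
    for label, recall in sorted(per_class_recall(ROWS).items()):
        print(label, recall)
```

With these counts, 147 of the 150 rows have matching `species` and `score` values; the three mismatches are all between versicolor and virginica.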