@machuz
Last active December 2, 2015 03:11
Mahout: from installation to building a model ref: http://qiita.com/ma2k8/items/10d44097607525db9893
おはようございます!プログラマーの神様!(T_T)
おはよう プログラマー 代表 神様 t t
$ hadoop dfs -ls /data/
Found 2 items
drwxr-xr-x - hdfs hadoop 0 2014-03-13 15:39 /data/ng-text
drwxr-xr-x - hdfs hadoop 0 2014-03-13 15:40 /data/ok-text
$ mahout seqdumper -i /monitoring/labelindex
Running on hadoop, using /usr/lib/hadoop/bin/hadoop and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.5.0-job.jar
14/03/13 19:57:48 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/monitoring/labelindex], --startPhase=[0], --tempDir=[temp]}
Input Path: /monitoring/labelindex
Key class: class org.apache.hadoop.io.Text Value Class: class org.apache.hadoop.io.IntWritable
Key: ng: Value: 0
Key: ok: Value: 1
Count: 2
14/03/13 19:57:49 INFO driver.MahoutDriver: Program took 1735 ms (Minutes: 0.028916666666666667)
$ mahout testnb -i /monitoring/test-vectors/tfidf-vectors -o /monitoring/test1 -m /monitoring/test-model -l /monitoring/labelindex
Summary
-------------------------------------------------------
Correctly Classified Instances : 3083 87.0167%
Incorrectly Classified Instances : 460 12.9833%
Total Classified Instances : 3543
=======================================================
Confusion Matrix
-------------------------------------------------------
a       b       <--Classified as
1215    442      |  1657    a = ng
18      1868     |  1886    b = ok
14/03/13 20:21:09 INFO driver.MahoutDriver: Program took 19632 ms (Minutes: 0.32721666666666666)
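The summary percentages can be cross-checked against the confusion matrix. The following is a minimal sketch, assuming rows are the actual labels and columns the predicted ones (as the "Classified as" header suggests). The variable names are hypothetical, and the counts are copied by hand from the testnb output above.

```shell
#!/bin/sh
# Counts copied from the confusion matrix above
# (assumed layout: rows = actual label, columns = predicted label).
ng_as_ng=1215; ng_as_ok=442
ok_as_ng=18;   ok_as_ok=1868

# Recompute overall accuracy and per-class recall with awk.
metrics=$(awk -v a="$ng_as_ng" -v b="$ng_as_ok" -v c="$ok_as_ng" -v d="$ok_as_ok" 'BEGIN {
    total = a + b + c + d
    printf "accuracy  %.4f%%\n", 100 * (a + d) / total
    printf "ng_recall %.4f%%\n", 100 * a / (a + b)
    printf "ok_recall %.4f%%\n", 100 * d / (c + d)
}')
echo "$metrics"
```

The accuracy line should agree with the "Correctly Classified Instances" percentage in the summary. The per-class recall shows where the errors sit: 442 of the 1657 ng documents were classified as ok, while only 18 of the 1886 ok documents were classified as ng.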
$ hadoop dfs -ls /data/ok-text/ | more
Found 1897 items
-rw-r--r-- 3 matsukawa_tsubasa hadoop 24 2014-03-13 18:36 /data/ok-text/1311.txt
-rw-r--r-- 3 matsukawa_tsubasa hadoop 136 2014-03-13 18:36 /data/ok-text/1312.txt
-rw-r--r-- 3 matsukawa_tsubasa hadoop 115 2014-03-13 18:36 /data/ok-text/1313.txt
-rw-r--r-- 3 matsukawa_tsubasa hadoop 24 2014
$ mahout seqdirectory -i /data/ -o /data-seq
$ mahout seqdumper -i /data-seq/chunk-0
$ mahout seq2sparse -i /data-seq -o /data-vectors -a org.apache.lucene.analysis.core.WhitespaceAnalyzer
org.apache.lucene.analysis.core.WhitespaceAnalyzer
org.apache.lucene.analysis.WhitespaceAnalyzer
$ mahout vectordump -i /monitoring/kerberos-vectors/tfidf-vectors
$ mahout seqdumper -i /monitoring/kerberos-vectors/wordcount | sort -nrk4
$ mahout trainnb -i /monitoring/test-vectors/tfidf-vectors -o /monitoring/test-model -el -li /monitoring/labelindex