Skip to content

Instantly share code, notes, and snippets.

@mumrah
Created May 25, 2012 14:28
Show Gist options
  • Save mumrah/2788429 to your computer and use it in GitHub Desktop.
Save mumrah/2788429 to your computer and use it in GitHub Desktop.
kmeans cli docs
$ /opt/mahout-distribution-0.6/bin/mahout kmeans -h
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
no HADOOP_HOME set, running locally
Class: org.apache.mahout.driver.MahoutDriver
Args: kmeans -h
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/mahout-distribution-0.6/mahout-examples-0.6-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/mahout-distribution-0.6/lib/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/mahout-distribution-0.6/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
usage: <command> [Generic Options] [Job-Specific Options]
Generic Options:
-archives <paths> comma separated archives to be unarchived
on the compute machines.
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-files <paths> comma separated files to be copied to the
map reduce cluster
-fs <local|namenode:port> specify a namenode
-jt <local|jobtracker:port> specify a job tracker
-libjars <paths> comma separated jar files to include in
the classpath.
-tokenCacheFile <tokensFile> name of the file with the tokens
Job-Specific Options:
--input (-i) input Path to job input directory.
--output (-o) output The directory pathname for
output.
--distanceMeasure (-dm) distanceMeasure The classname of the
DistanceMeasure. Default is
SquaredEuclidean
--clusters (-c) clusters The input centroids, as Vectors.
Must be a SequenceFile of
Writable, Cluster/Canopy. If k
is also specified, then a random
set of vectors will be selected
and written out to this path
first
--numClusters (-k) k The k in k-Means. If specified,
then a random selection of k
Vectors will be chosen as the
Centroid and written to the
clusters input path.
--convergenceDelta (-cd) convergenceDelta The convergence delta value.
Default is 0.5
--maxIter (-x) maxIter The maximum number of
iterations.
--overwrite (-ow) If present, overwrite the output
directory before running job
--clustering (-cl) If present, run clustering after
the iterations have taken place
--method (-xm) method The execution method to use:
sequential or mapreduce. Default
is mapreduce
--help (-h) Print out help
--tempDir tempDir Intermediate output directory
--startPhase startPhase First phase to run
--endPhase endPhase Last phase to run
Specify HDFS directories while running on hadoop; else specify local file
system directories
12/05/25 10:28:19 INFO driver.MahoutDriver: Program took 939 ms (Minutes: 0.01565)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment