Accumulo Installation and Configuration Steps on a Ubuntu VirtualBox Instance
Oct 17, 2012 - See https://github.com/medined/accumulo_stackscript for an even better script. Really ignore the stuff below. Go. Scoot.
Aug 28, 2012 - See http://affy.blogspot.com/2012/08/stackscript-for-accumulo-on-linode.html for a more concise method of configuring Accumulo. I'll leave this gist unchanged for fans of history.
My goal was to get Accumulo running on a VirtualBox Ubuntu instance. I was successful using the following steps. If a line starts with $, it is a command line to execute. Note that you'll need sudo privileges. My username was 'ubuntu'. If you are using a different username, you'll need to change the process a little bit; I'll try to point out where.
https://issues.apache.org/jira/browse/ACCUMULO
##########
# Start a new VirtualBox instance using the Ubuntu 11.10
# Desktop ISO with at least 4G RAM and at least 10G of
# disk space.
##########
##########
# For verification, you can display the OS release.
##########
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=11.10
DISTRIB_CODENAME=oneiric
DISTRIB_DESCRIPTION="Ubuntu 11.10"
##########
# Download all of the packages you'll need. Hopefully,
# you have a fast download connection.
##########
$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get install curl
$ sudo apt-get install git
$ sudo apt-get install maven2
$ sudo apt-get install openssh-server openssh-client
$ sudo apt-get install openjdk-7-jdk
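##########
# Equivalently, all of those packages can be installed
# with a single command if you prefer:
##########
$ sudo apt-get install curl git maven2 openssh-server openssh-client openjdk-7-jdk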
##########
# Switch to the new Java. On my system, it was
# the third option (marked '2' naturally)
##########
$ sudo update-alternatives --config java
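##########
# If you'd rather not pick from the interactive menu,
# you can set the alternative directly. The path below
# is the usual one for openjdk-7 on 32-bit Ubuntu, but
# check it against the --list output first.
##########
$ update-alternatives --list java
$ sudo update-alternatives --set java /usr/lib/jvm/java-7-openjdk-i386/jre/bin/java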
##########
# Set the JAVA_HOME variable. I took the
# time to update my .bashrc script.
##########
$ export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
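# For example, to make it survive new shells, append the
# same line to your .bashrc:
$ echo 'export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386' >> ~/.bashrc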
##########
# I stored the Accumulo source code into
# ~/workspace/accumulo. After compilation, you'll
# be working with a second Accumulo directory. By
# placing this 'original source' version into
# workspace it is nicely segregated.
##########
$ mkdir -p ~/workspace
$ cd ~/workspace
$ git clone https://github.com/apache/accumulo.git
$ cd accumulo
##########
# Now we can compile Accumulo, which creates the
# accumulo-1.5.0-incubating-SNAPSHOT-dist.tar.gz
# file in the src/assemble/target directory.
#
# This step confused me because the Accumulo README
# mentions mvn assembly:single and I tried to use
# that Maven command. It is not needed, at least not
# in this situation.
##########
$ mvn package
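##########
# If the unit tests are slow or flaky inside your VM,
# you can skip them; the tarball is still built:
##########
$ mvn -DskipTests package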
##########
# Now we can download Cloudera's version of Hadoop. The
# first step is adding the repository. Note that oneiric
# is not explicitly supported as of 2011-Dec-20. So I am
# using the 'maverick' repository.
##########
# Create a repository list file. Add the two indented lines
# to the new file.
$ sudo vi /etc/apt/sources.list.d/cloudera.list
deb http://archive.cloudera.com/debian maverick-cdh3 contrib
deb-src http://archive.cloudera.com/debian maverick-cdh3 contrib
# Add public key
$ curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add -
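# Optionally verify that the key was imported.
$ apt-key list | grep -i cloudera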
$ sudo apt-get update
# Install all of the Hadoop components.
$ sudo apt-get install hadoop-0.20
$ sudo apt-get install hadoop-0.20-namenode
$ sudo apt-get install hadoop-0.20-datanode
$ sudo apt-get install hadoop-0.20-secondarynamenode
$ sudo apt-get install hadoop-0.20-jobtracker
$ sudo apt-get install hadoop-0.20-tasktracker
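# Optionally confirm that all of the packages installed.
$ dpkg -l | grep hadoop-0.20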
# Install zookeeper. It will automatically
# start.
$ sudo apt-get install hadoop-zookeeper-server
##########
# As an aside, you can use Ubuntu's service
# command to control zookeeper like this:
# sudo service hadoop-zookeeper-server start
##########
##########
# Now we can configure Pseudo-Distributed hadoop
# These steps were borrowed from
# http://hadoop.apache.org/common/docs/r0.20.2/quickstart.html
##########
# Set some environment variables. I added these to my
# .bashrc file.
$ export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
$ export HADOOP_HOME=/usr/lib/hadoop-0.20
$ export ZOOKEEPER_HOME=/usr/lib/zookeeper
$ cd $HADOOP_HOME/conf
# Create the hadoop temp directory. It should not
# be in the /tmp directory because that directory
# disappears after each system restart, which
# happens a lot with virtual machines.
$ sudo mkdir /hadoop_tmp_dir
$ sudo chmod 777 /hadoop_tmp_dir
# Replace the existing file with the indented lines.
$ sudo vi core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop_tmp_dir</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
##########
# Notice that the dfs secondary http address is not
# the default in the XML below. I don't know what
# process was using the default, but I needed to
# change it to avoid the 'port already in use' message.
##########
# Replace the existing file with the indented lines.
$ sudo vi hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.secondary.http.address</name>
<value>0.0.0.0:8002</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
# Replace the existing file with the indented lines.
$ sudo vi mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
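##########
# Optional sanity check: make sure all three edited
# files are still well-formed XML. This assumes
# xmllint from the libxml2-utils package.
##########
$ sudo apt-get install libxml2-utils
$ for f in core-site.xml hdfs-site.xml mapred-site.xml; do xmllint --noout $f && echo "$f is well-formed"; done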
# format the hadoop filesystem
$ hadoop namenode -format
##########
# Time to setup password-less ssh to localhost
##########
$ cd ~
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
# If you want to test that the ssh works, do this. Then exit.
$ ssh localhost
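# If ssh prompted for a password, tighten the
# permissions; sshd ignores keys kept in files
# that are too open.
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys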
# Do some zookeeper configuration.
$ echo "maxClientCnxns=100" | sudo tee -a $ZOOKEEPER_HOME/conf/zoo.cfg
$ cd ~
$ export TAR_DIR=~/workspace/accumulo/src/assemble/target
$ tar xvzf $TAR_DIR/accumulo-1.5.0-incubating-SNAPSHOT-dist.tar.gz
# Add the following to your .bashrc file.
$ export ACCUMULO_HOME=~/accumulo-1.5.0-incubating-SNAPSHOT
$ cd $ACCUMULO_HOME/conf
###########
# I didn't see the metrics file mentioned in the README file but
# there was a complaint in a log file about its being missing.
###########
$ cp slaves.example slaves
$ cp masters.example masters
$ cp accumulo-env.sh.example accumulo-env.sh
$ cp accumulo-site.xml.example accumulo-site.xml
$ cp accumulo-metrics.xml.example accumulo-metrics.xml
# create the write-ahead log directory.
$ cd ..
$ mkdir walogs
###########
# Configure for 4GB RAM. I definitely recommend using more RAM
# if you have it. Since I am using a VirtualBox instance, I don't
# have much memory to play with.
###########
# Change these two parameters to reduce memory usage.
$ vi conf/accumulo-site.xml
tserver.memory.maps.max=256M
tserver.cache.index.size=128M
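# Remember that accumulo-site.xml is XML, so those two
# values go into property elements like this:
<property>
<name>tserver.memory.maps.max</name>
<value>256M</value>
</property>
<property>
<name>tserver.cache.index.size</name>
<value>128M</value>
</property>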
# Change (or add) the trace.password entry if the root password is
# not the default of "secret"
<property>
<name>trace.password</name>
<value>mypassword_for_root_user</value>
</property>
# Reduce the JVM memory. I have no real idea what these should be but these
# settings work. I consider them a magic formula. :)
$ vi conf/accumulo-env.sh
test -z "$ACCUMULO_TSERVER_OPTS" && export ACCUMULO_TSERVER_OPTS="${POLICY} -Xmx512m -Xms512m -Xss128k"
test -z "$ACCUMULO_MASTER_OPTS" && export ACCUMULO_MASTER_OPTS="${POLICY} -Xmx512m -Xms128m"
test -z "$ACCUMULO_MONITOR_OPTS" && export ACCUMULO_MONITOR_OPTS="${POLICY} -Xmx256m -Xms128m"
test -z "$ACCUMULO_GC_OPTS" && export ACCUMULO_GC_OPTS="-Xmx256m -Xms128m"
test -z "$ACCUMULO_LOGGER_OPTS" && export ACCUMULO_LOGGER_OPTS="-Xmx128m -Xms64m"
test -z "$ACCUMULO_GENERAL_OPTS" && export ACCUMULO_GENERAL_OPTS="-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75"
test -z "$ACCUMULO_OTHER_OPTS" && export ACCUMULO_OTHER_OPTS="-Xmx256m -Xms128m"
#######
#######
#######
#######
# REPEAT FOR EACH RESTART
#
# Since we are working inside a virtual machine, I found that
# some settings did not survive a shutdown or reboot. From this
# point on, repeat these commands for each instance startup.
#######
# hadoop was installed as root. Therefore we need to
# change the ownership so that your username can
# write. IF YOU ARE NOT USING 'ubuntu', CHANGE THE
# COMMAND ACCORDINGLY.
$ sudo chown -R ubuntu:ubuntu /usr/lib/hadoop-0.20
$ sudo chown -R ubuntu:ubuntu /var/run/hadoop-0.20
$ sudo chown -R ubuntu:ubuntu /var/log/hadoop-0.20
# Start hadoop. I remove the logs so that I can find errors
# faster when I iterate through configuration settings.
$ cd $HADOOP_HOME
$ rm -rf logs/*
$ bin/start-dfs.sh
$ bin/start-mapred.sh
# If desired, look at the hadoop jobs. Your output should look something
# like the indented lines.
$ jps
4017 JobTracker
4254 TaskTracker
30279 Main
9808 Jps
3517 NameNode
3737 DataNode
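# A quick scripted check, if you'd rather not eyeball
# the jps output:
$ for p in NameNode DataNode JobTracker TaskTracker; do jps | grep -q $p && echo "$p is running" || echo "$p is NOT running"; done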
##########
# This is an optional step to prove that the NameNode is running.
# Use a web browser like Firefox if you can.
##########
$ wget http://localhost:50070/
$ cat index.html
$ rm index.html
##########
# This is an optional step to prove that the JobTracker is running.
# Use a web browser like Firefox if you can.
##########
$ wget http://localhost:50030/
$ cat index.html
$ rm index.html
##########
# This is an optional step to prove that a map-reduce job
# can be run. In other words, that hadoop is working.
##########
$ hadoop dfs -rmr input
$ hadoop fs -put $HADOOP_HOME/conf input
$ hadoop jar $HADOOP_HOME/hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ hadoop fs -cat output/*
###########
# And now, the payoff. Let's get Accumulo to run.
###########
# Provide an instance name (development) and password (password) when asked.
$ cd $ACCUMULO_HOME
$ bin/accumulo init
# I remove the logs to make debugging easier.
$ rm -rf logs/*
$ bin/start-all.sh
##########
# This is an optional step to prove that Accumulo is running.
# Use a web browser like Firefox if you can.
##########
$ wget http://localhost:50095/
$ cat index.html
$ rm index.html
# Check the logs directory.
$ cd logs
# Check the .err and .out files for content; the file sizes should all be zero.
$ ls -l *.err *.out
# Look for error messages. Ignore messages about the missing libNativeMap file.
$ grep ERROR *
# Start the Accumulo shell. If this works, see the README file for an example
# of how to use the shell.
$ bin/accumulo shell -u root -p password
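# A minimal session might look like this ('test1' is
# just an example table name):
> createtable test1
test1> insert row1 colf colq value1
test1> scan
row1 colf:colq []    value1
test1> exit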
###########
# Do a little victory dance. You're now an Accumulo user!
###########
##########
# Building Accumulo Documentation
##########
$ sudo apt-get install texlive-latex-base
$ sudo apt-get install texlive-latex-extra
$ rm ./docs/accumulo_user_manual.pdf
$ mvn -Dmaven.test.skip=true prepare-package
$ cd docs/src/developer_manual
$ pdflatex developer_manual && pdflatex developer_manual && pdflatex developer_manual && pdflatex developer_manual
##########
# Reading Documentation
##########
http://incubator.apache.org/accumulo/user_manual_1.4-incubating
docs/src/developer_manual/developer_manual.pdf
$ ls -l docs/examples
##########
# Things to Try
##########
bin/accumulo org.apache.accumulo.server.util.ListInstances
WHY DOES THIS NPE?
bin/accumulo org.apache.accumulo.server.util.DumpTable batchtest1
##########
# Running Accumulo Examples
##########
export EXAMPLE_JAR=lib/examples-simple-1.5.0-incubating-SNAPSHOT.jar
export EXAMPLE_PACKAGE=org.apache.accumulo.examples.simple
cd $ACCUMULO_HOME
export AINSTANCE=development
export AZOOKEEPERS=localhost
export AUSER=root
export APASSWORD=password
export AOPTIONS="$AINSTANCE $AZOOKEEPERS $AUSER $APASSWORD"
# ---------------------------
# Examples from README.batch
# ---------------------------
# start the command-line shell.
bin/accumulo shell -u root -p password
> setauths -u root -s exampleVis
> createtable batchtest1
> exit
export TABLE=batchtest1
export START=0
export NUM=10000
export VALUE_SIZE=50
export MAX_MEMORY=20000000
export MAX_LATENCY=500
export NUM_THREADS=20
export COLUMN_VISIBILITY=exampleVis
bin/accumulo $EXAMPLE_PACKAGE.client.SequentialBatchWriter $AOPTIONS $TABLE $START $NUM $VALUE_SIZE $MAX_MEMORY $MAX_LATENCY $NUM_THREADS $COLUMN_VISIBILITY
export NUM=1000
export MIN=0
export MAX=10000
export EXPECTED_VALUE_SIZE=50
bin/accumulo $EXAMPLE_PACKAGE.client.RandomBatchScanner $AOPTIONS $TABLE $NUM $MIN $MAX $EXPECTED_VALUE_SIZE $NUM_THREADS $COLUMN_VISIBILITY
# ----------------------------
# Examples from README.bloom
# ----------------------------
### create table without bloom filter.
bin/accumulo shell -u $AUSER -p $APASSWORD
> setauths -u root -s exampleVis
> createtable bloom_test1
bloom_test1> config -t bloom_test1 -s table.compaction.major.ratio=7
bloom_test1> exit
export TABLE=bloom_test1
export NUM=1000000
export MIN=0
export MAX=1000000000
export VALUE_SIZE=50
export MAX_MEMORY=2000000
export MAX_LATENCY=60000
export NUM_THREADS=20
export COLUMN_VISIBILITY=exampleVis
# create a million records
bin/accumulo $EXAMPLE_PACKAGE.client.RandomBatchWriter -s 7 $AOPTIONS $TABLE $NUM $MIN $MAX $VALUE_SIZE $MAX_MEMORY $MAX_LATENCY $NUM_THREADS $COLUMN_VISIBILITY
bin/accumulo shell -u $AUSER -p $APASSWORD -e 'flush -t bloom_test1 -w'
bin/accumulo $EXAMPLE_PACKAGE.client.RandomBatchWriter -s 8 $AOPTIONS $TABLE $NUM $MIN $MAX $VALUE_SIZE $MAX_MEMORY $MAX_LATENCY $NUM_THREADS $COLUMN_VISIBILITY
bin/accumulo shell -u $AUSER -p $APASSWORD -e 'flush -t bloom_test1 -w'
bin/accumulo $EXAMPLE_PACKAGE.client.RandomBatchWriter -s 9 $AOPTIONS $TABLE $NUM $MIN $MAX $VALUE_SIZE $MAX_MEMORY $MAX_LATENCY $NUM_THREADS $COLUMN_VISIBILITY
### create table with bloom filter.
bin/accumulo shell -u $AUSER -p $APASSWORD
> createtable bloom_test2
bloom_test2> config -t bloom_test2 -s table.compaction.major.ratio=7
bloom_test2> config -t bloom_test2 -s table.bloom.enabled=true
bloom_test2> exit
export TABLE=bloom_test2
bin/accumulo $EXAMPLE_PACKAGE.client.RandomBatchWriter -s 7 $AOPTIONS $TABLE $NUM $MIN $MAX $VALUE_SIZE $MAX_MEMORY $MAX_LATENCY $NUM_THREADS $COLUMN_VISIBILITY
bin/accumulo shell -u $AUSER -p $APASSWORD -e 'flush -t bloom_test2 -w'
bin/accumulo $EXAMPLE_PACKAGE.client.RandomBatchWriter -s 8 $AOPTIONS $TABLE $NUM $MIN $MAX $VALUE_SIZE $MAX_MEMORY $MAX_LATENCY $NUM_THREADS $COLUMN_VISIBILITY
bin/accumulo shell -u $AUSER -p $APASSWORD -e 'flush -t bloom_test2 -w'
bin/accumulo $EXAMPLE_PACKAGE.client.RandomBatchWriter -s 9 $AOPTIONS $TABLE $NUM $MIN $MAX $VALUE_SIZE $MAX_MEMORY $MAX_LATENCY $NUM_THREADS $COLUMN_VISIBILITY
bin/accumulo shell -u $AUSER -p $APASSWORD -e 'flush -t bloom_test2 -w'
### read table without bloom filter.
export TABLE=bloom_test1
export NUM=500
# same seed, records are found.
bin/accumulo $EXAMPLE_PACKAGE.client.RandomBatchScanner -s 7 $AOPTIONS $TABLE $NUM $MIN $MAX $EXPECTED_VALUE_SIZE $NUM_THREADS $COLUMN_VISIBILITY
# different seed, no results
bin/accumulo $EXAMPLE_PACKAGE.client.RandomBatchScanner -s 8 $AOPTIONS $TABLE $NUM $MIN $MAX $EXPECTED_VALUE_SIZE $NUM_THREADS $COLUMN_VISIBILITY
### read table with bloom filter.
export TABLE=bloom_test2
bin/accumulo $EXAMPLE_PACKAGE.client.RandomBatchScanner -s 7 $AOPTIONS $TABLE $NUM $MIN $MAX $EXPECTED_VALUE_SIZE $NUM_THREADS $COLUMN_VISIBILITY
### verify the map tables
# display the table ids.
bin/accumulo shell -u $AUSER -p $APASSWORD -e 'tables -l'
# display the hdfs files associated with the table id.
hadoop fs -lsr /accumulo/tables/3
# use PrintInfo to show that the file has a bloom filter.
bin/accumulo org.apache.accumulo.core.file.rfile.PrintInfo /accumulo/tables/4/default_tablet/F000000e.rf
# ----------------------------
# Examples from README.bulkIngest
# ----------------------------
export TABLE=test_bulk
export FIRST_SPLIT=row_00000333
export SECOND_SPLIT=row_00000666
bin/accumulo $EXAMPLE_PACKAGE.mapreduce.bulk.SetupTable $AOPTIONS $TABLE $FIRST_SPLIT $SECOND_SPLIT
export START=0
export END=1000
export BULK_FILE=bulk/test_1.txt
bin/accumulo $EXAMPLE_PACKAGE.mapreduce.bulk.GenerateTestData $START $END $BULK_FILE
#
# see the file that was just created
#
hadoop fs -cat $BULK_FILE
export INPUT=bulk
export OUTPUT=tmp/bulkWork
bin/tool.sh lib/accumulo-examples-*[^c].jar $EXAMPLE_PACKAGE.mapreduce.bulk.BulkIngestExample $AOPTIONS $TABLE $INPUT $OUTPUT
bin/accumulo $EXAMPLE_PACKAGE.mapreduce.bulk.VerifyIngest $AOPTIONS $TABLE $START $END
# -------------------------------
# Examples from README.combiner
# -------------------------------
bin/accumulo shell -u $AUSER -p $APASSWORD
>createtable runners
# enter 'stat' and '10' when asked
runners> setiter -t runners -p 10 -scan -minc -majc -n decStats -class org.apache.accumulo.examples.combiner.StatsCombiner
runners> setiter -t runners -p 11 -scan -minc -majc -n hexStats -class org.apache.accumulo.examples.combiner.StatsCombiner
runners> insert 123456 name first Joe
runners> insert 123456 stat marathon 240
runners> scan
runners> insert 123456 stat marathon 230
runners> insert 123456 stat marathon 220
#
# The next scan will show the min, max, sum, and count for the 123456:stat:marathon row.
#
runners> scan
runners> insert 123456 hstat virtualMarathon 6a
runners> insert 123456 hstat virtualMarathon 6b
#
# The next scan will show the min, max, sum, and count (in hexadecimal) for the 123456:hstat:virtualMarathon row.
#
runners> scan
runners> exit
# -------------------------------
# Examples from README.constraints
# -------------------------------
bin/accumulo shell -u $AUSER -p $APASSWORD
> createtable testConstraints
testConstraints> config -t testConstraints -s table.constraint.1=org.apache.accumulo.examples.constraints.NumericValueConstraint
testConstraints> config -t testConstraints -s table.constraint.2=org.apache.accumulo.examples.constraints.AlphaNumKeyConstraint
testConstraints> insert r1 cf1 cq1 1111
testConstraints> insert r1 cf1 cq1 ABC
Constraint Failures:
ConstraintViolationSummary(...NumericValueConstraint, ..., violationDescription:Value is not numeric...)
testConstraints> insert r1! cf1 cq1 ABC
Constraint Failures:
ConstraintViolationSummary(...NumericValueConstraint, ..., violationDescription:Value is not numeric...)
ConstraintViolationSummary(...AlphaNumKeyConstraint, ..., violationDescription:Row was not alpha numeric...)
testConstraints> scan
r1 cf1:cq1 [] 1111
testConstraints> exit
# -------------------------------
# Examples from README.dirlist
# -------------------------------
export DIR_TABLE=dirTable
export INDEX_TABLE=indexTable
export DATA_TABLE=dataTable
export AUTHORIZATION=exampleVis
export COLUMN_VISIBILITY=exampleVis
export DATA_CHUNK_SIZE=100000
export DIR_TO_INDEX=/home/$USER/workspace
# index the directory on local disk
bin/accumulo $EXAMPLE_PACKAGE.dirlist.Ingest $AOPTIONS $DIR_TABLE $INDEX_TABLE $DATA_TABLE $COLUMN_VISIBILITY $DATA_CHUNK_SIZE $DIR_TO_INDEX
export DIR_TO_VIEW=/home/$USER/workspace/accumulo/conf
bin/accumulo $EXAMPLE_PACKAGE.dirlist.Viewer $AOPTIONS $DIR_TABLE $DATA_TABLE $AUTHORIZATION $DIR_TO_VIEW
# display information about a directory.
export DIR_TO_VIEW=/home/$USER/workspace/accumulo/conf
bin/accumulo org.apache.accumulo.examples.dirlist.QueryUtil $AOPTIONS $DIR_TABLE $COLUMN_VISIBILITY $DIR_TO_VIEW
# find files
export FILE_TO_FIND=masters.example
bin/accumulo $EXAMPLE_PACKAGE.dirlist.QueryUtil $AOPTIONS $INDEX_TABLE $COLUMN_VISIBILITY $FILE_TO_FIND -search
export TRAILING_WILDCARD="masters*"
bin/accumulo $EXAMPLE_PACKAGE.dirlist.QueryUtil $AOPTIONS $INDEX_TABLE $COLUMN_VISIBILITY $TRAILING_WILDCARD -search
export LEADING_WILDCARD="*.jar"
bin/accumulo $EXAMPLE_PACKAGE.dirlist.QueryUtil $AOPTIONS $INDEX_TABLE $COLUMN_VISIBILITY $LEADING_WILDCARD -search
export WILDCARD="commons*.jar"
bin/accumulo $EXAMPLE_PACKAGE.dirlist.QueryUtil $AOPTIONS $INDEX_TABLE $COLUMN_VISIBILITY $WILDCARD -search
# count files
export AUTHORIZATION=exampleVis
export COLUMN_VISIBILITY=exampleVis
bin/accumulo $EXAMPLE_PACKAGE.dirlist.FileCount $AOPTIONS $DIR_TABLE $AUTHORIZATION $COLUMN_VISIBILITY
# -------------------------------
# Examples from README.filedata
# -------------------------------
How is FileDataIngest used?
* FileDataIngest - Takes a list of files and archives them into Accumulo keyed on the SHA1 hashes of the files.
# -------------------------------
# Examples from README.filter
# -------------------------------
bin/accumulo shell -u $AUSER -p $APASSWORD
> createtable filtertest
filtertest> setiter -t filtertest -scan -p 10 -n myfilter -filter
WAITING FOR JIRA TICKET RESOLUTION.
# -------------------------------
# Examples from README.helloworld
# -------------------------------
bin/accumulo shell -u $AUSER -p $APASSWORD
> createtable hellotable
hellotable> exit
export TABLE=hellotable
bin/accumulo $EXAMPLE_PACKAGE.helloworld.InsertWithBatchWriter $AINSTANCE $AZOOKEEPERS $TABLE $AUSER $APASSWORD
# insert via map-reduce
bin/accumulo $EXAMPLE_PACKAGE.helloworld.InsertWithOutputFormat $AINSTANCE $AZOOKEEPERS $TABLE $AUSER $APASSWORD
# display the records using the shell
bin/accumulo shell -u $AUSER -p $APASSWORD
> table hellotable
> scan
> exit
# display the records
bin/accumulo $EXAMPLE_PACKAGE.helloworld.ReadData $AINSTANCE $AZOOKEEPERS $TABLE $AUSER $APASSWORD
# -------------------------------
# Examples from README.mapred
# -------------------------------
hadoop fs -copyFromLocal $ACCUMULO_HOME/README wc/Accumulo.README
hadoop fs -ls wc
bin/accumulo shell -u $AUSER -p $APASSWORD
> createtable wordCount -a count=org.apache.accumulo.core.iterators.aggregation.StringSummation
> exit
export INPUT=wc
export OUTPUT=wordCount
bin/tool.sh lib/accumulo-examples-*[^c].jar $EXAMPLE_PACKAGE.mapreduce.WordCount $AINSTANCE $AZOOKEEPERS $INPUT $OUTPUT -u $AUSER -p $APASSWORD
# read the count from the accumulo table.
bin/accumulo shell -u $AUSER -p $APASSWORD
> table wordCount
wordCount> scan -b the
wordCount> exit
# -------------------------------
# Examples from README.shard
# -------------------------------
# create accumulo tables
bin/accumulo shell -u $AUSER -p $APASSWORD
> createtable shard
shard> createtable doc2term
doc2term> exit
# index some files
cd $ACCUMULO_HOME
export SHARD_TABLE=shard
export NUM_PARTITIONS=30
find src -name "*.java" | xargs bin/accumulo $EXAMPLE_PACKAGE.shard.Index $AINSTANCE $AZOOKEEPERS $SHARD_TABLE $AUSER $APASSWORD $NUM_PARTITIONS
export TERMS_TO_FIND="foo bar"
bin/accumulo $EXAMPLE_PACKAGE.shard.Query $AINSTANCE $AZOOKEEPERS $SHARD_TABLE $AUSER $APASSWORD $TERMS_TO_FIND
# populate doc2term
export DOC2TERM_TABLE=doc2term
bin/accumulo $EXAMPLE_PACKAGE.shard.Reverse $AINSTANCE $AZOOKEEPERS $SHARD_TABLE $DOC2TERM_TABLE $AUSER $APASSWORD
export NUM_TERMS=5
export ITERATION_COUNT=5
bin/accumulo org.apache.accumulo.examples.shard.ContinuousQuery $AINSTANCE $AZOOKEEPERS $SHARD_TABLE $DOC2TERM_TABLE $AUSER $APASSWORD $NUM_TERMS $ITERATION_COUNT
#####################################################################################
#####################################################################################
#####################################################################################
# ---------------------------------------------
# Other programs in client package
# ---------------------------------------------
bin/accumulo $EXAMPLE_PACKAGE.client.Flush $AOPTIONS $TABLE
# To see all options.
bin/accumulo $EXAMPLE_PACKAGE.client.ReadWriteExample
bin/accumulo $EXAMPLE_PACKAGE.client.ReadWriteExample -i $AINSTANCE -z $AZOOKEEPERS -u $AUSER -p $APASSWORD -t $TABLE -s $COLUMN_VISIBILITY --read
bin/accumulo $EXAMPLE_PACKAGE.client.RowOperations $AOPTIONS
./src/main/java/org/apache/accumulo/examples/constraints/MaxMutationSize.java
./src/main/java/org/apache/accumulo/examples/isolation/InterferenceTest.java

This was a huge help and saved me a ton of time. Thank you.

edit - glad to see that the update shows it's not necessary to reformat HDFS each time, too.

edit - one small typo... in the command to chmod 777 on hadoop_tmp_dir, you forgot the '/' in front of it.

crigano commented Apr 13, 2012

Is it possible to use Sun Java JDK 6 and not 7 with Accumulo?

This is what I am using with Hadoop under Ubuntu 11.10

thanks Mate!

Chris

I believe Accumulo requires Java JDK 6 at a minimum, so you should be fine.

crigano commented Apr 14, 2012

Thanks Mate! Chris

crigano commented Apr 25, 2012

How would I set up eclipse for interactive debugging?

thanks Mate! Chris

crigano commented Apr 26, 2012

Thanks! I have a new problem: I can't run tar xvzf $TAR_DIR/accumulo-1.5.0-incubating-SNAPSHOT-dist.tar.gz because ~/workspace/accumulo/src/assemble/target is never created.

I execute mvn package in my home directory workspace.
I find that ~/workspace/accumulo/src/assemble/target DOES NOT EXIST;
I only have ~/workspace/accumulo/src/site,
therefore I can't run tar xvzf $TAR_DIR/accumulo-1.5.0-incubating-SNAPSHOT-dist.tar.gz.

Only indication I get is "[INFO] skip non existing resource Directory /home/crigano/workspace/accumulo/server/src/test/resources"

I run git clone https://github.com/apache/accumulo.git
Cloning into accumulo...
remote: Counting objects: 25215, done.
remote: Compressing objects: 100% (7308/7308), done.
remote: Total 25215 (delta 14315), reused 24513 (delta 13621)
Receiving objects: 100% (25215/25215), 8.51 MiB | 332 KiB/s, done.
Resolving deltas: 100% (14315/14315), done.

Then cd accumulo

Then mvn package:
12/04/20 20:13:54 INFO compress.CodecPool: Got brand-new decompressor
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/entries2 for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/entries3 for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/entries4 for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/entries5 for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number 5 for tid 1
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/entries
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number 5 for tid 1
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/entries2
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number 5 for tid 2
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/entries3
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number 5 for tid 3
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/entries4
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number 5 for tid 4
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/entries5
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/entries for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/entries2 for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/entries3 for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/entries4 for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number 12 for tid 1
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/entries
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number 12 for tid 1
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/entries2
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number 12 for tid 1
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/entries3
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number 12 for tid 1
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/entries4
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/entries for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/entries2 for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number 2 for tid 1
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/entries
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number 2 for tid 1
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/entries2
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/testlog for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number -1 for tid 1
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/testlog
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/testlog for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/testlog for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number -1 for tid 1
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/testlog
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/testlog for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number 3 for tid 1
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/testlog
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/entries for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/entries2 for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number 3 for tid 1
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/entries
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number 3 for tid 1
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/entries2
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/entries for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/entries2 for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number 3 for tid 1
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/entries
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number 3 for tid 1
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/entries2
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/entries for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number 4 for tid 1
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/entries
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/entries for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/entries2 for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/entries3 for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number 3 for tid 1
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/entries
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number 3 for tid 1
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/entries2
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number 3 for tid 1
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/entries3
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/testlog for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number 30 for tid 1
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/testlog
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/entries for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/entries2 for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number 2 for tid 1
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/entries
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number 2 for tid 1
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/entries2
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/entries for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/entries2 for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/entries3 for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number -1 for tid 1
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/entries
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number -1 for tid 1
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/entries2
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number -1 for tid 1
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/entries3
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/entries for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/entries2 for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number 2 for tid 1
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/entries
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number 2 for tid 1
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/entries2
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/entries for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number 3 for tid 2
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/entries
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/entries for table<<
12/04/20 20:13:54 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number 3 for tid 2
12/04/20 20:13:54 INFO log.SortedLogRecovery: Recovery complete for workdir/entries
12/04/20 20:13:54 INFO log.SortedLogRecovery: Looking at mutations from workdir/entries for table<<
12/04/20 20:13:55 INFO log.SortedLogRecovery: Scanning for mutations starting at sequence number 3 for tid 2
12/04/20 20:13:55 INFO log.SortedLogRecovery: Recovery complete for workdir/entries
Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.982 sec
Running org.apache.accumulo.server.tabletserver.log.MultiReaderTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.124 sec
Running org.apache.accumulo.server.tabletserver.InMemoryMapTest
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.247 sec
Running org.apache.accumulo.server.constraints.MetadataConstraintsTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.013 sec
Running org.apache.accumulo.server.client.BulkImporterTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.051 sec

Results :

Tests run: 83, Failures: 0, Errors: 0, Skipped: 0

[INFO] [jar:jar {execution: default-jar}]
[INFO] Building jar: /home/crigano/workspace/accumulo/lib/accumulo-server-1.5.0-SNAPSHOT.jar
[INFO] ------------------------------------------------------------------------
[INFO] Building accumulo-examples
[INFO] task-segment: [package]
[INFO] ------------------------------------------------------------------------
[INFO] [enforcer:enforce {execution: enforce-mvn}]
[INFO] [remote-resources:process {execution: default}]
[INFO] [dependency:copy-dependencies {execution: copy-dependencies}]
[INFO] [site:attach-descriptor {execution: default-attach-descriptor}]
[INFO] [site:attach-descriptor {execution: attach-descriptor}]
[INFO] ------------------------------------------------------------------------
[INFO] Building examples-simple
[INFO] task-segment: [package]
[INFO] ------------------------------------------------------------------------
[INFO] [enforcer:enforce {execution: enforce-mvn}]
[INFO] [remote-resources:process {execution: default}]
[debug] execute contextualize
[INFO] [resources:resources {execution: default-resources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/crigano/workspace/accumulo/examples/simple/src/main/resources
[INFO] Copying 3 resources
[INFO] [dependency:copy-dependencies {execution: copy-dependencies}]
[INFO] [compiler:compile {execution: default-compile}]
[INFO] Compiling 40 source files to /home/crigano/workspace/accumulo/examples/simple/target/classes
[debug] execute contextualize
[INFO] [resources:testResources {execution: default-testResources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/crigano/workspace/accumulo/examples/simple/src/test/resources
[INFO] Copying 3 resources
[INFO] [compiler:testCompile {execution: default-testCompile}]
[INFO] Compiling 5 source files to /home/crigano/workspace/accumulo/examples/simple/target/test-classes
[INFO] [surefire:test {execution: default-test}]
[INFO] Surefire report directory: /home/crigano/workspace/accumulo/examples/simple/target/surefire-reports


T E S T S

Running org.apache.accumulo.examples.simple.filedata.ChunkInputFormatTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.169 sec
Running org.apache.accumulo.examples.simple.filedata.ChunkCombinerTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.034 sec
Running org.apache.accumulo.examples.simple.filedata.KeyUtilTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.009 sec
Running org.apache.accumulo.examples.simple.filedata.ChunkInputStreamTest
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.096 sec
Running org.apache.accumulo.examples.simple.dirlist.CountTest
Max depth : 3
Time to find max depth : 2 ms
Time to compute counts : 4 ms
Entries scanned : 30
Counts inserted : 4
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.053 sec

Results :

Tests run: 15, Failures: 0, Errors: 0, Skipped: 0

[INFO] [jar:jar {execution: default-jar}]
[INFO] Building jar: /home/crigano/workspace/accumulo/lib/examples-simple-1.5.0-SNAPSHOT.jar
[INFO] ------------------------------------------------------------------------
[INFO] Building accumulo-wikisearch
[INFO] task-segment: [package]
[INFO] ------------------------------------------------------------------------
[INFO] [enforcer:enforce {execution: enforce-mvn}]
[INFO] [remote-resources:process {execution: default}]
[INFO] [dependency:copy-dependencies {execution: copy-dependencies}]
[INFO] [site:attach-descriptor {execution: default-attach-descriptor}]
[INFO] [site:attach-descriptor {execution: attach-descriptor}]
[INFO] ------------------------------------------------------------------------
[INFO] Building wikisearch-ingest
[INFO] task-segment: [package]
[INFO] ------------------------------------------------------------------------
[INFO] [enforcer:enforce {execution: enforce-mvn}]
[INFO] [remote-resources:process {execution: default}]
[debug] execute contextualize
[INFO] [resources:resources {execution: default-resources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/crigano/workspace/accumulo/examples/wikisearch/ingest/src/main/resources
[INFO] Copying 3 resources
[INFO] [dependency:copy-dependencies {execution: copy-dependencies}]
[INFO] Copying hadoop-core-0.20.2.jar to /home/crigano/workspace/accumulo/examples/wikisearch/ingest/lib/hadoop-core-0.20.2.jar
[INFO] Copying commons-codec-1.5.jar to /home/crigano/workspace/accumulo/examples/wikisearch/ingest/lib/commons-codec-1.5.jar
[INFO] Copying commons-lang-2.4.jar to /home/crigano/workspace/accumulo/examples/wikisearch/ingest/lib/commons-lang-2.4.jar
[INFO] Copying lucene-wikipedia-3.0.2.jar to /home/crigano/workspace/accumulo/examples/wikisearch/ingest/lib/lucene-wikipedia-3.0.2.jar
[INFO] Copying google-collections-1.0.jar to /home/crigano/workspace/accumulo/examples/wikisearch/ingest/lib/google-collections-1.0.jar
[INFO] Copying lucene-analyzers-3.0.2.jar to /home/crigano/workspace/accumulo/examples/wikisearch/ingest/lib/lucene-analyzers-3.0.2.jar
[INFO] Copying zookeeper-3.3.1.jar to /home/crigano/workspace/accumulo/examples/wikisearch/ingest/lib/zookeeper-3.3.1.jar
[INFO] Copying cloudtrace-1.5.0-SNAPSHOT.jar to /home/crigano/workspace/accumulo/examples/wikisearch/ingest/lib/cloudtrace-1.5.0-SNAPSHOT.jar
[INFO] Copying lucene-core-3.0.2.jar to /home/crigano/workspace/accumulo/examples/wikisearch/ingest/lib/lucene-core-3.0.2.jar
[INFO] Copying protobuf-java-2.3.0.jar to /home/crigano/workspace/accumulo/examples/wikisearch/ingest/lib/protobuf-java-2.3.0.jar
[INFO] Copying accumulo-core-1.5.0-SNAPSHOT.jar to /home/crigano/workspace/accumulo/examples/wikisearch/ingest/lib/accumulo-core-1.5.0-SNAPSHOT.jar
[INFO] Copying libthrift-0.6.1.jar to /home/crigano/workspace/accumulo/examples/wikisearch/ingest/lib/libthrift-0.6.1.jar
[INFO] [compiler:compile {execution: default-compile}]
[INFO] Compiling 23 source files to /home/crigano/workspace/accumulo/examples/wikisearch/ingest/target/classes
[debug] execute contextualize
[INFO] [resources:testResources {execution: default-testResources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO] Copying 3 resources
[INFO] [compiler:testCompile {execution: default-testCompile}]
[INFO] Compiling 7 source files to /home/crigano/workspace/accumulo/examples/wikisearch/ingest/target/test-classes
[INFO] [surefire:test {execution: default-test}]
[INFO] Surefire report directory: /home/crigano/workspace/accumulo/examples/wikisearch/ingest/target/surefire-reports


T E S T S

Running org.apache.accumulo.examples.wikisearch.iterator.TextIndexTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.982 sec
Running org.apache.accumulo.examples.wikisearch.iterator.GlobalIndexUidTest
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.104 sec
Running org.apache.accumulo.examples.wikisearch.ingest.WikipediaInputSplitTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.037 sec
Running org.apache.accumulo.examples.wikisearch.reader.AggregatingRecordReaderTest
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.251 sec

Results :

Tests run: 18, Failures: 0, Errors: 0, Skipped: 0

[INFO] [jar:jar {execution: default-jar}]
[INFO] Building jar: /home/crigano/workspace/accumulo/examples/wikisearch/ingest/lib/wikisearch-ingest-1.5.0-SNAPSHOT.jar
[INFO] ------------------------------------------------------------------------
[INFO] Building wikisearch-query
[INFO] task-segment: [package]
[INFO] ------------------------------------------------------------------------
[INFO] [enforcer:enforce {execution: enforce-mvn}]
[INFO] [remote-resources:process {execution: default}]
[debug] execute contextualize
[INFO] [resources:resources {execution: default-resources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 2 resources
[INFO] Copying 3 resources
[INFO] [dependency:copy-dependencies {execution: copy-dependencies}]
[INFO] Copying commons-codec-1.5.jar to /home/crigano/workspace/accumulo/examples/wikisearch/query/lib/commons-codec-1.5.jar
[INFO] Copying hadoop-core-0.20.2.jar to /home/crigano/workspace/accumulo/examples/wikisearch/query/lib/hadoop-core-0.20.2.jar
[INFO] Copying accumulo-core-1.5.0-SNAPSHOT.jar to /home/crigano/workspace/accumulo/examples/wikisearch/query/lib/accumulo-core-1.5.0-SNAPSHOT.jar
[INFO] Copying libthrift-0.6.1.jar to /home/crigano/workspace/accumulo/examples/wikisearch/query/lib/libthrift-0.6.1.jar
[INFO] Copying commons-lang-2.4.jar to /home/crigano/workspace/accumulo/examples/wikisearch/query/lib/commons-lang-2.4.jar
[INFO] Copying wikisearch-ingest-1.5.0-SNAPSHOT.jar to /home/crigano/workspace/accumulo/examples/wikisearch/query/lib/wikisearch-ingest-1.5.0-SNAPSHOT.jar
[INFO] Copying google-collections-1.0.jar to /home/crigano/workspace/accumulo/examples/wikisearch/query/lib/google-collections-1.0.jar
[INFO] Copying zookeeper-3.3.1.jar to /home/crigano/workspace/accumulo/examples/wikisearch/query/lib/zookeeper-3.3.1.jar
[INFO] Copying kryo-1.04.jar to /home/crigano/workspace/accumulo/examples/wikisearch/query/lib/kryo-1.04.jar
[INFO] Copying commons-jexl-2.0.1.jar to /home/crigano/workspace/accumulo/examples/wikisearch/query/lib/commons-jexl-2.0.1.jar
[INFO] Copying cloudtrace-1.5.0-SNAPSHOT.jar to /home/crigano/workspace/accumulo/examples/wikisearch/query/lib/cloudtrace-1.5.0-SNAPSHOT.jar
[INFO] Copying minlog-1.2.jar to /home/crigano/workspace/accumulo/examples/wikisearch/query/lib/minlog-1.2.jar
[INFO] Copying protobuf-java-2.3.0.jar to /home/crigano/workspace/accumulo/examples/wikisearch/query/lib/protobuf-java-2.3.0.jar
[INFO] [compiler:compile {execution: default-compile}]
[INFO] Compiling 32 source files to /home/crigano/workspace/accumulo/examples/wikisearch/query/target/classes
[debug] execute contextualize
[INFO] [resources:testResources {execution: default-testResources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO] Copying 3 resources
[INFO] [compiler:testCompile {execution: default-testCompile}]
[INFO] Compiling 2 source files to /home/crigano/workspace/accumulo/examples/wikisearch/query/target/test-classes
[INFO] [surefire:test {execution: default-test}]
[INFO] Surefire report directory: /home/crigano/workspace/accumulo/examples/wikisearch/query/target/surefire-reports


T E S T S

Running org.apache.accumulo.examples.wikisearch.logic.TestQueryLogic
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.034 sec

Results :

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

[INFO] [ejb:ejb {execution: default-ejb}]
[INFO] Building ejb wikisearch-query-1.5.0-SNAPSHOT with ejbVersion 3.1
[INFO] Building jar: /home/crigano/workspace/accumulo/examples/wikisearch/query/target/wikisearch-query-1.5.0-SNAPSHOT.jar
[INFO] ------------------------------------------------------------------------
[INFO] Building wikisearch-query-war
[INFO] task-segment: [package]
[INFO] ------------------------------------------------------------------------
[INFO] [enforcer:enforce {execution: enforce-mvn}]
[INFO] [remote-resources:process {execution: default}]
[debug] execute contextualize
[INFO] [resources:resources {execution: default-resources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/crigano/workspace/accumulo/examples/wikisearch/query-war/src/main/resources
[INFO] Copying 3 resources
[INFO] [compiler:compile {execution: default-compile}]
[INFO] No sources to compile
[debug] execute contextualize
[INFO] [resources:testResources {execution: default-testResources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO] Copying 3 resources
[INFO] [compiler:testCompile {execution: default-testCompile}]
[INFO] No sources to compile
[INFO] [surefire:test {execution: default-test}]
[INFO] Surefire report directory: /home/crigano/workspace/accumulo/examples/wikisearch/query-war/target/surefire-reports


T E S T S

Results :

Tests run: 0, Failures: 0, Errors: 0, Skipped: 0

[INFO] [war:war {execution: default-war}]
[INFO] Packaging webapp
[INFO] Assembling webapp [wikisearch-query-war] in [/home/crigano/workspace/accumulo/examples/wikisearch/query-war/target/wikisearch-query-war-1.5.0-SNAPSHOT]
[INFO] Processing war project
[INFO] Copying webapp resources [/home/crigano/workspace/accumulo/examples/wikisearch/query-war/src/main/webapp]
[INFO] Webapp assembled in [75 msecs]
[INFO] Building war: /home/crigano/workspace/accumulo/examples/wikisearch/query-war/target/wikisearch-query-war-1.5.0-SNAPSHOT.war
[INFO] WEB-INF/web.xml already added, skipping
[INFO] ------------------------------------------------------------------------
[INFO] Building accumulo-assemble
[INFO] task-segment: [package]
[INFO] ------------------------------------------------------------------------
[INFO] [enforcer:enforce {execution: enforce-mvn}]
[INFO] [remote-resources:process {execution: default}]
[INFO] [dependency:copy-dependencies {execution: copy-dependencies}]
[INFO] [exec:exec {execution: user-manual}]
Missing pdflatex command. Please install.
[INFO] [site:attach-descriptor {execution: default-attach-descriptor}]
[INFO] [exec:exec {execution: config webpage}]
[INFO] [site:attach-descriptor {execution: attach-descriptor}]
[INFO]
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] ------------------------------------------------------------------------
[INFO] accumulo .............................................. SUCCESS [6.799s]
[INFO] cloudtrace ............................................ SUCCESS [7.879s]
[INFO] accumulo-start ........................................ SUCCESS [21.564s]
[INFO] accumulo-core ......................................... SUCCESS [1:23.868s]
[INFO] accumulo-server ....................................... SUCCESS [1:21.495s]
[INFO] accumulo-examples ..................................... SUCCESS [0.079s]
[INFO] examples-simple ....................................... SUCCESS [36.891s]
[INFO] accumulo-wikisearch ................................... SUCCESS [0.060s]
[INFO] wikisearch-ingest ..................................... SUCCESS [18.040s]
[INFO] wikisearch-query ...................................... SUCCESS [19.979s]
[INFO] wikisearch-query-war .................................. SUCCESS [1.350s]
[INFO] accumulo-assemble ..................................... SUCCESS [0.677s]
[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 4 minutes 40 seconds
[INFO] Finished at: Fri Apr 20 20:15:14 EDT 2012
[INFO] Final Memory: 115M/241M
[INFO] ------------------------------------------------------------------------
crigano@crigano-VirtualBox:~/workspace/accumulo$

Thanks Mate! Chris

I'm a little confused by your question; the above set of commands is for building Accumulo, not installing it through apt-get, which is what you're looking for, I believe.

My team had some issues when it came to using the SNAPSHOT on line 201. The file was not there. I found that the Accumulo dev guide on apache.org says to build it using mvn package -P assemble. This worked to create the artifact, but it was in workspace/accumulo/assemble/target and not workspace/accumulo/src/assemble/target. Other than that, awesome guide. Thanks!

tjsears commented May 11, 2012

Having trouble with lines 200 & 201. In line 200, the directory /workspace/accumulo/src/assemble/target does not exist. In line 201, accumulo-1.5.0-incubating-SNAPSHOT-dist.tar.gz does not exist. Rather, accumulo-1.5.0-SNAPSHOT-dist.tar.gz.

tjsears commented May 14, 2012

Getting a weird error:

Starting tablet servers and loggers .... done
localhost : tablet server already running (2961)
Starting logger on localhost
14 13:53:52,408 [server.Accumulo] INFO : Attempting to talk to zookeeper
14 13:53:52,664 [server.Accumulo] INFO : Zookeeper connected and initialized, attemping to talk to HDFS
14 13:53:52,666 [server.Accumulo] INFO : Connected to HDFS
Thread "org.apache.accumulo.server.master.state.SetGoalState" died null
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.accumulo.start.Main$1.run(Main.java:89)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /accumulo/34d09d85-40ee-433d-a705-404cb5e24bb9/masters/goal_state
at org.apache.zookeeper.KeeperException.create(KeeperException.java:104)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:643)
at org.apache.accumulo.core.zookeeper.ZooUtil.putData(ZooUtil.java:146)
at org.apache.accumulo.core.zookeeper.ZooUtil.putPersistentData(ZooUtil.java:126)
at org.apache.accumulo.server.zookeeper.ZooReaderWriter.putPersistentData(ZooReaderWriter.java:82)
at org.apache.accumulo.server.master.state.SetGoalState.main(SetGoalState.java:46)
... 6 more
localhost : master already running (3139)
localhost : garbage collector already running (3225)
localhost : monitor already running (3313)
localhost : tracer already running (3404)

Any ideas?

tjsears commented May 14, 2012

Here is the tserver_ubuntu_debug log:

14 10:53:13,199 [server.Accumulo] INFO : tserver.logger.timeout = 30s
14 10:53:13,199 [server.Accumulo] INFO : tserver.memory.lock = false
14 10:53:13,200 [server.Accumulo] INFO : tserver.memory.manager = org.apache.accumulo.server.tabletserver.LargestFirstMemoryManager
14 10:53:13,200 [server.Accumulo] INFO : tserver.memory.maps.max = 512M
14 10:53:13,200 [server.Accumulo] INFO : tserver.memory.maps.native.enabled = true
14 10:53:13,200 [server.Accumulo] INFO : tserver.metadata.readahead.concurrent.max = 8
14 10:53:13,200 [server.Accumulo] INFO : tserver.migrations.concurrent.max = 1
14 10:53:13,200 [server.Accumulo] INFO : tserver.monitor.fs = true
14 10:53:13,200 [server.Accumulo] INFO : tserver.mutation.queue.max = 256K
14 10:53:13,201 [server.Accumulo] INFO : tserver.port.client = 9997
14 10:53:13,201 [server.Accumulo] INFO : tserver.port.search = false
14 10:53:13,201 [server.Accumulo] INFO : tserver.readahead.concurrent.max = 16
14 10:53:13,203 [server.Accumulo] INFO : tserver.scan.files.open.max = 100
14 10:53:13,204 [server.Accumulo] INFO : tserver.server.threadcheck.time = 1s
14 10:53:13,204 [server.Accumulo] INFO : tserver.server.threads.minimum = 2
14 10:53:13,204 [server.Accumulo] INFO : tserver.session.idle.max = 1m
14 10:53:13,205 [server.Accumulo] INFO : tserver.tablet.split.midpoint.files.max = 30
14 10:53:13,205 [server.Accumulo] INFO : tserver.walog.max.size = 256M
14 10:53:13,861 [tabletserver.TabletServer] INFO : Tablet server starting on localhost
14 10:53:13,998 [util.FileSystemMonitor] INFO : Filesystem monitor started
14 10:53:14,121 [tabletserver.NativeMap] ERROR: Failed to load native map library /home/ubuntu/accumulo-1.5.0-SNAPSHOT/lib/native/map/libNativeMap-Linux-i386-32.so
java.lang.UnsatisfiedLinkError: Can't load library: /home/ubuntu/accumulo-1.5.0-SNAPSHOT/lib/native/map/libNativeMap-Linux-i386-32.so
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1824)
at java.lang.Runtime.load0(Runtime.java:792)
at java.lang.System.load(System.java:1059)
at org.apache.accumulo.server.tabletserver.NativeMap.loadNativeLib(NativeMap.java:144)
at org.apache.accumulo.server.tabletserver.NativeMap.<init>(NativeMap.java:156)
at org.apache.accumulo.server.tabletserver.TabletServerResourceManager.<init>(TabletServerResourceManager.java:148)
at org.apache.accumulo.server.tabletserver.TabletServer.config(TabletServer.java:2968)
at org.apache.accumulo.server.tabletserver.TabletServer.main(TabletServer.java:3103)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.accumulo.start.Main$1.run(Main.java:89)
at java.lang.Thread.run(Thread.java:722)
14 10:53:14,216 [server.Accumulo] WARN : System swappiness setting is greater than zero (60) which can cause time-sensitive operations to be delayed. Accumulo is time sensitive because it needs to maintain distributed lock agreement.
14 10:53:14,251 [tabletserver.TabletServer] ERROR: Uncaught exception in TabletServer.main, exiting
java.lang.IllegalArgumentException: Maximum tablet server map memory 536,870,912 and block cache sizes 149,946,368 is too large for this JVM configuration 530,186,240
at org.apache.accumulo.server.tabletserver.TabletServerResourceManager.<init>(TabletServerResourceManager.java:159)
at org.apache.accumulo.server.tabletserver.TabletServer.config(TabletServer.java:2968)
at org.apache.accumulo.server.tabletserver.TabletServer.main(TabletServer.java:3103)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.accumulo.start.Main$1.run(Main.java:89)
at java.lang.Thread.run(Thread.java:722)
14 10:53:24,215 [server.Accumulo] WARN : System swappiness setting is greater than zero (60) which can cause time-sensitive operations to be delayed. Accumulo is time sensitive because it needs to maintain distributed lock agreement.
14 10:53:34,216 [server.Accumulo] WARN : System swappiness setting is greater than zero (60) which can cause time-sensitive operations to be delayed. Accumulo is time sensitive because it needs to maintain distributed lock agreement.
14 10:53:44,220 [server.Accumulo] WARN : System swappiness setting is greater than zero (60) which can cause time-sensitive operations to be delayed. Accumulo is time sensitive because it needs to maintain distributed lock agreement.
14 10:53:54,226 [server.Accumulo] WARN : System swappiness setting is greater than zero (60) which can cause time-sensitive operations to be delayed. Accumulo is time sensitive because it needs to maintain distributed lock agreement.
14 10:54:04,226 [server.Accumulo] WARN : System swappiness setting is greater than zero (60) which can cause time-sensitive operations to be delayed. Accumulo is time sensitive because it needs to maintain distributed lock agreement.
14 10:54:14,226 [server.Accumulo] WARN : System swappiness setting is greater than zero (60) which can cause time-sensitive operations to be delayed. Accumulo is time sensitive because it needs to maintain distributed lock agreement.
14 10:54:24,228 [server.Accumulo] WARN : System swappiness setting is greater than zero (60) which can cause time-sensitive operations to be delayed. Accumulo is time sensitive because it needs to maintain distributed lock agreement.

Tim,
How much memory do you have allocated to your virtual machine? From your error:

Maximum tablet server map memory 536,870,912 and block cache sizes 149,946,368 is too large for this JVM configuration 530,186,240

It looks like you are using the 512MB example configuration with not enough memory allocated to your JVM. Can you increase your memory?

Hope that helps.

tjsears commented May 15, 2012

Memory is 2GB for the virtual environment.

telvis07 commented Jun 1, 2012

two thumbs up!

I downloaded Accumulo VM from this URL: http://blog.sqrrl.com/post/40578606670/quick-accumulo-install

When I bring it up in VirtualBox, it asks me for an Ubuntu username and password.

Anybody know what the username and password is?

Thanks

beedaan commented Jul 30, 2013

A quick note: I had to use Maven 3 instead of Maven 2.

peppeU commented Jun 20, 2017

Hi Tim, I have the same problem...
How did you solve it?
Thanks
