Keith Turner keith-turner

keith-turner / instructions.md
Last active July 13, 2020 09:02
Notes from running Accumulo 2.0.0-alpha-2 on S3

WIP Instructions for running Accumulo 2.0.0-alpha-2 in S3

Newer instructions for Accumulo 2.0.0 are available

Running on S3 requires a feature introduced in Accumulo 2.0. Accumulo has a pluggable volume chooser that tells Accumulo which URL a new file should be placed on. Before 2.0, when the volume chooser was asked where to put a file, it did not know whether the request was for a write-ahead log. In 2.0 the volume chooser knows this, which allows write-ahead logs to be placed on HDFS and table files on S3. This matters because S3 does not meet the needs of write-ahead logs.
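The idea of a chooser that knows what kind of file it is placing can be pictured with a small toy model. This is only a sketch: the class `ScopedVolumeChooserSketch`, the `Scope` enum, and the placeholder URIs are all hypothetical and are not Accumulo's actual volume chooser API or configuration.

```java
import java.util.EnumMap;
import java.util.Map;

/** Toy model of a scope-aware volume chooser. None of these types are Accumulo's API. */
public class ScopedVolumeChooserSketch {

  /** Hypothetical scopes describing what kind of file is about to be created. */
  public enum Scope { WRITE_AHEAD_LOG, TABLE_FILE }

  // Preferred volume URI for each scope.
  private final Map<Scope, String> preferred = new EnumMap<>(Scope.class);

  public ScopedVolumeChooserSketch(String walVolume, String tableVolume) {
    preferred.put(Scope.WRITE_AHEAD_LOG, walVolume);
    preferred.put(Scope.TABLE_FILE, tableVolume);
  }

  /** Returns the volume URI on which a new file of the given scope should be placed. */
  public String choose(Scope scope) {
    return preferred.get(scope);
  }

  public static void main(String[] args) {
    // Placeholder URIs (assumptions); substitute a real namenode and bucket.
    ScopedVolumeChooserSketch chooser =
        new ScopedVolumeChooserSketch("hdfs://namenode/accumulo", "s3a://bucket/accumulo");

    // Write-ahead logs go to HDFS, table files go to S3.
    System.out.println(chooser.choose(Scope.WRITE_AHEAD_LOG));
    System.out.println(chooser.choose(Scope.TABLE_FILE));
  }
}
```

A chooser that receives no scope would have to return one volume for every request, which is why the pre-2.0 chooser could not split write-ahead logs and table files across HDFS and S3.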

First set S3A settings in core-site.xml.

<property>
keith-turner / runIT.sh
Last active March 28, 2018 14:05
Script to run Accumulo integration test
#!/bin/bash
mvn -Dit.test="$1" -Dtest=foo -Dfindbugs.skip -Dcheckstyle.skip -Drat.skip -Daccumulo.skip -Dapilyzer.skip -Dmdep.analyze.skip -Dmodernizer.skip -Dimpsort.skip -Dformatter.skip verify
keith-turner / env.sh
Last active January 10, 2018 15:29
Settings used for Fluo stress test run for 1.2.0 release.
###############################
# configuration for all scripts
###############################
# Fluo Home
test -z "$FLUO_HOME" && FLUO_HOME=/path/to/accumulo
# Fluo application name
FLUO_APP_NAME=stresso
###############################
# configuration for run-test.sh
keith-turner / AsyncCommit.java
Last active December 13, 2017 17:07
Experiment related to apache/fluo#978. This is very incomplete and may be a terrible design. It is the result of an initial experiment just to think about the high-level structure.
private ExecutorService syncExecService = null;
// writes the conditional mutation that attempts to lock the primary column.
CompletableFuture<Boolean> lockPrimary() {
Flutation lockPrimaryCondMut = null;
CompletableFuture<Status> future =
asyncCW.write(lockPrimaryCondMut).thenApply(Result::getStatus);
return future.thenCompose(status -> {
if (status == Status.UNKNOWN) {
package test.rfile;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Map.Entry;
import java.util.Random;
import java.util.function.Function;
keith-turner / LocGroupPerfTest.java
Last active July 21, 2017 18:36
Performance experiment for apache/accumulo#275 and ACCUMULO-4667
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.util.LinkedHashSet;
import java.util.Map.Entry;
import java.util.Set;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.rfile.RFile;
import org.apache.accumulo.core.client.rfile.RFileWriter;
keith-turner / ExternalIndex.java
Last active November 30, 2016 20:35
Modify http://fluo.apache.org/tour/exercise-1/ part 3 to create an inverted index in an external table.
package ft;
import java.util.Optional;
import java.util.function.Consumer;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Mutation;
import org.apache.fluo.api.client.TransactionBase;
keith-turner / ContentObserver.java
Last active June 27, 2017 20:47
A solution to exercise 1 of the Fluo Tour http://fluo.apache.org/tour/exercise-1/
package ft;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import com.google.common.collect.Collections2;
if [[ -z $HADOOP_HOME ]] ; then
test -z "$HADOOP_PREFIX" && export HADOOP_PREFIX=/home/fluo/git/fluo-dev/install/hadoop-2.7.2
else
HADOOP_PREFIX="$HADOOP_HOME"
unset HADOOP_HOME
fi
# hadoop-2.0:
test -z "$HADOOP_CONF_DIR" && export HADOOP_CONF_DIR="$HADOOP_PREFIX/etc/hadoop"

Basic Sampling Example

Accumulo supports building a set of sample data that scanners can access efficiently. Which data is included in the sample set is configurable. The commands below insert some data representing documents.

root@instance sampex> createtable sampex
root@instance sampex> insert 9255 doc content 'abcde'
root@instance sampex> insert 9255 doc url file://foo.txt