Experiments with the changes from #3761
Below is Accumulo shell output
root@uno> createtable foo
root@uno foo> insert 1 f q 1
root@uno foo> insert 2 f q 2
root@uno foo> insert 3 f q 3
root@uno foo> insert 4 f q 4
Wrote some test programs to exercise offline scans in Accumulo. These programs are expected to run in separate processes; this prevents table operations from clearing the client-side tablet cache used by scans.
# start some scan servers
accumulo WriteRead accumulo-client.properties &> writeread.log &
accumulo ModifyTable accumulo-client.properties &> modifytable.log &
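The WriteRead and ModifyTable sources are not shown here. As a rough sketch of the kind of scan such a program exercises, assuming the 2.1 eventual-consistency scan API served by scan servers, a minimal reader could look like the following; the table name foo, the class name, and the argument handling are assumptions.

```java
import java.util.Map.Entry;
import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.ScannerBase.ConsistencyLevel;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

public class EventualScanExample {
  public static void main(String[] args) throws Exception {
    // args[0] is the client properties file, matching how the programs above are launched.
    try (AccumuloClient client = Accumulo.newClient().from(args[0]).build();
        Scanner scanner = client.createScanner("foo", Authorizations.EMPTY)) {
      // Eventual consistency routes the scan to a scan server, which relies on
      // the client-side tablet cache mentioned above.
      scanner.setConsistencyLevel(ConsistencyLevel.EVENTUAL);
      for (Entry<Key, Value> entry : scanner) {
        System.out.println(entry.getKey() + " " + entry.getValue());
      }
    }
  }
}
```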
This is a summary of a test run to see if the drop-behind settings make a noticeable difference for Accumulo compactions. No differences were seen. A test with C code was run and differences were seen. One difference between the C and Accumulo code is that the C code only reads data. Further investigation is needed; it is not clear whether there is a bug in Hadoop/Accumulo or a problem with the test.
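For comparison with the C test, drop-behind behavior can also be requested from the HDFS client API. The following is only an illustrative sketch, assuming a standalone Java program that reads a path given on the command line; it is not one of the programs used in this test run.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DropBehindRead {
  public static void main(String[] args) throws IOException {
    // Path to read is passed on the command line; like the C test, this only reads data.
    Path path = new Path(args[0]);
    Configuration conf = new Configuration();
    try (FileSystem fs = FileSystem.get(conf);
        FSDataInputStream in = fs.open(path)) {
      // Ask HDFS to drop data from the OS page cache behind the read position,
      // the client-side counterpart of the dfs.client.cache.drop.behind.reads setting.
      in.setDropBehind(true);
      byte[] buffer = new byte[1 << 20];
      long total = 0;
      int read;
      while ((read = in.read(buffer)) > 0) {
        total += read;
      }
      System.out.println("Read " + total + " bytes");
    }
  }
}
```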
These tests were run using this commit from this branch, which is a modified version of #3083.
To generate data for Accumulo to compact, the following accumulo-testing command was run. Tests were conducted on a laptop with 16G of RAM and a single DataNode and tserver set up by Uno.
This document represents an analysis of the Emojis proposed in ecoji#29.
Column | Description |
---|---|
Code Point | The Unicode code point of the emoji. |
Emoji | The emoji character for the code point. |
Candidate | True if the emoji exists in emoji-test.txt and is a single code point when fully qualified. |
v1 ord | The 10-bit code that Ecoji V1 assigns to this emoji. It is -1 when Ecoji V1 does not use the emoji. |
v2 ord | The 10-bit code that Ecoji V2 assigns to this emoji. It is -1 when Ecoji V2 does not use the emoji. |
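For context on the v1 ord and v2 ord columns: Ecoji encodes data by splitting each group of 5 input bytes (40 bits) into four 10-bit values and mapping each value to an emoji from a 1024-entry table. Below is a minimal sketch of that split, purely for illustration; the class and method names are not from Ecoji's actual Go implementation.

```java
public class EcojiOrdinals {
  // Split 5 input bytes (40 bits) into four 10-bit ordinals, each of which
  // Ecoji maps to one emoji from a 1024-entry table.
  public static int[] ordinals(byte[] five) {
    long bits = 0;
    for (int i = 0; i < 5; i++) {
      bits = (bits << 8) | (five[i] & 0xffL);
    }
    int[] ords = new int[4];
    for (int i = 3; i >= 0; i--) {
      ords[i] = (int) (bits & 0x3ff);
      bits >>>= 10;
    }
    return ords;
  }
}
```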
Accumulo users sometimes filter or transform data via compactions. In current releases of Accumulo, these user-initiated compactions can be disruptive to data currently being written. To improve this situation, PR #1605 was created for the next release of Accumulo. This PR enables dedicating resources to user-initiated compactions. To verify whether the PR is effective, tests with heavy ingest and concurrent user compactions were run on two Azure clusters. One cluster had a version of Accumulo containing the changes in #1605; the other had Accumulo 2.0.0. This document describes the tests and the outcomes and shows that the changes in #1605 were beneficial in this scenario.
This document is a work in progress and goes with #1605.
Users can get better throughput without sacrificing storage space by using snappy for small compactions and gzip for large compactions. This can be achieved by configuring the CompactionConfigurer implementation CompressionConfigurer for a table. Once configured, it would be used for all compactions unless a user-initiated compaction specified its own CompactionConfigurer.
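A sketch of that configuration through the Java API follows. The table name is a placeholder, and the option keys are recalled from the CompressionConfigurer javadoc, so they should be checked against the target release.

```java
import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;
import org.apache.accumulo.core.client.admin.compaction.CompressionConfigurer;

public class ConfigureCompression {
  public static void main(String[] args) throws Exception {
    try (AccumuloClient client = Accumulo.newClient().from(args[0]).build()) {
      String table = "foo"; // table name is a placeholder for this example
      // Small compactions use the table's default compression (snappy here).
      client.tableOperations().setProperty(table, "table.file.compress.type", "snappy");
      // Use the CompressionConfigurer for all compactions on the table.
      client.tableOperations().setProperty(table, "table.compaction.configurer",
          CompressionConfigurer.class.getName());
      // When the input files exceed the threshold, switch the output to gzip.
      // Option names follow the CompressionConfigurer javadoc; verify them
      // against the Accumulo version in use.
      client.tableOperations().setProperty(table,
          "table.compaction.configurer.opts.large.compress.threshold", "100M");
      client.tableOperations().setProperty(table,
          "table.compaction.configurer.opts.large.compress.type", "gz");
    }
  }
}
```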
For many reasons, users may wish to filter data from an Accumulo table. One example use case is that unwanted data was erroneously written to a table.
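As one hedged illustration of such filtering (not taken from a specific example here), a user-initiated compaction can apply a negated RegExFilter so that entries matching an unwanted pattern are dropped; the table name, regex, and iterator priority below are placeholders.

```java
import java.util.List;
import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;
import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.client.admin.CompactionConfig;
import org.apache.accumulo.core.iterators.Filter;
import org.apache.accumulo.core.iterators.user.RegExFilter;

public class FilterBadData {
  public static void main(String[] args) throws Exception {
    try (AccumuloClient client = Accumulo.newClient().from(args[0]).build()) {
      // Match the erroneously written rows; the regex is a placeholder.
      IteratorSetting iter = new IteratorSetting(100, "dropBadRows", RegExFilter.class);
      RegExFilter.setRegexs(iter, "^bad.*", null, null, null, false);
      // Negate the filter so matching entries are removed and everything else is kept.
      Filter.setNegate(iter, true);
      // Run a full user-initiated compaction with the filter and wait for it to finish.
      CompactionConfig config = new CompactionConfig().setIterators(List.of(iter)).setWait(true);
      client.tableOperations().compact("foo", config);
    }
  }
}
```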
#!/bin/bash
mvn -Dit.test="$1*" -Dtest=94w5up8qtweh -PskipQA -DskipITs=false -DskipTests=false -DfailIfNoTests=false verify
By default, [compactions][1] in Accumulo are driven by a configurable compaction ratio using the following algorithm.
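Roughly, the ratio check says a set of files is compacted when the sum of the file sizes in the set is at least the compaction ratio times the size of the largest file in the set. The method below is only an illustrative sketch of that check, not Accumulo's actual selection code.

```java
import java.util.Collection;
import java.util.Collections;

public class RatioCheck {
  // Returns true if the given set of file sizes meets the compaction ratio
  // criteria: the total size is at least ratio times the largest file.
  public static boolean meetsRatio(Collection<Long> fileSizes, double ratio) {
    long total = fileSizes.stream().mapToLong(Long::longValue).sum();
    long largest = Collections.max(fileSizes);
    return total >= ratio * largest;
  }
}
```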
package cmd;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.rfile.RFile;
These are notes from testing Accumulo 2.0.0-alpha-2 on S3. Accumulo was set up following these instructions. Used 10 m5d.2xlarge workers and one m5d.2xlarge master. Used HDFS running on the cluster's ephemeral storage for write-ahead logs and metadata table files. Used a two-tier compaction strategy: snappy for small files (<100M) and gz for larger files.
Ran continuous ingest for ~24hr. During this time 74 billion key values were ingested. I adjusted compaction settings towards the end of the test and the ingest speed jumped. Opened #930 about this issue; need to describe the issue better.
After stopping ingest there were around 5,120 tablets, each with about 14 files. I tried running some queries at this time and it seemed like a lookup took 3 to 4 seconds.
I let the cluster compact all the tablets down. It settled around 4 files per tablet and stopped compacting. I started doing