Keith Turner keith-turner
👍 17.9 % chance that I am coding
@keith-turner
keith-turner / experiments.md
Last active September 22, 2023 01:17
3761 Experiments

Experiments with the changes from #3761

Below is Accumulo shell output

root@uno> createtable foo
root@uno foo> insert 1 f q 1
root@uno foo> insert 2 f q 2
root@uno foo> insert 3 f q 3
root@uno foo> insert 4 f q 4
@keith-turner
keith-turner / AccumuloOfflineScanTest.md
Last active December 30, 2022 20:39
Scan server test programs

Testing Accumulo offline scans

Wrote some test programs to exercise offline scans in Accumulo. These programs are expected to run in separate processes; this prevents table operations from clearing the client-side tablet cache used by scans.

# run the test programs in the background, in separate processes
accumulo WriteRead accumulo-client.properties &> writeread.log &
accumulo ModifyTable accumulo-client.properties &> modifytable.log &
@keith-turner
keith-turner / experiment1.md
Last active December 7, 2022 12:15
Accumulo compaction drop behind experiment

Accumulo compaction drop behind experiment

This is a summary of a test run to see if the drop-behind settings make a noticeable difference for Accumulo compactions. No differences were seen. A test with C code was run and differences were seen. One difference between the C and Accumulo code is that the C code only reads data. Further investigation is needed; it is not clear whether there is a bug in Hadoop/Accumulo or a problem with the test.

Setup

These tests were run using this commit from this branch, which is a modified version of #3083

To generate data for Accumulo to compact, the following accumulo-testing command was run. Tests were conducted on a laptop with 16G of RAM and a single DataNode and tserver set up by Uno.

@keith-turner
keith-turner / Ecoji2-proposal-analysis.md
Last active June 21, 2021 01:09
Ecoji 2 proposal analysis

This document represents an analysis of the Emojis proposed in ecoji#29.

| Column | Description |
| --- | --- |
| Code Point | |
| Emoji | |
| Candidate | True if the emoji exists in emoji-test.txt and is a single code point when fully qualified. |
| v1 ord | The 10-bit code that Ecoji V1 assigns to this emoji. It is -1 when Ecoji V1 does not use the emoji. |
| v2 ord | The 10-bit code that Ecoji V2 assigns to this emoji. It is -1 when Ecoji V2 does not use the emoji. |
@keith-turner
keith-turner / compaction_comp.md
Last active June 1, 2020 18:07
Test of new Accumulo compaction code

Introduction

Accumulo users sometimes filter or transform data via compactions. In current releases of Accumulo, these user-initiated compactions can be disruptive to data currently being written. To improve this situation, PR #1605 was created for the next release of Accumulo. This PR enables dedicating resources to user-initiated compactions. To verify that the PR is effective, tests with heavy ingest and concurrent user compactions were run on two Azure clusters. One cluster had a version of Accumulo containing the changes in #1605; the other cluster had Accumulo 2.0.0. This document describes the tests and their outcomes, and shows that the changes in #1605 were beneficial in this scenario.

Terminology

  • Tablet : Each Accumulo table is divided into tablets. Each tablet has a list of files in DFS where it stores the data in its range.
  • Minor compaction : When data is written to an Accumulo tablet, it is buffered in memory.
@keith-turner
keith-turner / usecases.md
Last active May 14, 2020 03:50
Accumulo Compaction Use Cases

This document is a work in progress and goes with #1605

Different compression algorithms

Users can get better throughput without sacrificing storage space by using snappy for small compactions and gzip for large compactions. This can be achieved by configuring the CompactionConfigurer implementation CompressionConfigurer for a table. Once configured, it is used for all compactions, unless a user-initiated compaction specifies its own CompactionConfigurer.
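As a sketch, the table configuration might look like the following properties (names follow CompressionConfigurer's documented options; the 100M threshold is illustrative, not a recommendation):

```
# default compression used for small compactions
table.file.compress.type=snappy
# switch to gzip when a compaction's input exceeds the threshold
table.compaction.configurer=org.apache.accumulo.core.client.admin.compaction.CompressionConfigurer
table.compaction.configurer.opts.large.compress.threshold=100M
table.compaction.configurer.opts.large.compress.type=gz
```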

Selectively filtering data

For many reasons users may wish to filter data from an Accumulo table. One example use case would be that unwanted data was erroneously written to a table.

@keith-turner
keith-turner / runIT.sh
Created February 18, 2020 17:27
Script to run individual integration test for Accumulo
#!/bin/bash
# Run the single Accumulo integration test matching the given class-name prefix.
# -Dtest is set to a nonsense pattern so no unit tests match; combined with
# -DfailIfNoTests=false, this skips unit tests and runs only the requested IT.
mvn -Dit.test="$1*" -Dtest=94w5up8qtweh -PskipQA -DskipITs=false -DskipTests=false -DfailIfNoTests=false verify
@keith-turner
keith-turner / compaction-algorithm.md
Last active December 11, 2019 23:21
A Proposed Modification to Accumulo's Compaction Algorithm

A Proposed Modification to Accumulo's Compaction Algorithm

By default, [compactions][1] in Accumulo are driven by a configurable compaction ratio using the following algorithm.

  1. If LF * CR < SUM then compact this set of files. LF is the size of the largest file in the set, CR is the compaction ratio, and SUM is the total size of all files in the set.
  2. Remove largest file from set.
  3. If set is empty, then compact no files.
  4. Go to 1.
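The loop above can be sketched in plain Java. This is a simplified illustration of the steps as stated, not Accumulo's actual implementation; `chooseFiles` is a hypothetical helper operating on file sizes only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CompactionSketch {
  // Returns the set of file sizes the algorithm would compact (empty if none).
  static List<Long> chooseFiles(List<Long> fileSizes, double compactionRatio) {
    List<Long> set = new ArrayList<>(fileSizes);
    set.sort(Comparator.reverseOrder()); // largest file first
    while (!set.isEmpty()) {
      long largest = set.get(0);
      long sum = set.stream().mapToLong(Long::longValue).sum();
      if (largest * compactionRatio < sum) {
        return set; // step 1: the set satisfies the ratio, compact it
      }
      set.remove(0); // step 2: remove the largest file and re-test
    }
    return set; // step 3: set is empty, compact no files
  }
}
```

For example, with sizes {10, 10, 10, 10} and a ratio of 3, all four files are chosen (10 * 3 < 40), while with {100, 10, 10, 10} no subset satisfies the condition and nothing is compacted; this captures why one oversized file can suppress compaction of the whole set.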
@keith-turner
keith-turner / CBI.java
Created March 13, 2019 19:23
Accumulo client code to create continuous bulk import load. Created to test changes for apache/accumulo#979
package cmd;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.rfile.RFile;
@keith-turner
keith-turner / accumulo-s3-notes.md
Last active August 20, 2019 09:37
Notes from testing Accumulo 2.0.0-alpha-2 with S3.

These are notes from testing Accumulo 2.0.0-alpha-2 on S3. Accumulo was set up following these instructions. Used 10 m5d.2xlarge workers and one m5d.2xlarge master. Used HDFS running on the cluster's ephemeral storage for write-ahead logs and metadata table files. Used a two-tier compaction strategy: snappy for small files (<100M) and gzip for larger files.

Ran continuous ingest for ~24hr. During this time 74 billion key values were ingested. I adjusted compaction settings towards the end of the test and the ingest speed jumped. Opened #930 about this issue; need to describe the issue better.

After stopping ingest there were around 5,120 tablets, each with about 14 files. I tried running some queries at this time, and it seemed like a lookup took 3 to 4 seconds.

I let the cluster compact all the tablets down. It settled around 4 files per tablet and stopped compacting. I started doing