This is a summary of a test run to see if the drop-behind settings make a noticeable difference for Accumulo compactions. No differences were seen. A test with C code was run and differences were seen there. One difference between the C code and the Accumulo code is that the C code only reads data. Further investigation is needed; it is not clear whether there is a bug in Hadoop/Accumulo or a problem with the test.
These tests were run using this commit from this branch, which is a modified version of #3083.
To generate data for Accumulo to compact, the following accumulo-testing commands were run. Tests were conducted on a laptop with 16 GB of RAM and a single DataNode and tserver set up by Uno.
$ ./bin/cingest createtable
$ ./bin/cingest ingest -o test.ci.ingest.client.entries=20000000 -o test.ci.ingest.delete.probability=0.0
Below is an experiment compacting without drop behind, looking at /proc/meminfo before and after.
$ accumulo shell -e "config -t ci -s table.compaction.major.input.drop.cache=false"
$ accumulo shell -e "config -t ci -s table.compaction.major.output.drop.cache=false"
$ echo 3 | sudo tee /proc/sys/vm/drop_caches
3
$ cat /proc/meminfo | egrep '\(file\)|Cached'
Cached: 2281948 kB
SwapCached: 138420 kB
Active(file): 155296 kB
Inactive(file): 227312 kB
$ accumulo shell -e "du -t ci"
782,260,524 [ci]
$ accumulo shell -e "compact -t ci -w"
2022-11-25T17:02:36,338 [shell.Shell] INFO : Compacting table ...
2022-11-25T17:02:56,860 [shell.Shell] INFO : Compaction of table ci completed for given range
$ cat /proc/meminfo | egrep '\(file\)|Cached'
Cached: 3243120 kB
SwapCached: 142680 kB
Active(file): 266360 kB
Inactive(file): 1088056 kB
Below is the same experiment as above, except with drop behind enabled. The drop-behind settings did not seem to prevent the cache growth.
$ accumulo shell -e "config -t ci -s table.compaction.major.output.drop.cache=true"
$ accumulo shell -e "config -t ci -s table.compaction.major.input.drop.cache=true"
$ echo 3 | sudo tee /proc/sys/vm/drop_caches
3
$ cat /proc/meminfo | egrep '\(file\)|Cached'
Cached: 2270704 kB
SwapCached: 142684 kB
Active(file): 226084 kB
Inactive(file): 146692 kB
$ accumulo shell -e "compact -t ci -w"
2022-11-25T17:04:44,771 [shell.Shell] INFO : Compacting table ...
2022-11-25T17:05:05,809 [shell.Shell] INFO : Compaction of table ci completed for given range
$ cat /proc/meminfo | egrep '\(file\)|Cached'
Cached: 3197684 kB
SwapCached: 145116 kB
Active(file): 286048 kB
Inactive(file): 1034448 kB
Looking in the tserver logs, there is evidence that the Hadoop drop-behind calls are being made during the compaction above.
2022-11-25T17:04:45,208 [rfile.RFileOperations] DEBUG: Called setDropBehind(TRUE) for stream writing file hdfs://localhost:8020/accumulo/tables/1/t-0000000/A00000op.rf_tmp
2022-11-25T17:04:45,209 [impl.CachableBlockFile] DEBUG: Called setDropBehind(TRUE) for stream reading file hdfs://localhost:8020/accumulo/tables/1/t-0000000/A00000o5.rf
2022-11-25T17:04:45,210 [rfile.RFileOperations] DEBUG: Called setDropBehind(TRUE) for stream writing file hdfs://localhost:8020/accumulo/tables/1/t-0000003/A00000os.rf_tmp
2022-11-25T17:04:45,210 [rfile.RFileOperations] DEBUG: Called setDropBehind(TRUE) for stream writing file hdfs://localhost:8020/accumulo/tables/1/t-0000001/A00000oq.rf_tmp
.
.
.
2022-11-25T17:05:01,110 [rfile.RFileOperations] DEBUG: Called setDropBehind(TRUE) for stream writing file hdfs://localhost:8020/accumulo/tables/1/t-0000006/A00000p7.rf_tmp
2022-11-25T17:05:01,111 [impl.CachableBlockFile] DEBUG: Called setDropBehind(TRUE) for stream reading file hdfs://localhost:8020/accumulo/tables/1/t-0000006/A00000on.rf
2022-11-25T17:05:01,226 [rfile.RFileOperations] DEBUG: Called setDropBehind(TRUE) for stream writing file hdfs://localhost:8020/accumulo/tables/1/t-0000005/A00000p8.rf_tmp
2022-11-25T17:05:01,227 [impl.CachableBlockFile] DEBUG: Called setDropBehind(TRUE) for stream reading file hdfs://localhost:8020/accumulo/tables/1/t-0000005/A00000oo.rf
I also tried using strace on the DataNode (wanting to see the DataNode's fadvise system calls), but I am not sure what to make of the results and need to try that again.
After not seeing anything in the Accumulo experiments, I created a simple C program to see if I could observe a difference in /proc/meminfo.
#include <stdio.h>
#include <fcntl.h>
#include <stdint.h>
#include <inttypes.h>

int main(int argc, char *argv[])
{
  if (argc != 3) {
    return -1;
  }

  char b[4096 * 16];

  FILE *fp = fopen(argv[1], "rb");
  if (!fp) {
    return -1;
  }
  int fd = fileno(fp);

  // Read the entire file through the page cache.
  int64_t totalRead = 0;
  size_t ret_code = fread(b, sizeof *b, 4096 * 16, fp);
  totalRead += ret_code;
  while (ret_code > 0) {
    ret_code = fread(b, sizeof *b, 4096 * 16, fp);
    totalRead += ret_code;
  }

  // Optionally ask the kernel to drop this file's cached pages.
  if (argv[2][0] == 'y') {
    printf("calling fadvise\n");
    posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
  }

  printf("read %" PRId64 "\n", totalRead);
  fclose(fp);
  return 0;
}
Below is the output of running the program with and without fadvise. Here we do see a difference in the cached file data in /proc/meminfo.
$ head -c 1000000000 /dev/urandom > test.bin
$ echo 3 | sudo tee /proc/sys/vm/drop_caches
3
$ cat /proc/meminfo | egrep '\(file\)|Cached'
Cached: 2118404 kB
SwapCached: 138160 kB
Active(file): 108880 kB
Inactive(file): 103240 kB
$ gcc testfp.c -o testfp
$ ./testfp test.bin n
read 1000000000
$ cat /proc/meminfo | egrep '\(file\)|Cached'
Cached: 2947948 kB
SwapCached: 138160 kB
Active(file): 147472 kB
Inactive(file): 903356 kB
$ echo 3 | sudo tee /proc/sys/vm/drop_caches
3
$ ./testfp test.bin y
calling fadvise
read 1000000000
$ cat /proc/meminfo | egrep '\(file\)|Cached'
Cached: 2133972 kB
SwapCached: 138160 kB
Active(file): 125160 kB
Inactive(file): 101976 kB
$
There is discussion around this in the Accumulo Slack with useful info and context: https://the-asf.slack.com/archives/CERNB8NDC/p1669399302264189