Instantly share code, notes, and snippets.

Embed
What would you like to do?
strsplit valgrind warnings

Starts off with:

On Sunday, March 25, 2018, 3:03:26 AM EDT, Prof Brian Ripley ripley@stats.ox.ac.uk wrote: See https://cran.r-project.org/web/checks/check_results_fansi.html . valgrind is reporting the use of uninitialized strings.

Please correct ASAP and before April 20 to safely retain the package on CRAN.

We look at the fansi valgrind test run, and see errors like the following:

==8932== Conditional jump or move depends on uninitialised value(s)
==8932==    at 0x403EC65: ???
==8932==    by 0x11D647DE: ???
==8932==    by 0x11D647DE: ???
==8932==    by 0x11D6481C: ???
==8932==    by 0x1FFEFEE65F: ???
==8932==  Uninitialised value was created by a client request
==8932==    at 0x511AE6: Rf_allocVector3 (svn/R-devel/src/main/memory.c:2792)
==8932==    by 0x4B73D0: Rf_mkCharLenCE (svn/R-devel/src/main/envir.c:3913)
==8932==    by 0x4E72C3: mkString2 (svn/R-devel/src/main/gram.c:3178)
==8932==    by 0x4E72C3: StringValue (svn/R-devel/src/main/gram.c:4802)
==8932==    by 0x4EBD49: token (svn/R-devel/src/main/gram.c:5010)
==8932==    by 0x4EBD49: token_ (svn/R-devel/src/main/gram.c:5205)
==8932==    by 0x4ECEAC: yylex (svn/R-devel/src/main/gram.c:5231)
==8932==    by 0x4ECEAC: Rf_yyparse (svn/R-devel/src/main/gram.c:1850)
==8932==    by 0x4EEE08: R_Parse1 (svn/R-devel/src/main/gram.c:3553)
==8932==    by 0x4EF464: R_Parse (svn/R-devel/src/main/gram.c:3683)
==8932==    by 0x572711: do_parse (svn/R-devel/src/main/source.c:278)
==8932==    by 0x4CC001: bcEval (svn/R-devel/src/main/eval.c:6771)
==8932==    by 0x4DA52F: Rf_eval (svn/R-devel/src/main/eval.c:624)
==8932==    by 0x4DBE1E: R_execClosure (svn/R-devel/src/main/eval.c:1764)
==8932==    by 0x4D15EA: bcEval (svn/R-devel/src/main/eval.c:6739)

We assume these have nothing to do with our code as they are emitted during the pre-load portion and involve entry points that are all part of the R sources. Most of the valgrind errors are like this one.

There is one similar warning that does involve a FANSI entry point, about 2/3rds of the way through the log:

==8932== Conditional jump or move depends on uninitialised value(s)
==8932==    at 0x403EBFB: ???
==8932==    by 0x10D25C25: ???
==8932==    by 0x10D25C25: ???
==8932==    by 0x10D25DC7: ???
==8932==    by 0x1FFEFE6A5F: ???
==8932==  Uninitialised value was created by a heap allocation
==8932==    at 0x4C2DB6B: malloc (/builddir/build/BUILD/valgrind-3.13.0/coregrind/m_replacemalloc/vg_replace_malloc.c:299)
==8932==    by 0x511B31: Rf_allocVector3 (svn/R-devel/src/main/memory.c:2712)
==8932==    by 0x4B73D0: Rf_mkCharLenCE (svn/R-devel/src/main/envir.c:3913)
==8932==    by 0x1D6B5BE8: FANSI_strip (packages/tests-vg/fansi/src/strip.c:170)
==8932==    by 0x4D0C99: bcEval (svn/R-devel/src/main/eval.c:7278)
==8932==    by 0x4DA52F: Rf_eval (svn/R-devel/src/main/eval.c:624)
==8932==    by 0x4DBE1E: R_execClosure (svn/R-devel/src/main/eval.c:1764)
==8932==    by 0x4DA6BC: Rf_eval (svn/R-devel/src/main/eval.c:747)
==8932==    by 0x4DAD00: forcePromise (svn/R-devel/src/main/eval.c:520)
==8932==    by 0x4DB127: FORCE_PROMISE (svn/R-devel/src/main/eval.c:4966)
==8932==    by 0x4DB127: getvar (svn/R-devel/src/main/eval.c:5008)
==8932==    by 0x4D07A1: bcEval (svn/R-devel/src/main/eval.c:6503)
==8932==    by 0x4DA52F: Rf_eval (svn/R-devel/src/main/eval.c:624)
==8932== 

We assume this is the one BDR is concerned with. However, it does not appear to be fansi's fault. Using WCH's r-debug docker, we can recreate the problem with:

docker pull wch1/r-debug
docker run --rm -ti --security-opt seccomp=unconfined wch1/r-debug
RDvalgrind -e "install.packages(c('fansi', 'inline'))"
RDvalgrind -d "valgrind --track-origins=yes"

x <- paste0(c("\033[0m", rep("Z", 129)), collapse="")
> strsplit(fansi::strip_ctl(x), " ", perl=TRUE)
==227== Conditional jump or move depends on uninitialised value(s)
==227==    at 0x404DE1F: ???
==227==    by 0xB8A773F: ???
==227==    by 0xB8A773F: ???
==227==    by 0xB8A77C0: ???
==227==    by 0x1FFEFFCDEF: ???
==227==  Uninitialised value was created by a client request
==227==    at 0x4FCDAD9: Rf_allocVector3 (memory.c:2792)
==227==    by 0x4FB2D9C: Rf_allocVector (Rinlinedfuns.h:577)
==227==    by 0x4FCDDF2: Rf_allocCharsxp (memory.c:2814)
==227==    by 0x4F6779C: Rf_mkCharLenCE (envir.c:3913)
==227==    by 0x149A51D7: FANSI_strip (strip.c:170)
==227==    by 0x4F2A0F2: R_doDotCall (dotcode.c:576)
==227==    by 0x4F91F3B: bcEval (eval.c:7279)
==227==    by 0x4F70257: Rf_eval (eval.c:624)
==227==    by 0x4F72A7C: R_execClosure (eval.c:1764)
==227==    by 0x4F72770: Rf_applyClosure (eval.c:1692)
==227==    by 0x4F70A7F: Rf_eval (eval.c:747)
==227==    by 0x4F6FF36: forcePromise (eval.c:520)

But we can also recreate this with a simple C function that allocates a 129 character string:

# function that allocates a 129 byte string of 'a'

f <- inline::cfunction(body='
char * chr = R_alloc(130, sizeof(char));
for(int i = 0; i < 129; ++i) {*(chr + i) = \'a\';}
*(chr + 129) = 0;
return mkString(chr);'
)
# Create the 129 byte string and split

x <- f()
strsplit(x, ' ', perl=TRUE)
==227== Conditional jump or move depends on uninitialised value(s)
==227==    at 0x403E0AF: ???
==227==    by 0xA4A654F: ???
==227==    by 0xA4A654F: ???
==227==    by 0xA4A65D0: ???
==227==    by 0x1FFEFFCDEF: ???
==227==  Uninitialised value was created by a client request
==227==    at 0x4FCDAD9: Rf_allocVector3 (memory.c:2792)
==227==    by 0x4FB2D9C: Rf_allocVector (Rinlinedfuns.h:577)
==227==    by 0x4FCDDF2: Rf_allocCharsxp (memory.c:2814)
==227==    by 0x4F6779C: Rf_mkCharLenCE (envir.c:3913)
==227==    by 0x4F67255: Rf_mkChar (envir.c:3754)
==227==    by 0x4FB3EE2: Rf_mkString (Rinlinedfuns.h:1079)
==227==    by 0x1079B673: filee3689e28c4 (filee3689e28c4.cpp:15)
==227==    by 0x4F2A074: R_doDotCall (dotcode.c:567)
==227==    by 0x4F35210: do_dotcall (dotcode.c:1252)
==227==    by 0x4F7093C: Rf_eval (eval.c:727)
==227==    by 0x4F72A7C: R_execClosure (eval.c:1764)
==227==    by 0x4F72770: Rf_applyClosure (eval.c:1692)
==227== 

Similarly, we can just do it call completely in R:

> strsplit(paste0(rep("A", 129), collapse=""), " ", perl=TRUE)
==227== Conditional jump or move depends on uninitialised value(s)
==227==    at 0x404DE1F: ???
==227==    by 0xADF690F: ???
==227==    by 0xADF690F: ???
==227==    by 0xADF6990: ???
==227==    by 0x1FFEFFCDEF: ???
==227==  Uninitialised value was created by a client request
==227==    at 0x4FCDAD9: Rf_allocVector3 (memory.c:2792)
==227==    by 0x4FB2D9C: Rf_allocVector (Rinlinedfuns.h:577)
==227==    by 0x4FCDDF2: Rf_allocCharsxp (memory.c:2814)
==227==    by 0x4F6779C: Rf_mkCharLenCE (envir.c:3913)
==227==    by 0x4F671DC: Rf_mkCharCE (envir.c:3740)
==227==    by 0x4FDC361: do_paste (paste.c:273)
==227==    by 0x4F83444: bcEval (eval.c:6771)
==227==    by 0x4F70257: Rf_eval (eval.c:624)
==227==    by 0x4F72A7C: R_execClosure (eval.c:1764)
==227==    by 0x4F72770: Rf_applyClosure (eval.c:1692)
==227==    by 0x4F70A7F: Rf_eval (eval.c:747)
==227==    by 0x4F6FF36: forcePromise (eval.c:520)
==227== 

The error only happens if the string is longer than 128 bytes and if we are in perl=TRUE split mode. Also, the error only happens the first time a particular string is fed to strsplit in a given session.

The 128 byte limit is almost certainly because that's the point at which vectors get malloc'ed individually. For whatever reason, strsplit, or more likely the pcre code triggers the valgrind warnings.

And of course, who wrote substr?

 36990     ripley SEXP attribute_hidden do_strsplit(SEXP call, SEXP op, SEXP args, SEXP env)
     2          r {
 49658     ripley     SEXP args0 = args, ans, tok, x;
 59173     ripley     R_xlen_t i, itok, len, tlen;
 59173     ripley     size_t j, ntok;

So all I have to do is write back to BDR and tell him it's all his fault, right, right ;)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment