@hyginn
Last active April 3, 2019 16:36
Utility to find strings in (source) files
# Author: Boris Steipe (ORCID: 0000-0002-1134-6758)
# License: (c) Author (2019) + MIT
# Date: 2019-04-03

Grep in source files

In a current project, we recently merged a number of pull requests, and I thought they had passed all checks ... but after everything was done, Check produced a NOTE:

# checking R code for possible problems ... NOTE
  System_Expression: no visible binding for global variable ‘var’
  Undefined global functions or variables:
    var
  Consider adding
    importFrom("stats", "var")
  to your NAMESPACE file.

Well, that's too bad. I don't like cluttering the namespace with imports if there's no obvious reason: stats is a default package, and when names conflict, code is so much easier to read if functions are prefixed with their package. That said, since stats is a default package, I feel it should not have been flagged in the first place ... but here we are.

The obvious next step is to figure out where var appears in the code. But it could stand alone, or be adjoined to an operator or parentheses, or be part of "variable", "variation", or "samovar" ... clearly, I want a function that will find strings in code. Not in just any code, but in files that have a certain extension. In directories that I choose. Please.
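To make the matching problem concrete, here is a minimal sketch comparing a plain substring search with a search delimited by word boundaries (`\\b`). The test strings are made up for illustration. Note that an underscore counts as a word character, so "my_var" is not matched, whereas a period does not, so a name like "var.x" would be.

```r
# Hypothetical test strings; only the first three contain "var" as a whole word.
x <- c("var",
       "x<-var(y)",
       "apply(m, 1, var, na.rm = TRUE)",
       "variable",
       "variation",
       "samovar",
       "my_var")

plain   <- grepl("var", x)          # plain substring match: hits every string
bounded <- grepl("\\bvar\\b", x)    # "var" between two word boundaries

# Show the strings next to the two results
print(data.frame(x, plain, bounded))
```

Only the bounded pattern separates the function name from the words that merely contain it.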

Time to break out the coding gloves:

#' grepSrc
#'
#' \code{grepSrc} Utility to grep source files for the presence of a string.
#'
#' @section Details: The function checks all files in directories \code{paths}
#'                   with filenames matching \code{ext} for the presence of
#'                   matches to the regular expression \code{patt}. For each
#'                   file with matches the filename is printed, followed by
#'                   the line number and the first 80 characters of each
#'                   matching line.
#'
#' @param patt (char)  a regular expression that should match a string we are
#'                     looking for.
#' @param ext  (char)  a vector of regular expressions. Files with filenames
#'                     matching any of the expressions will be included.
#'                     Defaults to \code{c("\\.R$")}.
#' @param paths (char) a vector of paths. Defaults to
#'                     \code{c("./R", "./dev", "./inst/scripts", "./doc",
#'                            "./src", "./tests/testthat")}.
#' @return \code{NULL}, invisibly. Invoked for the side effect of printing a
#'         report to the console.
#'
#' @author \href{https://orcid.org/0000-0002-1134-6758}{Boris Steipe} (aut)
#'
#' @examples
#' # Find all occurrences of a variable named "var", i.e. the string "var"
#' # between two word boundaries "\\b", in ".R" files in source directories.
#' grepSrc("\\bvar\\b")
#'

grepSrc <- function(patt,
                    ext   = c("\\.R$"),
                    paths = c("./R", "./dev", "./inst/scripts",
                             "./doc", "./src", "./tests/testthat")) {

  N <- 80  # max number of characters to print from a matching line

  # make vector of files to process
  myFiles <- character()
  for (myPath in paths) {
    for (myExt in ext) {
      myFiles <- c(myFiles, list.files(path = gsub("/$", "", myPath),
                                       pattern = myExt,
                                       full.names = TRUE))
    }
  }
  myFiles <- unique(myFiles)  # a file may match more than one pattern in ext

  for (myFile in myFiles) {                   # For all requested files ...
    src <- readLines(myFile)                  #   read the source code,
    indices <- grep(patt, src)                #   get indices for matches ...
    if (length(indices) > 0) {                #     If there are any ...
      cat(sprintf("File: \"%s\"\n", myFile))  #       print the filename ...
      for (idx in indices) {                  #       and N characters.
        cat(sprintf("# %d\t%s\n", idx, substr(src[idx], 1, N)))
      }
    }
  }

  return(invisible(NULL))
}
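If the matches are needed for further processing rather than for reading, the same search can return its results instead of printing them. The following is only a sketch of such a variant: the name `grepSrcDF`, the column names, and the demo files are assumptions for illustration and not part of the original utility.

```r
# Hypothetical variant of grepSrc(): same search, but the matches are
# returned as a data frame (file, line, text) instead of being printed.
grepSrcDF <- function(patt,
                      ext   = c("\\.R$"),
                      paths = c("./R", "./dev", "./inst/scripts",
                                "./doc", "./src", "./tests/testthat")) {

  # make vector of files to process; unique() drops files that match
  # more than one pattern in ext
  myFiles <- character()
  for (myPath in paths) {
    for (myExt in ext) {
      myFiles <- c(myFiles, list.files(path = gsub("/$", "", myPath),
                                       pattern = myExt,
                                       full.names = TRUE))
    }
  }
  myFiles <- unique(myFiles)

  # collect one row per matching line
  hits <- data.frame(file = character(),
                     line = integer(),
                     text = character(),
                     stringsAsFactors = FALSE)
  for (myFile in myFiles) {
    src <- readLines(myFile, warn = FALSE)
    indices <- grep(patt, src)
    if (length(indices) > 0) {
      hits <- rbind(hits,
                    data.frame(file = myFile,
                               line = indices,
                               text = src[indices],
                               stringsAsFactors = FALSE))
    }
  }
  return(hits)
}

# Usage on a throwaway directory containing two made-up source files
tmp <- file.path(tempdir(), "grepSrcDemo")
dir.create(tmp, showWarnings = FALSE)
writeLines(c("x <- 1", "v <- var(x)", "variable <- 2"), file.path(tmp, "a.R"))
writeLines(c("samovar <- 3"), file.path(tmp, "b.R"))

res <- grepSrcDF("\\bvar\\b", paths = tmp)
print(res)
```

Returning a data frame makes it easy to filter or count matches afterwards, at the cost of the compact console report.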

 

Trying this out:

> grepSrc("\\bvar\\b")
File: "./R/subunitcoExpressionAnalysis.R"
# 112	  myComplexQNXPdf <- tibble::rownames_to_column(myComplexQNXPdf, var = "genes")
File: "./R/System_Expression.R"
# 35	  varExpression <- apply(expressions, 1, var, na.rm = T)

Nice: changing var in line 35 of System_Expression.R to stats::var does solve the problem.
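For reference, the fix amounts to passing the function with its package prefix to apply(). A minimal sketch, assuming a small made-up matrix in place of the project's actual expression data:

```r
# Made-up expression matrix: 3 genes (rows) x 4 samples (columns)
expressions <- matrix(c(1,  2, 3, 4,
                        2,  2, 2, 2,
                        1, NA, 3, 5),
                      nrow = 3, byrow = TRUE)

# Row-wise variance; stats::var names the package explicitly, so no
# importFrom() directive is needed for it in the NAMESPACE file
varExpression <- apply(expressions, 1, stats::var, na.rm = TRUE)
print(varExpression)
```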
