@hadley
Created August 15, 2012 19:08

Motivation

The base R API for file and directory manipulation is idiosyncratic and scattered over multiple packages. This package provides an API with:

  • consistent input arguments and return values
  • wrappers for common tasks that are currently painful

Common API

Functions are named after their Unix equivalents (except for ls, which has a different meaning in R). The API generally favours a small set of functions, each with few options: there is no need to capture every option from the equivalent command-line tool. Long argument names are preferred to cryptic abbreviations.

Return value

Functions return affected paths and whether or not each operation was successful:

  • rm: files that were removed
  • mv/cp: newly created files
  • ln: new symbolic link
  • mkdir: newly created directories

This will be a simple S3 class built on top of either a named logical vector or a data frame. An additional attribute (or column) stores the function name to make error messages nicer.

stop_on_failure makes it easy to turn the result object into a nicely formatted error message. (And automatically suppresses default warnings?)
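
As a rough illustration of the named-logical-vector variant, the sketch below builds a result object and a stop_on_failure that formats failures into one error. The names new_fs_result, rm_files and the "fs_result" class are hypothetical placeholders, not part of any existing package; rm_files is used instead of rm only to avoid masking base::rm in the example.

```r
# Hypothetical constructor: a named logical vector (path -> success)
# plus an attribute recording which operation produced it.
new_fs_result <- function(success, paths, operation) {
  stopifnot(is.logical(success), length(success) == length(paths))
  structure(
    setNames(success, paths),
    operation = operation,
    class = "fs_result"
  )
}

# Turn any failures into a single formatted error; otherwise return
# the result invisibly so calls can be chained.
stop_on_failure <- function(x) {
  failed <- names(x)[!as.logical(x)]
  if (length(failed) > 0) {
    stop(
      attr(x, "operation"), " failed for ", length(failed), " path(s): ",
      paste(failed, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(x)
}

# Sketch of rm built on base file.remove(). Its default warnings are
# suppressed because the result object already records each failure.
rm_files <- function(paths) {
  ok <- suppressWarnings(file.remove(paths))
  new_fs_result(ok, paths, "rm")
}
```

With this shape, stop_on_failure(rm_files(paths)) either returns the result invisibly or raises one error naming every path that could not be removed.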

Output during operation

All functions have a way to choose between:

  • no output (just return value)
  • warning on failures
  • warnings on failures and messages on successes
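
One way the three output levels could be exposed is through a single argument shared by every function. The sketch below assumes a hypothetical helper called report and hypothetical level names "none", "warn" and "verbose"; the note does not fix either the argument name or the default.

```r
# Hypothetical helper: report a named logical vector of outcomes
# (path -> success) according to the chosen output level.
report <- function(result, operation,
                   output = c("none", "warn", "verbose")) {
  output <- match.arg(output)
  if (output == "none") return(invisible(result))

  ok <- as.logical(result)

  # Both "warn" and "verbose" warn once per failed path.
  for (path in names(result)[!ok]) {
    warning(operation, ": failed for ", path, call. = FALSE)
  }

  # Only "verbose" also emits a message per successful path.
  if (output == "verbose") {
    for (path in names(result)[ok]) {
      message(operation, ": ok for ", path)
    }
  }
  invisible(result)
}
```

Because warnings and messages are ordinary R conditions, callers can still silence them with suppressWarnings() or suppressMessages() regardless of the level chosen.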

File matching

Instead of specifying a list of file/directory names, you can also use globs or regular expressions (to be matched to files in the current directory):

  • glob("*.r") (see Sys.glob or maybe glob2rx)
  • rx("\\.r$")

The results of these operations are always sorted in the C locale.
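
A minimal sketch of both matchers using only base R follows. It assumes glob and rx simply return a character vector of matching paths, which the note does not specify, and it relies on sort(method = "radix") for locale-independent ordering; sort_c is a hypothetical helper name.

```r
# Sort in the C locale regardless of the session locale
# (radix sort on character vectors ignores the collation locale).
sort_c <- function(x) sort(x, method = "radix")

# Glob matching via base Sys.glob().
glob <- function(pattern) {
  sort_c(Sys.glob(pattern))
}

# Regular-expression matching against entries in the current directory.
rx <- function(pattern) {
  sort_c(list.files(pattern = pattern))
}
```

Under these assumptions, glob("*.r") and rx("\\.r$") select the same files in a directory, and upper-case names sort before lower-case ones on any platform.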

Filters

Filters make it easy to:

  • remove non-existing or non-writeable files
  • operate on successes
  • operate on failures

This leads to code like:

# Show all failed deletions
y <- rm(x); failed(y)

# Remove files from a partial copy
rm(succeeded(cp(a, b)))

# Try deleting files 3 times
# (is this ambiguous - does it default to success or failure?)
rm(rm(rm(x))) 
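
The filters above could be sketched as plain functions over a named logical vector of outcomes. Everything here is an assumption for illustration: the helper names existing, writeable and cp_files are hypothetical, and succeeded and failed are shown returning path names so their output can be fed straight into another operation.

```r
# Keep only paths that exist / are writeable (base file.access()
# returns 0 when the requested permission, mode 2 = write, is granted).
existing <- function(paths) paths[file.exists(paths)]
writeable <- function(paths) paths[file.access(paths, mode = 2) == 0]

# Split a result (named logical vector: path -> success) into the
# paths that succeeded and the paths that failed.
succeeded <- function(x) names(x)[as.logical(x)]
failed <- function(x) names(x)[!as.logical(x)]

# Hypothetical cp: copy files pairwise, naming each outcome by its
# destination so that succeeded() yields the newly created files.
cp_files <- function(from, to) {
  setNames(suppressWarnings(file.copy(from, to)), to)
}
```

This makes the ambiguity raised for rm(rm(rm(x))) explicit: a nested call has to pick one of failed() or succeeded() as its implicit input, so spelling the filter out avoids guessing.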

Existing API

file_test -> stat
file.access -> stat
file.info -> stat, is_file, is_dir, is_link, is_writeable, is_readable
file.exists -> file_exists
-> dir_exists
Sys.readlink -> ?
Sys.umask -> 

-> relativepath
-> walk (python os.path.walk)

Sys.chmod -> chmod
? -> chown 
? -> chgrp
Sys.umask -> 

file.append -> 
copy.dir -> cp
file.copy -> cp
dir.create -> mkdir 
file.remove -> rm
file.rename -> mv
file.symlink -> ln
file.link -> ln
file.create -> touch
setwd -> cd (if second argument is non-NULL run that in context)
getwd -> pwd

list.dirs
list.files
dir

system.file
tempdir
tempfile

file.path -> path
normalizePath -> path
path.expand -> path
dirname -> path
basename -> path
file_ext -> path
file_path_sans_ext -> path
-> commonprefix
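
The setwd -> cd entry above describes running code "in context" when a second argument is given. A minimal sketch of that idea is shown here; it treats an omitted second argument (rather than an explicit NULL) as "just change directory", which is an assumption, and the argument name code is hypothetical.

```r
# Hypothetical cd(): change directory. If `code` is supplied, evaluate
# it inside `path` and restore the previous directory afterwards,
# even if the code fails.
cd <- function(path, code) {
  old <- setwd(path)
  if (missing(code)) return(invisible(old))
  on.exit(setwd(old), add = TRUE)
  # `code` is a lazily evaluated promise, so forcing it here runs it
  # after the working directory has been changed.
  force(code)
}
```

For example, cd(some_dir, list.files()) would list the contents of some_dir and leave the session's working directory unchanged afterwards.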

Inspirations
