jennybc/2014-10-12_stop-working-directory-insanity.md

## 2014-10-12_stop-working-directory-insanity.md

      
    There are packages for this now!

2017-08-03: Since I wrote this in 2014, the universe, specifically Kirill Müller (https://github.com/krlmlr), has provided better solutions to this problem. I now recommend that you use one of these two packages:

rprojroot: This is the main package with functions to help you express paths in a way that will "just work" when developing interactively in an RStudio Project and when you render your file.
here: A lightweight wrapper around rprojroot that anticipates the most likely scenario: you want to write paths relative to the top-level directory, defined as an RStudio project or Git repo. TRY THIS FIRST.

I love these packages so much I wrote an ode to here.
I use these packages now instead of what I describe below. I'll leave this gist up for historical interest. 😆
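For orientation, here is a minimal sketch of what using the two packages looks like. It assumes both are installed; the sub-directory `data` and the file `raw.csv` are hypothetical names chosen for illustration.

```r
# Assumption: "data" and "raw.csv" are placeholder names, not files from this gist.
if (requireNamespace("here", quietly = TRUE)) {
  # here::here() builds a path relative to the project root it detects
  csv_path <- here::here("data", "raw.csv")
  print(csv_path)
}

if (requireNamespace("rprojroot", quietly = TRUE)) {
  # find_root() errors when no enclosing RStudio Project exists, so catch that
  root <- tryCatch(
    rprojroot::find_root(rprojroot::is_rstudio_project),
    error = function(e) NA_character_
  )
  print(root)
}
```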
TL;DR

Include this in the .Rprofile in the top-level directory of an RStudio project:
RPROJ <- list(PROJHOME = normalizePath(getwd()))
attach(RPROJ)
rm(RPROJ)
Then build paths like so:
file.path(PROJHOME, <the_sub_dir>, <the_file_name>)
and never worry about working directory again(?). Read on for the problem I am trying to solve.
Problem statement

My near-daily dilemma

An R project -- and RStudio Project -- that is big enough to require sub-directory structure
R scripts and R Markdown files in more than one sub-directory that I want to render() and source()

During development and informal testing, I want to iterate fast and enjoy RStudio's facilities for running bits of code or sourcing/compiling entire files. The "Compile Notebook" and "Knit HTML" buttons (and knitr and rmarkdown packages in general) assume that working directory = directory where source file lives. In some rather theoretical sense, this is not strictly true, but life is much easier if you resign yourself to this.
In "production," I want to use a Makefile or similar to run scripts and compile R Markdown; this file can obviously live in only one place, with the most obvious and canonical choice being the top-level Project directory, where no R scripts or R Markdown files are to be found.


I'm against setwd() for all the usual reasons, e.g. portability.

So, what's working directory going to be, folks? The above rules and needs admit no obvious solution. Is everyone else faffing around with working directory as much as I am?
Define the home directory for a Project

In the pre-RStudio era, I used to define a path object at the top of every file, whereAmI, and I constructed absolute paths based on that. I'm returning to this idea but want to upgrade the smarts, so the solution is more general. I think I've answered my own question.
I define the Project home directory to be the directory where the <project_name>.Rproj file sits.
Store Project home directory in an environment

I cannot believe I am using attach() but here goes.
Create a .Rprofile file in the Project home directory that includes these lines:
RPROJ <- list(PROJHOME = normalizePath(getwd()))
attach(RPROJ)

cat("sourcing Project-specific .Rprofile\n")
cat('retrieve the top-level Project directory at any time with PROJHOME or via get("PROJHOME", "RPROJ"):\n',
    get("PROJHOME", "RPROJ"), "\n")

rm(RPROJ)
This creates a new environment on the search path, named RPROJ, containing an object PROJHOME giving the normalized absolute path to Project home. Since the value is determined at the time of R session start, this should work for different collaborators/machines/OSes. In theory.
Always specify paths relative to Project home

The easiest way to retrieve the Project home is simply via PROJHOME, though in theory that could be masked by objects with the same name earlier in the search path. The most proper way to access is via get("PROJHOME", "RPROJ").
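The masking risk can be seen in a small sketch. It repeats the attach() setup so it runs in a fresh session; the string "somewhere/else" is a made-up value standing in for an accidental assignment.

```r
# Same setup as the .Rprofile: attach a list named RPROJ, then drop the list
RPROJ <- list(PROJHOME = normalizePath(getwd()))
attach(RPROJ)
rm(RPROJ)

# Hypothetical accident: an object with the same name lands in the workspace
PROJHOME <- "somewhere/else"

bare_lookup <- PROJHOME                    # finds the workspace copy first
safe_lookup <- get("PROJHOME", "RPROJ")    # looks only in the attached RPROJ

# Clean up so the sketch leaves the session as it found it
rm(PROJHOME)
detach("RPROJ", character.only = TRUE)
```

After this runs, `bare_lookup` holds the masking value while `safe_lookup` still holds the path recorded at attach time.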
Now I can build absolute-but-portable paths like so:
file.path(PROJHOME, <the_sub_dir>, <the_file_name>)
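A concrete version of that pattern might look like the sketch below. The names `data` and `gapminder.csv` are hypothetical, and the fallback to the working directory exists only so the snippet runs in a session where the Project's .Rprofile was not processed.

```r
# Assumption: PROJHOME normally comes from the Project's .Rprofile.
proj_home <- if (exists("PROJHOME")) PROJHOME else normalizePath(getwd())

# Hypothetical sub-directory and file name
data_path <- file.path(proj_home, "data", "gapminder.csv")

if (file.exists(data_path)) {
  dat <- read.csv(data_path)
} else {
  message("No file at ", data_path)
}
```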
Here's what I see at R session start:
sourcing Project-specific .Rprofile
retrieve the top-level Project directory at any time with PROJHOME or via get("PROJHOME", "RPROJ"):
 /Users/jenny/path/to/my-project 
The interactive workspace is how I left it the last time I worked on this Project; in particular, it's not cluttered up with PROJHOME. The working directory of R Console is also how I left it, though this suddenly becomes much less important and I think this work style should eliminate fussing around with working directory. I can clean out the workspace with rm(list = ls()) or RStudio's broom button without harming my ability to build robust paths.
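To check that PROJHOME really lives outside the workspace, something like the following sketch can be used. It repeats the attach() setup so it is self-contained, and it assumes nothing named PROJHOME already exists in the global environment.

```r
RPROJ <- list(PROJHOME = normalizePath(getwd()))
attach(RPROJ)
rm(RPROJ)

on_search_path <- "RPROJ" %in% search()            # the attached environment is there
in_rproj       <- ls("RPROJ")                      # objects stored inside it
in_workspace   <- "PROJHOME" %in% ls(globalenv())  # not in the workspace

# Clean up the demonstration environment
detach("RPROJ", character.only = TRUE)
```

Because rm(list = ls()) only removes objects listed by ls() for the workspace, it has nothing named PROJHOME to delete.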
R processes launched from outside RStudio

Added 2014-12-12, after using the above approach for a couple of months. Since people from #rrhack showed a glimmer of interest, I want to add this missing piece.
The above approach works if and only if path/to/my-project/.Rprofile is processed at R startup. When does that happen?

Use of R through your RStudio Project. Behind the scenes RStudio launches R with working directory set to Project's home directory, before it restores working directory to its last known state.
Any R process with working directory of path/to/my-project/.

What other situations are likely to arise in practice? When will path/to/my-project/.Rprofile not get processed, leaving PROJHOME undefined?


You have R scripts or RMarkdown files in a subdirectory, e.g., path/to/my-project/code/my_script.R.


You execute or render those files outside of RStudio from a working directory other than path/to/my-project/, e.g., via Make or from the shell:
~/path/to/my-project/code$ Rscript my_script.R  
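One way to make this failure loud rather than mysterious is a guard near the top of each script. This is only a sketch; `require_projhome()` is a hypothetical helper, not part of the setup described here.

```r
# Hypothetical helper: return PROJHOME, or fail with a pointer to the likely cause
require_projhome <- function() {
  if (!exists("PROJHOME")) {
    stop(
      "PROJHOME is not defined; was the Project's .Rprofile processed? ",
      "Working directory is ", getwd(),
      call. = FALSE
    )
  }
  get("PROJHOME")
}

# Intended use at the top of a script such as code/my_script.R:
# proj_home <- require_projhome()
```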


My current solution: create an additional .Rprofile in any subdirectory that holds R scripts or RMarkdown files. Continuing the above example, we create path/to/my-project/code/.Rprofile. The only difference is the specification of PROJHOME as the parent of working directory:
    ```R
    RPROJ <- list(PROJHOME = normalizePath(".."))
    attach(RPROJ)
    rm(RPROJ)
    ```
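An alternative to hard-coding ".." in every subdirectory is to search upward for the .Rproj file, which matches the definition of Project home used above. The function below is a base R sketch of that idea, not what the gist actually does; `find_proj_home()` is a hypothetical name.

```r
# Hypothetical helper: walk up from `start` until a directory holds a *.Rproj file
find_proj_home <- function(start = getwd()) {
  dir <- normalizePath(start, mustWork = TRUE)
  repeat {
    # A directory counts as Project home if it contains any .Rproj file
    if (length(list.files(dir, pattern = "\\.Rproj$")) > 0) {
      return(dir)
    }
    parent <- dirname(dir)
    # At the filesystem root, dirname() returns its input: give up
    if (identical(parent, dir)) {
      stop("No .Rproj file found in or above ", start, call. = FALSE)
    }
    dir <- parent
  }
}
```

With this in a subdirectory's .Rprofile, `list(PROJHOME = find_proj_home())` could replace `normalizePath("..")` and would keep working if the script moved one level deeper.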

Similarity to path handling in Jekyll

Jekyll is a static website generator. It supports the construction of relative paths through the notion that a website has a root directory. Within files for individual webpages, the path to root can be specified via YAML frontmatter. This, in turn, allows the construction of paths relative to root. The rationale is to encourage use of relative paths over absolute and to make it easy to develop content before the entire directory structure of a site is fixed.
Example of YAML frontmatter specifying relative path to website root:
    ---
    title: My Page title
    root: "../"
    ---

and here's how links would be built within a page:
    <img src="{{ post.root }}images/happy.png" />
    <a href="{{ post.root }}2010/01/01/another_post">Relative link to another post</a>

The Project home directory PROJHOME is equivalent to Jekyll's website root directory post.root. The use of .Rprofile to define PROJHOME is equivalent to Jekyll's use of root: "../" in YAML frontmatter.
Example from this stackoverflow thread.
Links and other thoughts

This stackoverflow thread is kind of relevant.
It was helpful to re-read the Environments chapter of Hadley's Advanced R book.
Should I just go ahead and set an environment variable, i.e. via Sys.setenv()?
Should I worry about where the RPROJ environment ends up in the search path?
This seems tied up with other issues, like building whole websites with rmarkdown, which currently also has a very "one directory to rule them all" approach (I'm looking at you _output.yaml, libs, include). It needs to be easier to designate a home directory for a project or website and then write paths relative to that. The way Jekyll works seems worth considering.
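For the Sys.setenv() question, a minimal sketch of the environment-variable variant follows. It assumes the line would sit in the same top-level .Rprofile; whether this is better than attach() is left open here.

```r
# In the Project's .Rprofile: record Project home as an environment variable
Sys.setenv(PROJHOME = normalizePath(getwd()))

# Anywhere later in the session: read it back and build a path
proj_home <- Sys.getenv("PROJHOME")   # returns "" if the variable is unset
script_path <- file.path(proj_home, "code", "my_script.R")
```

Unlike an attached environment, the value cannot be masked by an R object of the same name, but it comes back as a plain string that is empty when unset, so scripts would need to check for "".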