@KaiAragaki
Last active June 8, 2023 13:05
My workflow
Almost all - if not all - of my new R projects are targets projects.
I'm usually bouncing between computers and occasionally have a collaborator, so targets makes it super super simple to spin up
my analyses at another computer.
Nowadays, I don't just use a targets project, but I like having multiple targets projects within one targets project. This
keeps the pipelines from getting too unwieldy, at which point I usually get pretty overwhelmed.
(From here on, assume that when I say 'project' I mean a targets project, not an RStudio project - all of this happens in a
single RStudio project)
The first project I create is called 'common' and contains anything that could need to be accessed by other projects.
This keeps my hierarchies flat. In my _targets.yaml, I name all my targets stores 'store_proj-name'. So for common,
the store is "store_common". I put the targets pipeline in ./R/targets/common.R and the functions it uses in
./R/functions/common.R. In this 'common' pipeline, I usually set up folder structures - I usually have one called 01_data,
where data gets downloaded to, and one called 02_figs, where figures get saved. I also create individual subdirectories
for each project's figures (eg 02_figs/pcr - more on that below).
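A minimal sketch of how that folder setup could look, assuming a helper named make_dir (hypothetical, not from my actual code) that lives in ./R/functions/common.R. It creates a directory if it doesn't exist yet and returns the path, so a target in the 'common' pipeline can store that path for other projects to read:

```r
# Hypothetical helper for ./R/functions/common.R (the name is an assumption).
# Creates the directory if needed and returns its path.
make_dir <- function(...) {
  path <- file.path(...)
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  path
}

# In ./R/targets/common.R this might be wrapped in targets along these lines:
#   tar_target(data_dir, make_dir("01_data"))
#   tar_target(pcr_plot_dir, make_dir("02_figs", "pcr"))
```

Returning the path (rather than nothing) is what lets another project pick it up later with tar_read().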
The next projects I create are usually siloed by the kind of information they contain. I work in a wet lab,
so my information kind of naturally separates itself by experiment type. I might call my next project "pcr".
This again gets a "store_pcr" store entry in the yaml, as well as a script entry pointing to its pipeline at
"./R/targets/pcr.R". Finally, any functions it uses are stored in "./R/functions/pcr.R".
For me, each target usually means a single figure. I used to break up my targets quite a bit more, but I didn't really use
the upstream targets so much and wasn't saving a whole lot of time caching my results from the upstream targets. The results
of these targets get saved to 02_figs, to their own subdirectory (eg 02_figs/01_pcr/my-fig.png).
These 'figure targets' have a general skeleton like so:
tar_file(
  my_figure,
  make_my_figure("fig-name.png", pcr_plot_dir)
)
pcr_plot_dir usually comes from something at the top of the script like:
pcr_plot_dir <- tar_read("pcr_plot_dir", store = "store_common")
And the 'make_my_figure' function looks like this:
make_my_figure <- function(filename, out_dir) {
  # Oftentimes I download the data inside this function and use it immediately
  # I know that this is an affront to functional programming and targets in general
  # I don't care!
  # But sometimes if I'm using data that will be used across multiple targets, I'll
  # download it to, say, 01_data/01_pcr/my-data.csv and use it as an input for this target
  # Oftentimes, though, individual data files don't really make 'sense' as targets. I'll
  # usually forgo functional purity for semantic clarity.
  my_data <- downloading_stuff("path/to/cloud/storage")
  plot <- my_data |>
    making() |>
    my() |>
    ggplot()
  # filename already includes its extension (eg "fig-name.png")
  out <- fs::path(out_dir, filename)
  ggsave(out, plot, units = "in", width = 3.5, height = 3.5, dpi = 500)
  # Return the path so tar_file() can track the saved figure
  out
}
Note how the target name is "my_figure" and the function name is "make_my_figure". This is a common pattern I follow.
Additionally, I like to have a kind of 'staging area' file at top level called scratch.R.
This isn't part of the targets pipeline - this is a testbed for making figures and stuff before
I put them in the pipeline, which allows for rapid and piecemeal development.
Other things:
I like to use the `conflicted` package and include a (RStudio) project level .Rprofile that determines 'winners':
conflicted::conflict_prefer("select", "dplyr")
conflicted::conflict_prefer("filter", "dplyr")
conflicted::conflict_prefer("rename", "dplyr")
conflicted::conflict_prefer("path", "fs")
I .gitignore anything that is created by the targets pipelines - like 01_data/, 02_figs/.
Example _targets.yaml:
human_seq:
  script: R/targets/human_seq.R
  store: store_human_seq
common:
  script: R/targets/common.R
  store: store_common
wb:
  script: R/targets/wb.R
  store: store_wb
pcr:
  script: R/targets/pcr.R
  store: store_pcr
tw:
  script: R/targets/tw.R
  store: store_tw
cell_seq:
  script: R/targets/cell_seq.R
  store: store_cell_seq
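
With a yaml like this, targets picks which project to run from the TAR_PROJECT environment variable. Here is a rough sketch of running several projects in one go. The function names common_first and run_projects are made up for illustration, and running 'common' first is an assumption based on the other projects reading from store_common:

```r
# Hypothetical helper: move "common" to the front so its store exists
# before any other project tries to read from it.
common_first <- function(projects) {
  c(intersect("common", projects), setdiff(projects, "common"))
}

# Hypothetical helper: run each project's pipeline in turn.
# Assumes the targets package is installed and _targets.yaml is in the
# working directory.
run_projects <- function(projects) {
  # Remember the current setting and restore it when the function exits
  old <- Sys.getenv("TAR_PROJECT", unset = NA)
  on.exit(
    if (is.na(old)) Sys.unsetenv("TAR_PROJECT") else Sys.setenv(TAR_PROJECT = old),
    add = TRUE
  )
  for (project in common_first(projects)) {
    # Selects the matching script/store entry in _targets.yaml
    Sys.setenv(TAR_PROJECT = project)
    targets::tar_make()
  }
  invisible(projects)
}
```

Something like run_projects(c("pcr", "wb", "common")) would then build common, pcr, and wb in that order.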