KaiAragaki/my-workflow.txt

## my-workflow.txt
Almost all - if not all - of my new R projects are targets projects.

I'm usually bouncing between computers and occasionally have a collaborator, and targets makes it really simple to spin up
my analyses on another computer.

Nowadays, I don't just use a single targets project - I like having multiple targets projects within one RStudio project. This
keeps the pipelines from getting too unwieldy, at which point I usually get pretty overwhelmed.
(From here on, assume that when I say 'project' I mean a targets project, not an RStudio project - all of this happens in a
single RStudio project.)

The first project I create is called 'common' and contains anything that might need to be accessed by other projects.
This keeps my hierarchies flat. In my _targets.yaml, I name all my targets stores 'store_proj-name'. So for common,
the store is "store_common". I put the targets pipeline in ./R/targets/common.R and the functions it uses in
./R/functions/common.R. In this 'common' pipeline, I usually set up folder structures - I usually have one called 01_data,
where data gets downloaded to, and one called 02_figs, where figures get saved. I also create individual subdirectories
for each project's figures (eg 02_figs/pcr - more on that below).
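
A minimal sketch of what such a 'common' pipeline could look like. The helper name make_dir and the target names data_dir and fig_dir are made up for illustration; pcr_plot_dir matches the name that gets read from store_common further down. The helper is defined inline only to keep the block self-contained - in the layout described here it would live in ./R/functions/common.R.

```r
# Sketch of R/targets/common.R (helper and target names are assumptions)
library(targets)

# Hypothetical helper: create a directory if needed and return its path
make_dir <- function(...) {
  path <- file.path(...)
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  path
}

list(
  tar_target(data_dir, make_dir("01_data")),
  tar_target(fig_dir, make_dir("02_figs")),
  # Depends on fig_dir, so it is built after the parent directory exists
  tar_target(pcr_plot_dir, make_dir(fig_dir, "pcr"))
)
```

Because each target just returns a path, other pipelines can pick the directory up with tar_read() from store_common.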

The next projects I create are usually siloed by the kind of information they contain. I work in a wet lab,
so my information kind of naturally separates itself by experiment type. I might call my next project "pcr".
This again gets a "store_pcr" store entry in the yaml, as well as a "./R/targets/pcr.R" script entry to point to its pipeline
location. Finally, any functions it uses are stored in "./R/functions/pcr.R".
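
If you'd rather not edit the yaml by hand, targets can write these entries for you with tar_config_set(). A rough sketch, writing to a throwaway config file in tempdir() (that location is my assumption, chosen so the example does not touch a real _targets.yaml):

```r
library(targets)

# Throwaway config file so this sketch leaves any real _targets.yaml alone
cfg <- file.path(tempdir(), "_targets.yaml")

# Register each project with its own pipeline script and store
for (project in c("common", "pcr")) {
  tar_config_set(
    script = file.path("R", "targets", paste0(project, ".R")),
    store = paste0("store_", project),
    config = cfg,
    project = project
  )
}

# Look up where the "pcr" project keeps its store
tar_config_get("store", config = cfg, project = "pcr")
```

In an actual project you would drop the config argument so the entries land in _targets.yaml.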

For me, each target usually means a single figure. I used to break up my targets quite a bit more, but I didn't really use
the upstream targets much and wasn't saving a whole lot of time by caching their results. The resulting figures
get saved to 02_figs, in their own subdirectory (eg 02_figs/01_pcr/my-fig.png).

These 'figure targets' have a general skeleton like so:

tar_file(
  my_figure,
  make_my_figure("fig-name.png", pcr_plot_dir)
)

(pcr_plot_dir usually comes from something at the top of the script like:
pcr_plot_dir <- tar_read("pcr_plot_dir", store = "store_common"))

And the 'make_my_figure' function looks like this:

make_my_figure <- function(filename, out_dir) {

  # Oftentimes I download the data inside this function and use it immediately
  # I know that this is an affront to functional programming and targets in general
  # I don't care!

  # But sometimes if I'm using data that will be used across multiple targets, I'll
  # download it to, say, 01_data/01_pcr/my-data.csv and use it as an input for this target
  # Oftentimes, though, individual data files don't really make 'sense' as targets. I'll
  # usually forgo functional purity for semantic clarity.

  my_data <- downloading_stuff("path/to/cloud/storage")

  plot <- my_data |>
    making() |>
    my() |>
    ggplot()

  # filename already includes its extension (eg "fig-name.png")
  out <- fs::path(out_dir, filename)
  ggsave(out, plot, units = "in", width = 3.5, height = 3.5, dpi = 500)

  # Return the path so tar_file() can track the saved figure
  out
}

Note how the target name is "my_figure" and the function name is "make_my_figure". This is a common pattern I follow.

Additionally, I like to have a kind of 'staging area' file at the top level called scratch.R.
This isn't part of the targets pipeline - it's a testbed for making figures and stuff before
I put them in the pipeline, which allows for rapid and piecemeal development.

Other things:

I like to use the `conflicted` package and include an (RStudio) project-level .Rprofile that determines 'winners':

conflicted::conflict_prefer("select", "dplyr")
conflicted::conflict_prefer("filter", "dplyr")
conflicted::conflict_prefer("rename", "dplyr")
conflicted::conflict_prefer("path", "fs")

I .gitignore anything that is created by the targets pipelines - like 01_data/, 02_figs/.

Example _targets.yaml:

human_seq:
  script: R/targets/human_seq.R
  store: store_human_seq
common:
  script: R/targets/common.R
  store: store_common
wb:
  script: R/targets/wb.R
  store: store_wb
pcr:
  script: R/targets/pcr.R
  store: store_pcr
tw:
  script: R/targets/tw.R
  store: store_tw
cell_seq:
  script: R/targets/cell_seq.R
  store: store_cell_seq
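
With a yaml like this in place, targets decides which project to run from the TAR_PROJECT environment variable. A small sketch of a helper that builds several projects in order (the function name make_projects is made up; 'common' goes first because the other pipelines read from store_common):

```r
# Hypothetical helper: run tar_make() for each project listed in _targets.yaml
make_projects <- function(projects = c("common", "pcr")) {
  # Remember the current setting and restore it when the function exits
  old <- Sys.getenv("TAR_PROJECT", unset = NA)
  on.exit(
    if (is.na(old)) Sys.unsetenv("TAR_PROJECT") else Sys.setenv(TAR_PROJECT = old)
  )

  for (project in projects) {
    # tar_make() looks up its script and store for the active project
    Sys.setenv(TAR_PROJECT = project)
    targets::tar_make()
  }

  invisible(projects)
}
```

Calling make_projects() from the top level of the RStudio project would then build 'common' followed by 'pcr'.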