Skip to content

Instantly share code, notes, and snippets.

@jiajic
Last active May 22, 2024 14:07
Show Gist options
  • Save jiajic/bc8abf18fd362ca36790b8c074847bb9 to your computer and use it in GitHub Desktop.
Save jiajic/bc8abf18fd362ca36790b8c074847bb9 to your computer and use it in GitHub Desktop.
Experimental modular CosMx importer utility

CosMx Importer (dev branch)

Giotto and GiottoClass provide several basic functions for reading in a variety of spatial-omic information. However, given how many different types of platforms and formats there are, these functions come with many settings that need to be coordinated to put together the subobjects and ultimately a giotto analysis object.

Giotto is implementing modular readers with baked-in params to load in individual pieces of information. These params are just defaults, so the modular readers can also be easily modified to do things like load from a non-standard directory or change how the data is parsed for a specific piece of information. This functionality can also be wrapped to produce a generalized convenience function.

This example script introduces the CosmxReader object and requires the @modular_readers branch from Giotto to function.

# install dev branch items
if (!requireNamespace("pak", quietly = TRUE)) install.packages("pak")
pak::pak("drieslab/Giotto@modular_readers")

Load Pancreas WTX dataset

?importCosMx  

# create CosmxReader
cosmx <- importCosMx()
force(cosmx)
## Giotto <CosmxReader>
## dir     : 
## slide   : 1 
## fovs    : all 
## micron  : FALSE 
## offsets :

Update CosmxReader with params

cosmx$cosmx_dir <- "[PATH TO OUTPUT DIRECTORY]"
cosmx$fovs <- c(51:52) # subset of FOVs to load if desired
force(cosmx)
## Giotto <CosmxReader>
## dir     : /projectnb/rd-s[...]Multiplexing_RNA/CosMx/Pancreas_WTX/ 
## slide   : 1 
## fovs    : 51, 52 
## micron  : FALSE 
## offsets : found 
## funs    : load_transcripts()
##           load_polys()
##           load_expression()
##           load_images()
##           load_cellmeta()
##           create_gobject()

Set cosmx$micron <- TRUE if desired to load data scaled to microns.
For this dataset, the scalefactor is 0.12028 px -> micron (provided by Nanostring in the ReadMe.html).
You can set an alternative specific scalefactor by using cosmx$px2mm <- ??

FOV shifts

The FOV shifts/positions info can be found in $offsets. They are auto-detected from either the metadata or transcripts information when the cosmx_dir is provided.

force(cosmx$offsets)
## Key: <fov>
##       fov        x         y
##     <int>    <num>     <num>
##  1:    51 52439.04 41925.408
##  2:    52 52439.04 37669.372
##  3:    53 52439.04 33413.336
##  4:    54 56695.08 33413.336
##  5:    55 56695.08 37669.372
##  6:    56 56695.08 41925.408
##  7:    57 60951.12 41925.408
##  8:    58 60951.12 37669.372
##  9:    59 60951.12 33413.336
## 10:    60 51470.54  8512.071
## 11:    61 51470.54  4256.036
## 12:    62 51470.54     0.000
## 13:    63 55726.59     0.000
## 14:    64 55726.59  4256.036
## 15:    65 55726.59  8512.071
## 16:    66 59982.63  8512.071
## 17:    67 59982.63  4256.036
## 18:    68 59982.63     0.000

When offsets are present or calculated, they can also be plotted to get an idea of the available FOVs. The numbers are where the upper left corners of the FOV are positioned.

plot(cosmx)

image

If the offsets are innaccurate or cannot be auto detected (or you would like to manually spatially re-arrange FOV placements), they can also be set. Input must be a data.table with cols fov, x, y of types integer, numeric, numeric. You can set it like this:

# cosmx$offsets <- ??

Create gobject

It is possible to load in the expression and additional metadata from NanoString for a set of information that may not fully agree with how Giotto normally calculates counts/cell (but should be very similar).

With pre-made aggregate info

With Giotto’s aggregation workflow, you would usually run calculateOverlap() and overlapToMatrix() after creating the object in order to aggregate the transcript detections by polygon to create the count matrix, but if the NanoString pre-made expression and metadata were already loaded, you would skip this step. Running the aggregation workflow would either replace the NanoString values or make them invalid.

# LOADING PREMADE NANOSTRING EXPRESSION MATRICES AND METADATA
g <- cosmx$create_gobject(
  feat_type = c("rna", "negprobes", "syscontrolcodes"),
  split_keyword = list("Negative", "SystemControl"),
  load_expression = T,
  load_cellmeta = T)
  
# ! important !
# Skip `calculateOverlap()` and `overlapToMatrix()` if expression and metadata were loaded
g <- addSpatialCentroidLocations(g)
## Start centroid calculation for polygon information
##  layer: cell
force(g)
## An object of class giotto 
## >Active spat_unit:  cell 
## >Active feat_type:  rna 
## [SUBCELLULAR INFO]
## polygons      : cell 
## features      : rna negprobes syscontrolcodes 
## [AGGREGATE INFO]
## expression -----------------------
##   [cell][rna] raw
##   [cell][negprobes] raw
##   [cell][syscontrolcodes] raw
## spatial locations ----------------
##   [cell] raw
## attached images ------------------
## images      : 4 items...
## 
## 
## Use objHistory() to see steps and params used

OR Loading only raw data (default behavior)

calculateOverlap() and overlapToMatrix() should be run when only raw data is loaded to use Giotto's aggregation pipeline to generate the expression matrix.

# SKIPPING PREMADE NANOSTRING EXPRESSION MATRICES AND METADATA (default)
g <- cosmx$create_gobject()

# ! important !
# You should run `calculateOverlap()` and `overlapToMatrix()` with this.
g <- calculateOverlap(
  g,
  name_overlap = "rna",
  spatial_info = "cell",
  feat_info = "rna",
  return_gobject = TRUE,
  verbose = FALSE
)

g <- overlapToMatrix(
  g,
  name = "raw",
  poly_info = "cell",
  feat_info = "rna",
  type = "point",
  return_gobject = TRUE,
  verbose = FALSE
)

Some plots

spatInSituPlotPoints(
  g,
  feats = list("rna" = c(head(featIDs(g, "rna"), 4L), "VIM")),
  show_image = TRUE,
  image_name =  c("composite_fov051", "composite_fov052"),
  show_polygon = TRUE,
  polygon_color = "white",
  polygon_line_size = 0.1,
  use_overlap = FALSE, # since none exist (skipped)
)

image

spatInSituPlotPoints(
  g,
  show_image = TRUE,
  image_name =  c("composite_fov051", "composite_fov052"),
  show_polygon = TRUE,
  polygon_fill_as_factor = FALSE,
  polygon_fill = "Max.CD45",
  polygon_color = "white",
  polygon_line_size = 0.1
)

image

Load in piece-wise

# Load polygons, transcripts, and images
polys <- cosmx$load_polys()

plot(polys)

image

tx <- cosmx$load_transcripts(
  feat_type = c("rna", "negprobes", "syscontrolcodes"),
  split_keyword = list("Negative", "SystemControl")
)

plot(tx$rna, dens = TRUE)

image

# tx vs poly overlap
plot(tx$rna, raster = FALSE)
terra::lines(polys@spatVector, col = "red", alpha = 0.4)

image

imgs <- cosmx$load_images()
plot(imgs[[1]])

image

plot(imgs[[2]])

image

cx <- cosmx$load_cellmeta()
force(cx)
## An object of class cellMetaObj 
## spat_unit : "cell"
## feat_type : "rna"
## provenance: cell 
## 
##     cell_ID  Area AspectRatio Width Height Mean.PanCK Max.PanCK
##      <char> <int>       <num> <int>  <int>      <int>     <int>
## 1: c_1_51_1  5997        1.39   103     74       1565      5576
## 2: c_1_51_2  1976        2.47    89     36        594      2260
## 3: c_1_51_3  2664        1.34    71     53        989      2812
##    Mean.CD68_CK8_18 Max.CD68_CK8_18 Mean.CD298_B2M Max.CD298_B2M Mean.CD45
##               <int>           <int>          <int>         <int>     <int>
## 1:             1701            6376           2799          8980       221
## 2:              154             636            930          1864       108
## 3:              135             728           1134          2400        92
##    Max.CD45 Mean.DAPI Max.DAPI     X version Run_name Run_Tissue_name   tissue
##       <int>     <int>    <int> <int>  <char>   <char>          <char>   <char>
## 1:      996      1155     3476     1      v6 Slide_S1        Pancreas Pancreas
## 2:      360      1126     2652     1      v6 Slide_S1        Pancreas Pancreas
## 3:      320       794     1884     1      v6 Slide_S1        Pancreas Pancreas
##     Panel assay_type slide_ID unassignedTranscripts nCount_RNA nFeature_RNA
##    <char>     <char>    <int>                 <num>      <int>        <int>
## 1:    WTX        RNA        1             0.0781346        315          202
## 2:    WTX        RNA        1             0.0781346        346          143
## 3:    WTX        RNA        1             0.0781346        508          184
##    nCount_negprobes nFeature_negprobes nCount_falsecode nFeature_falsecode
##               <int>              <int>            <int>              <int>
## 1:                0                  0                4                  4
## 2:                0                  0                3                  2
## 3:                0                  0                5                  5
##    Area.um2
##       <num>
## 1: 86.76163
## 2: 28.58779
## 3: 38.54144
ex <- cosmx$load_expression(
  feat_type = c("rna", "negprobes", "syscontrolcodes"),
  split_keyword = list("Negative", "SystemControl")
)
force(ex)
## [[1]]
## An object of class exprObj : "raw"
## spat_unit : "cell"
## feat_type : "rna"
## provenance: cell 
## 
## contains:
## 18946 x 5534 sparse Matrix of class "dgCMatrix"
##                                    
## A1BG . . . 1 . . . . . . . . ......
## A1CF . 1 . 2 . . . . . 1 . . ......
## A2M  . . . . . . . . . . . . ......
## 
##  ........suppressing 5522 columns and 18940 rows in show(); maybe adjust options(max.print=, width=)
##                                     
## ZYX   . . . 1 . . . . . . . . ......
## ZZEF1 . . . . . . . . . . . . ......
## ZZZ3  . . . . . . . . . . . . ......
## 
##  First four colnames:
##  c_1_51_1 c_1_51_2 c_1_51_3 c_1_51_4 
## 
## [[2]]
## An object of class exprObj : "raw"
## spat_unit : "cell"
## feat_type : "negprobes"
## provenance: cell 
## 
## contains:
## 50 x 5534 sparse Matrix of class "dgCMatrix"
##                                            
## Negative1  . . . . . . . . . . . . . ......
## Negative10 . . . . . . . . . . . . . ......
## Negative11 . . . . . . . . . . . . . ......
## 
##  ........suppressing 5521 columns and 44 rows in show(); maybe adjust options(max.print=, width=)
##                                           
## Negative7 . . . . . . . . . . . . . ......
## Negative8 . . . . . . . . . . . . . ......
## Negative9 . . . . . . . . . . . . . ......
## 
##  First four colnames:
##  c_1_51_1 c_1_51_2 c_1_51_3 c_1_51_4 
## 
## [[3]]
## An object of class exprObj : "raw"
## spat_unit : "cell"
## feat_type : "syscontrolcodes"
## provenance: cell 
## 
## contains:
## 2735 x 5534 sparse Matrix of class "dgCMatrix"
##                                                
## SystemControl1   . . . . . . . . . . . . ......
## SystemControl10  . . . . . . . . . . . . ......
## SystemControl100 . . . . . . . . . . . . ......
## 
##  ........suppressing 5522 columns and 2729 rows in show(); maybe adjust options(max.print=, width=)
##                                                
## SystemControl997 . . . . . . . . . . . . ......
## SystemControl998 . . . . . . . . . . . . ......
## SystemControl999 . . . . . . . . . . . . ......
## 
##  First four colnames:
##  c_1_51_1 c_1_51_2 c_1_51_3 c_1_51_4

Note that these individual functions also have path params, allowing alternative paths to be provided if the default expected locations are incorrect. The provided create_gobject() function is just a combination of these more granular load functions along with detection of data from expected locations in the output directory.

The feat_type and split_keyword params used here for load_transcripts() and load_expression() are also different than those used for the original lung12/lung13 type datasets. This is because the diagnostic probes used in this dataset are different than those original datasets. The feat_type and split_keyword params you use should be tailored for whichever probe types you have in your data.

Giotto object from piece-wise

An empty giotto object can be created using

g2 <- giotto()

Data can then be added one by one.

g2 <- setGiotto(g2, polys)
g2 <- setGiotto(g2, tx)
g2 <- setGiotto(g2, imgs)
g2 <- setGiotto(g2, ex)
g2 <- setGiotto(g2, cx)
sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: AlmaLinux 8.9 (Midnight Oncilla)
## 
## Matrix products: default
## BLAS:   /share/pkg.7/r/4.1.1_noblas/install/lib64/R/lib/libRblas.so
## LAPACK: /share/pkg.7/r/4.1.1_noblas/install/lib64/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] Giotto_4.1.0      GiottoClass_0.3.1
## 
## loaded via a namespace (and not attached):
##  [1] httr_1.4.7          sass_0.4.9          tidyr_1.3.1        
##  [4] jsonlite_1.8.8      viridisLite_0.4.2   gtools_3.9.5       
##  [7] bslib_0.7.0         sp_2.1-4            highr_0.10         
## [10] GiottoVisuals_0.2.2 yaml_2.3.8          job_0.3.1          
## [13] ggrepel_0.9.5       pillar_1.9.0        backports_1.4.1    
## [16] lattice_0.20-45     glue_1.7.0          reticulate_1.37.0  
## [19] digest_0.6.35       RColorBrewer_1.1-3  checkmate_2.3.1    
## [22] colorspace_2.1-0    cowplot_1.1.3       htmltools_0.5.8.1  
## [25] Matrix_1.6-2        plyr_1.8.9          pkgconfig_2.0.3    
## [28] magick_2.8.3        purrr_1.0.2         scales_1.3.0       
## [31] terra_1.7-71        GiottoUtils_0.1.8   tibble_3.2.1       
## [34] generics_0.1.3      dbscan_1.1-12       farver_2.1.2       
## [37] ggplot2_3.5.1       cachem_1.1.0        withr_3.0.0        
## [40] lazyeval_0.2.2      cli_3.6.2           crayon_1.4.2       
## [43] magrittr_2.0.3      deldir_2.0-4        evaluate_0.23      
## [46] fansi_1.0.6         progressr_0.14.0    tools_4.1.1        
## [49] data.table_1.15.4   lifecycle_1.0.4     matrixStats_1.3.0  
## [52] stringr_1.5.1       plotly_4.10.4       munsell_0.5.1      
## [55] compiler_4.1.1      jquerylib_0.1.4     rlang_1.1.3        
## [58] scattermore_1.2     grid_4.1.1          rstudioapi_0.16.0  
## [61] rappdirs_0.3.3      htmlwidgets_1.6.4   igraph_2.0.3       
## [64] labeling_0.4.3      rmarkdown_2.27      gtable_0.3.5       
## [67] codetools_0.2-18    reshape2_1.4.4      R6_2.5.1           
## [70] knitr_1.46          dplyr_1.1.4         fastmap_1.2.0      
## [73] utf8_1.2.4          colorRamp2_0.1.0    stringi_1.8.4      
## [76] parallel_4.1.1      Rcpp_1.0.12         vctrs_0.6.5        
## [79] png_0.1-8           tidyselect_1.2.1    xfun_0.44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment