This is a checklist I use to harmonize scRNA-seq datasets before saving them as AnnData objects, loosely following the cellxgene schema.
- Check for NAs or cells labelled as None, and rename all such cells to `'unknown'`.
- Store the main cell type annotation in `adata.obs["cell_type"]`.
- Remove additional `adata.obs` columns containing alternative cell type annotation labels. If one or more alternative annotations need to be kept, rename them to `cell_type_*`, where `*` describes how they differ from the main cell type annotation.
- Store the disease annotation in `adata.obs['disease']`, labelling healthy cells as `'control'` (not `Control`, `healthy`, or `normal`).
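
The checklist can be sketched as a single pandas helper applied to the `adata.obs` table. This is a minimal sketch, not a definitive implementation: the function name `harmonize_obs`, its parameters, and the two label sets are hypothetical and should be adapted to the labels that actually occur in each dataset.

```python
import pandas as pd

# Hypothetical label sets (assumptions); extend them to match your datasets.
MISSING_LABELS = ["None", "none", "NA", "nan", "NaN", ""]
HEALTHY_LABELS = ["Control", "healthy", "Healthy", "normal", "Normal"]


def harmonize_obs(obs, main_col, disease_col, keep_alt=None, drop_alt=()):
    """Return a copy of an obs table with harmonized annotation columns.

    main_col    -- column holding the main cell type annotation
    disease_col -- column holding the disease annotation
    keep_alt    -- dict mapping an alternative annotation column to the
                   suffix used in its new name, cell_type_<suffix>
    drop_alt    -- alternative annotation columns to remove
    """
    obs = obs.copy()
    keep_alt = keep_alt or {}

    def clean(col):
        # Convert to plain objects first so categorical columns accept new labels.
        values = obs[col].astype(object)
        values = values.where(values.notna(), "unknown")
        values = values.astype(str).str.strip()
        return values.replace(MISSING_LABELS, "unknown")

    cell_type = clean(main_col)
    alternatives = {f"cell_type_{sfx}": clean(col) for col, sfx in keep_alt.items()}
    # Map the healthy synonyms to 'control'; other disease labels are left as-is.
    disease = obs[disease_col].astype(object).replace(HEALTHY_LABELS, "control")

    # Drop the source columns, then write back the harmonized ones.
    to_drop = {main_col, disease_col, *keep_alt, *drop_alt}
    obs = obs.drop(columns=[c for c in obs.columns if c in to_drop])
    obs["cell_type"] = cell_type.astype("category")
    for name, values in alternatives.items():
        obs[name] = values.astype("category")
    obs["disease"] = disease.astype("category")
    return obs
```

With an AnnData object this would be used as, for example, `adata.obs = harmonize_obs(adata.obs, "annot", "status", keep_alt={"annot_fine": "fine"})`, where the column names are placeholders. Returning a copy rather than editing in place makes it easy to compare the table before and after harmonization.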