Created
June 13, 2023 17:55
This is a prototype of a streaming parser for the AIRR JSON format that leverages the ijson library.
On my laptop, it takes ~35 seconds to create a MuData object containing both gene expression and
AIRR data.
What is currently not handled properly is the sample-level metadata: it is just dumped as a JSON string
into mudata.uns["metadata"]. AnnData/MuData has no concept of sample-level (as opposed to cell-level) metadata.
I think for now the best solution would be to transfer the metadata of interest (say, the patient's sex and age) to
adata.obs as cell-level metadata. Since this will be stored as categoricals, it will be quite space-efficient, even for larger datasets.
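
A minimal sketch of that transfer, using plain pandas on a stand-in for `adata.obs`. The `sample_id` column, the example values, and the `sample_meta` table are hypothetical; the prototype does not do this yet.

```python
import pandas as pd

# Stand-in for adata.obs: one row per cell, with an assumed "sample_id" column.
obs = pd.DataFrame(
    {"sample_id": ["s1", "s1", "s2", "s2", "s2"]},
    index=[f"cell{i}" for i in range(5)],
)

# Hypothetical sample-level metadata: one row per sample.
sample_meta = pd.DataFrame(
    {"sex": ["female", "male"], "age": [34, 61]},
    index=pd.Index(["s1", "s2"], name="sample_id"),
)

# Broadcast the sample-level values to every cell of the sample.
# Joining on the "sample_id" column keeps the cell index of obs intact.
obs = obs.join(sample_meta, on="sample_id")

# Repeated string values are stored once when the column is categorical.
obs["sample_id"] = obs["sample_id"].astype("category")
obs["sex"] = obs["sex"].astype("category")
```

A categorical column stores each distinct value once plus a small integer code per cell, which is why repeating sample-level values across many cells stays cheap.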