@grst
Created June 13, 2023 17:55

grst commented Jul 23, 2023

This is a prototype of a streaming parser of the AIRR JSON format leveraging the ijson library.

It takes ~35 seconds on my laptop to create a MuData object including gene expression and
AIRR data.

What is currently not handled properly is the sample-level metadata. It is just dumped as a JSON string
in mudata.uns["metadata"]. AnnData/MuData has no concept of sample-level (as opposed to cell-level) metadata.
I think for now the best solution would be to transfer the metadata of interest (say, the patient's sex and age) to
adata.obs as cell-level metadata. Since this will be stored as categoricals, it will be quite space-efficient, even for larger datasets.
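A minimal sketch of what that transfer could look like, using a plain pandas DataFrame as a stand-in for adata.obs. The add_sample_metadata helper, the "sample_id" column, and the shape of the sample_meta dict are all hypothetical and not part of the prototype.

```python
import pandas as pd


def add_sample_metadata(obs, sample_col, sample_meta, fields):
    """Return a copy of `obs` with sample-level fields broadcast to every cell.

    obs         : per-cell table (stand-in for adata.obs)
    sample_col  : column in `obs` naming the sample each cell belongs to (assumed)
    sample_meta : dict mapping sample id -> dict of sample-level metadata (assumed)
    fields      : metadata keys to copy over
    """
    out = obs.copy()
    for field in fields:
        # Look up the sample-level value once per sample, then map it to cells.
        mapping = {sample: meta.get(field) for sample, meta in sample_meta.items()}
        # Categorical storage keeps one copy of each distinct value plus
        # small integer codes per cell.
        out[field] = out[sample_col].map(mapping).astype("category")
    return out


obs = pd.DataFrame(
    {"sample_id": ["s1", "s1", "s2"]},
    index=["cell_a", "cell_b", "cell_c"],
)
sample_meta = {
    "s1": {"sex": "F", "age": 42},
    "s2": {"sex": "M", "age": 57},
}

obs_with_meta = add_sample_metadata(obs, "sample_id", sample_meta, ["sex", "age"])
```

The result could then be assigned back to adata.obs; the original table is left untouched because the helper works on a copy.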


grst commented Jul 23, 2023

Tagging @bcorrie here, since I'm still unsure if you received my email.
