@effigies
Last active July 22, 2022 15:51
How to view changes to an OpenNeuro dataset draft

OpenNeuro has implemented a data retention policy, under which datasets that have been in draft state for more than 28 days may be reverted to the latest snapshot. Unfortunately, we don't currently have an interface for viewing what has changed since the last snapshot, so users may not know whether they want to create a new snapshot.

This gist shows two ways to view the changes using the OpenNeuro CLI. We will use ds000001 as an example.

Download and diff

The easy but slow approach is to use the CLI to download two copies of your dataset, the most recent snapshot and the draft, and run diff -r on the pair:

$ openneuro download --snapshot 1.0.0 ds000001 ds000001-v1.0.0/
$ openneuro download --draft ds000001 ds000001-draft/
$ diff -r ds000001-v1.0.0 ds000001-draft

This will show changes in text files and binary files like so:

diff -r ds000001-v1.0.0/participants.tsv ds000001-draft/participants.tsv
2c2
< sub-01	F	26
---
> sub-01	F	27
Binary files ds000001-v1.0.0/sub-01/anat/sub-01_T1w.nii.gz and ds000001-draft/sub-01/anat/sub-01_T1w.nii.gz differ
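
If you only need to know which files changed, not how, diff can report just the names. The following is a minimal sketch rather than part of the OpenNeuro CLI: the function name list_changed is made up for illustration, and it assumes a diff that supports -q (GNU and BSD diff both do).

```shell
# list_changed OLD_DIR NEW_DIR
# Print one line per file that differs or exists on only one side.
# (list_changed is a hypothetical helper name, not an OpenNeuro command.)
list_changed() {
    rc=0
    # -r: recurse into subdirectories; -q: report only whether files differ
    diff -rq "$1" "$2" || rc=$?
    # diff exits with 1 when differences were found; 2 or more means trouble
    [ "$rc" -le 1 ]
}

# Example call, using the directories downloaded above:
#   list_changed ds000001-v1.0.0 ds000001-draft
```

This keeps the output short for datasets with many large text files, at the cost of not showing the changed lines themselves.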

Using DataLad and git diff

By using DataLad, changes can be seen without downloading the full content. This will require installing DataLad and the OpenNeuro CLI, and setting up the credential helper.

Once that is done, click the "Clone" button on your dataset's page and copy the OpenNeuro URL. If my dataset were ds000001, I would get https://openneuro.org/git/0/ds000001. From here I could clone the dataset in order to compare the draft with the latest version:

$ datalad clone https://openneuro.org/git/0/ds000001
[INFO   ] Remote origin not usable by git-annex; setting annex-ignore
[INFO   ] https://openneuro.org/git/0/ds000001/config download failed: Not Found
[INFO   ] access to 1 dataset sibling s3-PRIVATE not auto-enabled, enable with:
|         datalad siblings -d "/data/bids/ds000001" enable -s s3-PRIVATE
install(ok): /data/bids/ds000001 (dataset)
$ cd ds000001

Suppose I have one change saved in the draft. git describe reports the most recent tag, followed by the number of commits made since that tag and the abbreviated hash of the current commit:

$ git describe --tags
1.0.0-1-ga5184e8
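
To make the format of that string concrete, here is a small sketch that splits it into the tag and the number of commits since the tag. The function name parse_describe is hypothetical. It assumes the usual TAG-COUNT-gHASH layout and treats anything else as a bare tag, so a tag that itself looks like that layout would be misread.

```shell
# parse_describe DESCRIBE_STRING
# Split the output of `git describe --tags` into tag and commit count.
# (parse_describe is a hypothetical helper, not a git or DataLad command.)
parse_describe() {
    desc=$1
    case "$desc" in
        *-*-g*)
            rest=${desc%-g*}      # strip the trailing "-g<hash>"
            count=${rest##*-}     # last field is the commit count
            tag=${rest%-*}        # everything before it is the tag
            ;;
        *)
            tag=$desc             # HEAD sits exactly on the tag
            count=0
            ;;
    esac
    echo "tag=$tag commits_since=$count"
}

parse_describe "1.0.0-1-ga5184e8"   # prints: tag=1.0.0 commits_since=1
```

A count of 0 would mean the draft has no commits beyond the latest snapshot, so there is nothing to diff.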

I would then compare the draft with that version using git diff:

$ git diff 1.0.0

This has the output:

diff --git a/participants.tsv b/participants.tsv
index 4367938..6ca1efd 100644
--- a/participants.tsv
+++ b/participants.tsv
@@ -1,5 +1,5 @@
 participant_id sex     age
-sub-01 F       26
+sub-01 F       27
 sub-02 M       24
 sub-03 F       27
 sub-04 F       20
diff --git a/sub-01/anat/sub-01_T1w.nii.gz b/sub-01/anat/sub-01_T1w.nii.gz
index 25cb343..7b5fdfe 120000
--- a/sub-01/anat/sub-01_T1w.nii.gz
+++ b/sub-01/anat/sub-01_T1w.nii.gz
@@ -1 +1 @@
-../../.git/annex/objects/V7/Pj/MD5E-s5663237--4608ffbd6b78ce3a325eb338fa556589.nii.gz/MD5E-s5663237--4608ffbd6b78ce3a325eb338fa556589.nii.gz
\ No newline at end of file
+../../.git/annex/objects/K0/1M/MD5E-s5736750--4ba3ad9eaa54aab87d97fa0d60b576ad.nii.gz/MD5E-s5736750--4ba3ad9eaa54aab87d97fa0d60b576ad.nii.gz
\ No newline at end of file
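
The NIfTI file shows up as a changed symlink because its content is stored in git-annex, and the link target is the annex key. A key carries the file size in bytes in its s field (MD5E-s5663237--... is 5663237 bytes), so you can at least see how much the file grew or shrank without downloading it. The sketch below pulls that number out; annex_key_size is a made-up helper name, and it assumes a simple key with only the size field between the backend name and the "--" separator.

```shell
# annex_key_size SYMLINK_TARGET_OR_KEY
# Print the size in bytes recorded in a git-annex key.
# (annex_key_size is a hypothetical helper, not a git-annex command.)
annex_key_size() {
    key=${1##*/}         # keep only the last path component (the key)
    size=${key#*-s}      # drop the backend name and the "-s" marker
    size=${size%%--*}    # drop everything from the "--" separator onward
    echo "$size"
}

annex_key_size "MD5E-s5663237--4608ffbd6b78ce3a325eb338fa556589.nii.gz"   # prints: 5663237
annex_key_size "MD5E-s5736750--4ba3ad9eaa54aab87d97fa0d60b576ad.nii.gz"   # prints: 5736750
```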

To highlight specific differences in TSV files with many columns, consider using the --word-diff option, e.g.,

$ git diff --word-diff=color 1.0.0 participants.tsv
diff --git a/participants.tsv b/participants.tsv
index 4367938..6ca1efd 100644
--- a/participants.tsv
+++ b/participants.tsv
@@ -1,5 +1,5 @@
participant_id  sex     age
sub-01  F       [-26-]{+27+}
sub-02  M       24
sub-03  F       27
sub-04  F       20

This does not render well in Markdown, but thanks to @bpoldrack for the tip!