@ShaiberAlon
Created June 25, 2021 20:25
Convert a GDC JSON file of file details to a TSV with the columns: case_id, file_id, file_name
import argparse
import json

import pandas

parser = argparse.ArgumentParser(
    description='Convert GDC JSON with file details to a TSV with the columns: case_id, file_id, file_name')
# --json is required: without it, open(args.json) would fail on None.
parser.add_argument('-j', '--json', metavar='JSON', type=str, required=True,
                    help='JSON file with file details from GDC')
parser.add_argument('-o', '--output', metavar='TSV',
                    default='file-dict.tsv',
                    help='Path to output file')
args = parser.parse_args()
print(args)  # echo the parsed arguments

# The JSON is expected to be a list with one object per file.
with open(args.json, 'r') as f:
    j = json.load(f)

# Only the first case associated with each file is used.
case_ids = [d['cases'][0]['case_id'] for d in j]
file_ids = [d['file_id'] for d in j]
file_names = [d['file_name'] for d in j]

df = pandas.DataFrame({'case_id': case_ids, 'file_id': file_ids, 'file_name': file_names})
df.to_csv(args.output, sep='\t', index=False)
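
The script above reads only the first entry of each file's `cases` list and raises an error if that list is missing or empty. The following is a minimal sketch of a more tolerant variant, not part of the original gist. It assumes a file may have zero, one, or several cases and writes one row per (case, file) pair. The helper names `rows_from_gdc` and `to_tsv` and the sample records are hypothetical, and the standard `csv` module stands in for pandas only to keep the sketch dependency-free.

```python
import csv
import io


def rows_from_gdc(entries):
    """Return (case_id, file_id, file_name) tuples, one per associated case."""
    rows = []
    for entry in entries:
        # Assumption: 'cases' may be missing or empty; emit an empty case_id then.
        cases = entry.get('cases') or [{}]
        for case in cases:
            rows.append((case.get('case_id', ''), entry['file_id'], entry['file_name']))
    return rows


def to_tsv(rows):
    """Serialize rows to a tab-separated string with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter='\t', lineterminator='\n')
    writer.writerow(['case_id', 'file_id', 'file_name'])
    writer.writerows(rows)
    return buf.getvalue()


# Made-up records that only mimic the shape the script expects.
sample = [
    {'file_id': 'f1', 'file_name': 'a.bam', 'cases': [{'case_id': 'c1'}]},
    {'file_id': 'f2', 'file_name': 'b.bam', 'cases': [{'case_id': 'c1'}, {'case_id': 'c2'}]},
    {'file_id': 'f3', 'file_name': 'c.bam'},
]
print(to_tsv(rows_from_gdc(sample)))
```

Whether to keep only the first case, as the script does, or to expand to one row per case depends on how the TSV is consumed downstream; the expanded form can repeat a file_id across rows.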