@fbrundu
Created March 5, 2015 14:51
Correct an assembled TCGA TSV (tab-delimited) file by shortening each column name to the first four dash-separated fields of its TCGA sample barcode. The input file is overwritten in place.
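import re
import sys

import pandas as pd

# Path to the assembled TCGA TSV file; it is overwritten in place.
tcga_tsv = sys.argv[1]
tcga = pd.read_table(tcga_tsv, sep='\t', index_col=0)

# For each column name, take the first 'TCGA...' run (up to the next
# underscore) and keep its first four dash-separated fields.
# Raises IndexError if a column name contains no 'TCGA' substring.
oldcolumns = tcga.columns.tolist()
newcolumns = ['-'.join(re.findall(r'TCGA[^_]*', oc)[0].split('-')[:4])
              for oc in oldcolumns]
tcga.columns = newcolumns
tcga.to_csv(tcga_tsv, sep='\t')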
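
The renaming step can be tried on its own, without pandas or an input file. The following is a minimal sketch, not part of the original gist: the helper name `shorten_barcode` and the example column name are hypothetical, and unlike the script above it returns a name unchanged when no `TCGA` substring is found rather than raising `IndexError`.

```python
import re


def shorten_barcode(column_name):
    """Return the first four dash-separated fields of the TCGA barcode
    embedded in column_name (hypothetical helper, for illustration only).

    Assumption: names without a 'TCGA...' substring are returned unchanged,
    whereas the script above would raise IndexError for them.
    """
    matches = re.findall(r'TCGA[^_]*', column_name)
    if not matches:
        return column_name
    return '-'.join(matches[0].split('-')[:4])


# Made-up column name in the style of an assembled TCGA file.
example = 'sample_TCGA-A1-A0SB-01A-11R-A144-07_rsem'
print(shorten_barcode(example))
```

Because the pattern stops at the first underscore, any suffix joined to the barcode with `_` is dropped before the fields are split. Note that two columns sharing the same first four fields would end up with identical names.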