Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
How to run GISTIC2 for copy number segments based on Canine CanFam3.1 genome

GISTIC2 for Canine CanFam3.1 genome


Please visit https://github.com/sbamin/canine_gistic2 for updated code and documentation. For questions, please create an issue at https://github.com/sbamin/canine_gistic2/issues

Thanks!


Changes to original code by

Questions, comments:

  • You can comment for this gist but I may not get notifications. Best if you can tag @sbamin (hopefully, it can email me then) or better to contact me

For canine (canFam3.1), gistic2 run needs a few changes. You will need matlab to change code and recompile top level executable module or script, gp_gistic2_from_seg. I am using v2.0.22 from ftp://ftp.broadinstitute.org/pub/GISTIC2.0/

Recompile gp_gistic2_from_seg

Following are a few changes I made after getting input from a colleague, Emmanuel Martinez.

  • Make a refgene.file in .mat format containing three objects You can check valid format by opening hg19.mat in matlab.
  1. cyto (1x40): containing cytoband. Since canFam has basic cytoband, it will not have detailed (1p11.1, 1p22, etc.) band info as with hg19.mat. canine cytoband can be fetched from UCSC table browser or ftp. https://link.sbamin.com/2ztpbWM or http://hgdownload.soe.ucsc.edu/goldenPath/canFam3/database/cytoBandIdeo.txt.gz

  2. rg (1x<number_of_genes>): gene metadata based on canFam refGene/Ensemble gtf file

  3. rg_info (1x1): is additional metadata

Save these three objects as canfam3.1.mat

  • Now, extract GITIC2 v2.0.22 tarball. Structure would be something similar as noted here: ftp://ftp.broadinstitute.org/pub/GISTIC2.0/README.txt Now, we edit source/RefGeneInfo.m file, line 15 to change nchr from 24 to 40, and at line 29, add corresponding canine chromosomes c(25:38, 'MT', 'X').
RGI.chr.symb = {'1','2','3','4','5','6','7','8','9','10',...
                '11','12','13','14','15','16','17','18',...
                '19','20','21','22','23','24','25','26',...
                '27','28','29','30','31','32','33','34',...
                '35','36','37','38','MT','X'};

Please make sure to have chr sequence order identical to one in cyto object, which was specified under canfam3.1.mat.

  • Also, edit line 42 to 44 to match number of chromosomes: This may vary whether or not you like to add/remove autosomes, sex and MT chr.
RGI.txt2num('39') = 39;
RGI.txt2num('40') = 40;
RGI.chr.autosomal = (1:40)<39;
  • That's it! Recompile gp_gistic2_from_seg module which is the main executable under gistic2 shell wrapper.
  • Here is guide to package matlab application. For the first step under an option, add main file, provide path to gp_gistic2_from_seg.m, add relevant details, keep LICENSE unchanged, i.e., it is copyrighted software by Mermel et al., Broad Institute, Cambridge, MA, USA.

Code availability

I will push gistic2 code for canine sometime during end of 11/2018 at https://github.com/sbamin We need to fix some of canfam3.1.mat issues to better annotate gistic plots. Feel free to comment/edit here for improvements/bugs.

Segment file

  • I use following script to generate segment file. It is well-defined at ftp://ftp.broadinstitute.org/pub/GISTIC2.0/
## read each segment file, convert non-integer chromosome names to integers.

## create marker column based on the original window size used for making mappability bigwig file, i.e.,
## generateMap.pl -w 100 -i bowtie_index/CanFam3_1.fa "${REF_FASTA}" -o bigwigs/CanFam3_1.map.ws100bp.bw

convert log2 cn to log2-1 cn for GISTIC, so as log2(2) -1 == 0

Note: HMMcopy derived values are already log2 copy ratio, i.e., log2(T/N) = log2(T) - log2(N) = log2(T) - 1. So, it does not require substraction by 1, as I wrongly stated earlier. Read more on format at ftp://ftp.broadinstitute.org/pub/GISTIC2.0/GISTICDocumentation_standalone.htm and https://www.biostars.org/p/174382/#175590

gistic2_seg_file = dplyr::bind_rows(lapply(1:length(segfiles), function(i) {
    aux_data <- read_tsv(segfiles[i], col_types = "cddid") %>%
                filter(!chr == "MT") %>%
                mutate(chri = match(chr, info_chr$X1),
                       start = as.integer(start),
                       end = as.integer(end),
                       markers = as.integer((end - start + 1)/100),
                       segvalue = median,
                       Sample = sampleids[i]) %>%
                dplyr::arrange(chri) %>%
                dplyr::select(one_of(c("Sample", "chr", "start", "end", 
                                "markers", "segvalue")))
    return(aux_data)
    }))

head(gistic2_seg_file)
colnames(gistic2_seg_file) <- c("Sample", "Chromosome", "Start Position", "End Position", "Num markers", "Seg.CN")

write_tsv(gistic2_seg_file, path = sprintf("%s/%s.tsv", getwd(), outfile))

Marker file

  • marker file is not required since GISTIC v2.023 but we are using v2.022
  • Following a post by Ming Tang to get marker file.
## marker file:

sed '1d' gistic2_segments.tsv | cut -f2,3 > markers.txt
sed '1d' gistic2_segments.tsv | cut -f2,4 >> markers.txt

## sort the files by chromosome, take the unique ones and number the markers.

cat markers.txt | sort -V -k1,1 -k2,2nr | uniq | nl > markers_gistic.txt
rm markers.txt

Example run:

## load MCR v8.0
module load rvMCR/8.0

## export GISTIC2 binaries into PATH
export PATH="${PATH}:${GISTICDIR}"

## set vars
BASEDIR="/fastscratch/amins/flowr/getsegs"
SEGFILE="${BASEDIR}/gistic2_segments.tsv"
MARKERS="${BASEDIR}/markers_gistic.txt"
CANINEREF="/projects/verhaak-lab/DogWGSReference"
REFGENE="${CANINEREF}/copy_number/gistic/canfam3.1.mat"

# export GISTIC2 in PATH
GITDIR="/projects/verhaak-lab/verhaak_env/verhaak_apps/gistic"
export PATH="${PATH}:${GITDIR}/cgp_gistic2"

## run with case1 seg file
OUTDIR1="${BASEDIR}/segmode1"
mkdir -p "${OUTDIR1}"
SEGFILE1="${BASEDIR}/inputs/gistic2_segment_mode1.tsv"
MARKERS1="${BASEDIR}/inputs/markers_gistic_mode1.txt"

gp_gistic2_from_seg_upd -b "${OUTDIR1}" -seg "${SEGFILE1}" -mk "${MARKERS1}" -refgene "${REFGENE}" -maxseg 25000 -savegene 1 -genegistic 1

echo "SegMode1: GISTIC2 mode ran with exit code $?"

Samir

@sagarutturkar

This comment has been minimized.

Copy link

sagarutturkar commented Jul 24, 2018

Thank you for this excellent documentation. I have few questions regarding creation of reference file in MATLAB.

  1. While creation of canfam3.1.mat file, is it necessary to have all the fields as well as field-names exactly as shown for the hg19.mat or it works with few mandatory fields and custom names?
  2. canine cytoband file from UCSC is 0-based (i.e. start is 0) and I am using canine reference genome from ENSEMBL. Do you think that the start coordinate should be updated to 1 for each ENSEMBL chromosome?
@sbamin

This comment has been minimized.

Copy link
Owner Author

sbamin commented Aug 3, 2018

I am so sorry for delayed reply here. Somehow I am not getting any notifications for comments on gists. For 1. I guess now you have that and credit goes to @jemartinezledes for making a gistic compatible canine reference. It still has some room for improvement using updated cytoband and updated ensembl gene anntotations, if available.

For 2. I have not checked that and can confirm it later after checking code.

@kdrenner

This comment has been minimized.

Copy link

kdrenner commented Jan 14, 2019

Hello,

@sbamin Thank you for documenting how to make GISTIC 2 work for canine. I was wondering if its possible to post your current canfam3.1.mat file?

@sadafambreen

This comment has been minimized.

Copy link

sadafambreen commented Apr 29, 2019

Hi, @sbamin can you please guide me, how i can recompile gp_gistic2_from_seg? i have made all changes but it is not working.

@sbamin

This comment has been minimized.

Copy link
Owner Author

sbamin commented Apr 30, 2019

Hello @sadafambreen,

Please look into https://github.com/sbamin/canine_gistic2 for related matlab version of canfam3.1 and documentation. Please note that we are using GISTIC v2.0.22 from ftp://ftp.broadinstitute.org/pub/GISTIC2.0/ and matlab MCR/8.0. This may not work with other gistic versions and/or matlab compilers.

Moving forward, please post an issue at https://github.com/sbamin/canine_gistic2/issues for questions.

Cheers,
Samir

@ellen327

This comment has been minimized.

Copy link

ellen327 commented Jul 2, 2019

Hi @sbamin, I have recompiled gp_gistic2_from_seg and it worked perfectly fine using human reference, but always giving an error with canine reference. Would you think it is a recompilation issue or other issues? Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.