venkan / metadata_for_TCGA.r
Created May 18, 2018 09:14
TCGA metadata: get the read length, check whether all samples are paired-end, and retrieve other information for TCGA samples
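library(GenomicDataCommons)

# Query GDC for TCGA-LIHC whole-exome BAM files, expanding read-group metadata.
q = files() %>%
  filter(~ cases.project.project_id == 'TCGA-LIHC' &
           data_type == 'Aligned Reads' &
           experimental_strategy == 'WXS' &
           data_format == 'BAM') %>%
  select('file_id') %>%
  expand('analysis.metadata.read_groups')

file_ids = ids(q)    # file UUIDs matching the query
z = results_all(q)   # fetch every page of results

# One entry per file: the read_length value(s) of its read groups.
read_length_list = sapply(z$analysis$metadata$read_groups, '[[', 'read_length')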
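The description mentions checking that every sample is paired-end, but the script above only extracts `read_length`. The following is a minimal sketch of one way to summarise both per file. The helper `summarise_read_groups` is hypothetical (it is not part of GenomicDataCommons), and it assumes each element of `z$analysis$metadata$read_groups` is a data frame with `read_length` and `is_paired_end` columns. Mock data stands in for the real query result so the block runs without network access.

```r
# Hypothetical helper (not part of GenomicDataCommons): one summary row per file.
# Assumes each element of `read_groups` is a data.frame with the columns
# `read_length` and `is_paired_end`.
summarise_read_groups <- function(read_groups, file_ids) {
  stopifnot(length(read_groups) == length(file_ids))
  data.frame(
    file_id = file_ids,
    n_read_groups = vapply(read_groups, nrow, integer(1)),
    # Distinct read lengths seen in the file, e.g. "48,76"
    read_lengths = vapply(read_groups, function(rg) {
      paste(sort(unique(rg$read_length)), collapse = ",")
    }, character(1)),
    # TRUE only if the column exists and every read group is flagged paired-end
    all_paired = vapply(read_groups, function(rg) {
      !is.null(rg$is_paired_end) && all(rg$is_paired_end %in% TRUE)
    }, logical(1)),
    stringsAsFactors = FALSE
  )
}

# Mock input standing in for z$analysis$metadata$read_groups and ids(q).
mock_read_groups <- list(
  data.frame(read_length = c(76L, 76L), is_paired_end = c(TRUE, TRUE)),
  data.frame(read_length = c(48L, 76L), is_paired_end = c(TRUE, FALSE))
)
mock_ids <- c("file-a", "file-b")

read_group_summary <- summarise_read_groups(mock_read_groups, mock_ids)
print(read_group_summary)
```

With real data, the call would presumably be `summarise_read_groups(z$analysis$metadata$read_groups, file_ids)`, provided the returned structure matches the assumption above. Files where `all_paired` is `FALSE` or `read_lengths` lists more than one value are the ones worth inspecting by hand.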
rm(list=ls())
library(GenomicFeatures)
library(rtracklayer)
extractGenesOfGTF <- function(inputFile, outputGTF_File, outputCsvFile ) {
  # Read the GTF file.
  gtf <- import(inputFile)
  # Its dimensions are 2617197 by 25.
  gtf <- as.data.frame(gtf)