@mlist
Last active June 2, 2021 15:14
Converting the ENSEMBL Regulatory Build GFF file to a regular tab-delimited file
#obtained from ftp://ftp.ensembl.org/pub/grch37/release-87/regulation/homo_sapiens/homo_sapiens.GRCh37.Regulatory_Build.regulatory_features.20161117.gff.gz
#date: 12/01/2017
library(stringr)
library(tidyr)
homo_sapiens.GRCh37.Regulatory_Build.regulatory_features.20161117 <- read.delim("/local/home/mlist/Projects/homo_sapiens.GRCh37.Regulatory_Build.regulatory_features.20161117.gff", header=FALSE)
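# Split the GFF attribute column (V9) into one column per key=value pair.
# separate() assigns names by position, so every record must list these
# attributes in this order.
attr_cols <- c("ID", "bound_end", "bound_start", "description", "feature_type")
ensembl_reg_hg19 <- tidyr::separate(homo_sapiens.GRCh37.Regulatory_Build.regulatory_features.20161117, col = V9, into = attr_cols, sep = ";")
# Strip the "key=" prefix from the attribute columns only. apply() over the
# whole data frame would coerce it to a character matrix and can left-pad the
# numeric coordinates (V4, V5) with spaces.
ensembl_reg_hg19_tidy <- ensembl_reg_hg19
ensembl_reg_hg19_tidy[attr_cols] <- lapply(ensembl_reg_hg19_tidy[attr_cols], function(x) str_replace(x, "^[^=]*=", ""))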
ensembl_reg_hg19_tidy <- ensembl_reg_hg19_tidy[, c("V1", "V4", "V5", "ID", "feature_type")]
colnames(ensembl_reg_hg19_tidy) <- c("CHROMOSOME", "START", "END", "ID", "FEATURE")
write.table(ensembl_reg_hg19_tidy, file = "ENSEMBL_regulatory_build_hs37d5.txt",
            sep = "\t", quote = FALSE, row.names = FALSE)
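The script relies on `tidyr::separate()` naming the pieces of column 9 by position, so it assumes every record lists the same attributes in the same order. A minimal base-R sketch of an order-independent alternative looks each attribute up by its key instead. `parse_gff_attributes()` and `gff_to_table()` are hypothetical helpers, not part of this gist; the toy records are made up; and `comment.char = "#"` assumes any header lines are GFF3 `##` directives.

```r
# Hypothetical helper (not from the original gist): split GFF3 column 9
# ("key1=val1;key2=val2;...") and return the requested keys as columns.
# Keys are matched by name, so attribute order does not matter and a
# record without a given key gets NA.
parse_gff_attributes <- function(attr, keys) {
  pairs <- strsplit(attr, ";", fixed = TRUE)
  parsed <- lapply(pairs, function(p) {
    eq <- regexpr("=", p, fixed = TRUE)  # position of the first "=" in each pair
    stats::setNames(substring(p, eq + 1L), substring(p, 1L, eq - 1L))
  })
  out <- lapply(keys, function(k) {
    vapply(parsed, function(v) unname(v[k]), character(1))
  })
  names(out) <- keys
  as.data.frame(out, stringsAsFactors = FALSE)
}

# Hypothetical helper: keep chromosome, start, end, ID and feature type,
# using the same output column names as the script.
gff_to_table <- function(gff) {
  attrs <- parse_gff_attributes(as.character(gff$V9), c("ID", "feature_type"))
  data.frame(CHROMOSOME = as.character(gff$V1),
             START = gff$V4, END = gff$V5,
             ID = attrs$ID, FEATURE = attrs$feature_type,
             stringsAsFactors = FALSE)
}

# Made-up toy records; the second lists its attributes in a different order.
gff_lines <- c(
  "##gff-version 3",
  "1\tRegulatory_Build\tpromoter\t10000\t10600\t.\t.\t.\tID=TOY0001;description=Predicted promoter;feature_type=Promoter",
  "2\tRegulatory_Build\tenhancer\t500\t1500\t.\t.\t.\tfeature_type=Enhancer;ID=TOY0002"
)
gff <- read.delim(text = gff_lines, header = FALSE, comment.char = "#",
                  stringsAsFactors = FALSE)
tab <- gff_to_table(gff)
write.table(tab, file = "", sep = "\t", quote = FALSE, row.names = FALSE)
```

Because values are looked up by name, an extra or reordered attribute cannot shift a value into the wrong column. Pointing `read.delim()` at the real GFF file instead of `text = gff_lines` should yield the same five-column layout the script writes.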