Skip to content

Instantly share code, notes, and snippets.

@TTTPOB
Created January 14, 2022 02:22
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save TTTPOB/37c2c33fb65331e5b86185101878761d to your computer and use it in GitHub Desktop.
Save TTTPOB/37c2c33fb65331e5b86185101878761d to your computer and use it in GitHub Desktop.
get promoter by extend from gencode gtf file
#!/usr/bin/env Rscript
docstring <- "Usage: ./get_promoter_by_extend.r <encode_gtf> <extend> <output>"
args <- docopt::docopt(docstring)
args$extend <- as.numeric(args$extend)
print(args)
suppressPackageStartupMessages({
library(tidyverse)
library(magrittr)
library(rtracklayer)
})
# args <- list(
# encode_gtf = "gencode.vM23.chr_patch_hapl_scaff.annotation.gtf",
# extend = 2000,
# output = "promoter_2000.bed"
# )
cli::cli_alert_info("reading gtf")
gtf <- rtracklayer::import.gff(args$encode_gtf)
cli::cli_alert_success("read gtf done")
gene_df <- as_tibble(gtf) %>%
filter(type == "gene") %>%
select(c(seqnames, start, end, gene_name, score, strand)) %>%
mutate(score = 0)
tss_df <- gene_df %>%
mutate(end = start + 1)
promoter_df <- tss_df %>%
mutate(start = start - args$extend, end = end + args$extend)
# filter mainchr
promoter_df <- promoter_df %>%
filter(seqnames %>% str_detect("^chr(\\d{1,2}|X|Y)$"))
promoter_df %>% write_tsv(
args$output,
col_names = FALSE
)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment