This gist shows how to create a BED12 file containing every protein-coding NCBI RefSeq Select gene, with each gene's exons annotated as blocks.

First, download a TSV file of NCBI RefSeq Select genes:
- Go to the UCSC Table Browser
- Select these parameters:
  - Assembly: Dec. 2013 (GRCh38/hg38)
  - Group: Genes and Gene Predictions
  - Track: NCBI RefSeq
  - Table: RefSeq Select and MANE (ncbiRefSeqSelect)
  - Output format: all fields from selected table
  - Output filename: GRCh38.ncbiRefSeqSelect.tsv.gz
  - File type returned: gzip compressed
- Click "get output"

Use the awk script below to process the TSV into BED12 format. The `tail -n +2` step drops the header line before the records reach awk.

    zcat GRCh38.ncbiRefSeqSelect.tsv.gz \
      | tail -n +2 \
      | awk -f ncbi_refseq_tsv_to_exon_bed12.awk - \
      > GRCh38.ncbiRefSeqSelect.genes.bed
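
Once the BED file has been written, a quick sanity check can catch malformed records before the file goes into downstream tools. The sketch below is not part of the original gist: `check_bed12` is a hypothetical helper name, and it assumes the output is tab-separated. It counts lines that break the basic BED12 rules: 12 fields, `blockCount` matching the number of entries in `blockSizes` and `blockStarts`, a first block start of 0, and a last block that ends exactly at `chromEnd`.

```shell
# Hypothetical helper (not from the gist): print the number of lines in a
# tab-separated BED file that are not well-formed BED12 records.
# Assumed usage: check_bed12 GRCh38.ncbiRefSeqSelect.genes.bed
check_bed12() {
  awk -F '\t' '
    {
      sizes = $11
      starts = $12
      # BED12 lists may end with a trailing comma; strip it before splitting.
      sub(/,$/, "", sizes)
      sub(/,$/, "", starts)
      n_sizes = split(sizes, s, ",")
      n_starts = split(starts, b, ",")

      ok = (NF == 12)
      # blockCount (column 10) must match both list lengths.
      if ($10 != n_sizes || $10 != n_starts) ok = 0
      # The first block must start at offset 0 relative to chromStart.
      if (b[1] != 0) ok = 0
      # The last block must end exactly at chromEnd.
      if ($2 + b[n_starts] + s[n_sizes] != $3) ok = 0

      if (!ok) {
        bad++
        print "line " NR ": " $4 > "/dev/stderr"
      }
    }
    END { print bad + 0 }
  ' "$1"
}
```

Running `check_bed12 GRCh38.ncbiRefSeqSelect.genes.bed` should print `0` if every record passes. Otherwise it prints the number of failing records on standard output and lists their line numbers and names on standard error.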