Forked from knmkr/
Created February 20, 2021 08:37
tabix s3 example

The tabix command from htslib can query a locus in a remote S3 file using the s3:// protocol.

$ aws s3 ls s3://your_bucket/

$ tabix -l s3://your_bucket/vcf.gz

But this is not enabled by default. To enable it, we need to compile htslib with the --enable-libcurl option, which enables a variety of network protocols.

$ less htslib/INSTALL
    Use libcurl (<>) to implement network access to
    remote files via FTP, HTTP, HTTPS, etc.  By default, HTSlib uses its
    own simple networking code to provide access via FTP and HTTP only.
    Implement network access to Amazon AWS S3.  By default or with
    --enable-s3=check, this is enabled when libcurl is enabled.


As of writing, the latest release (1.9) doesn't fully support S3, but the develop branch includes a fix for it. So:

  1. Fetch recent develop branch
$ git clone --shallow-since 2019-07-01
  1. Install dependencies listed in INSTALL.
$ cd htslib
$ less INSTALL
RedHat / CentOS

sudo yum install autoconf automake make gcc perl-Data-Dumper zlib-devel bzip2 bzip2-devel xz-devel curl-devel openssl-devel

$ sudo yum install autoconf automake make gcc perl-Data-Dumper zlib-devel bzip2 bzip2-devel xz-devel curl-devel openssl-devel
  1. Then, compile it with the --enable-libcurl option, which enables the s3:// protocol.
$ autoheader
$ autoconf
$ ./configure --enable-libcurl
$ make
$ sudo make install
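
After installing, it can be worth checking that the resulting tabix binary really picked up libcurl. The following is a minimal sketch, not part of the htslib instructions: it assumes a Linux system with ldd available and a build where libcurl is linked into the tabix binary itself (a build that uses htslib plugins may load libcurl from a plugin instead, in which case this check would not find it). The helper name has_libcurl is made up for illustration.

```shell
# has_libcurl (hypothetical helper): succeed if the given binary is
# dynamically linked against libcurl. Assumes ldd is available (Linux).
has_libcurl() {
  ldd "$1" 2>/dev/null | grep -q 'libcurl'
}

# Locate tabix on PATH; an empty result means it is not installed there.
tabix_path=$(command -v tabix || true)

if [ -z "$tabix_path" ]; then
  echo "tabix not found on PATH"
elif has_libcurl "$tabix_path"; then
  echo "tabix is linked against libcurl"
else
  echo "no libcurl linkage found; s3:// access may not work"
fi
```

If the last message appears, rerunning ./configure --enable-libcurl and reading its output for libcurl-related lines is a reasonable next step.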


Now tabix can query a remote S3 file without downloading it. Make sure that both the bgzipped VCF (s3://your_bucket/vcf.gz) and its tabix index file (s3://your_bucket/vcf.gz.tbi) are in the same S3 location. As of writing, AWS instance profiles were not supported, so supply AWS credentials through environment variables or ~/.aws/credentials.

$ tabix -l s3://your_bucket/vcf.gz
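
The command above only lists contig names. As a rough sketch of an actual locus query with credentials taken from environment variables, something like the following could be used. The variable names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the standard AWS ones; the helper functions aws_env_ready and region, and the locus chr1:10000-20000, are placeholders invented for this example. Use a contig name that tabix -l actually reports for your file.

```shell
# aws_env_ready (hypothetical helper): succeed only if both standard AWS
# credential variables are set to non-empty values.
aws_env_ready() {
  [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]
}

# region (hypothetical helper): format a tabix region string chrom:start-end.
region() {
  printf '%s:%s-%s' "$1" "$2" "$3"
}

# Run the query only when tabix is installed and credentials are present.
# The bucket name and locus are placeholders.
if command -v tabix >/dev/null 2>&1 && aws_env_ready; then
  tabix s3://your_bucket/vcf.gz "$(region chr1 10000 20000)" \
    || echo "query failed; check the bucket, the .tbi index, and credentials"
else
  echo "skipping query: tabix missing or AWS credentials not set"
fi
```

Only the index and the compressed blocks overlapping the requested region should need to be fetched, which is what makes querying without a full download practical.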