@raonyguimaraes
Forked from knmkr/tabix-s3-example.md
Created February 20, 2021 08:37
tabix s3 example

The tabix command from htslib can query a locus in a remote S3 file using the s3:// protocol.

$ aws s3 ls s3://your_bucket/
vcf.gz
vcf.gz.tbi

$ tabix -l s3://your_bucket/vcf.gz
chr1
chr2
chr3
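
The listing above only prints contig names. A minimal sketch of an actual locus query might look like the following, assuming an s3-enabled tabix is on PATH. The helper name `region`, the `RUN_QUERY` switch, and the coordinates chr1:10000-20000 are placeholders for illustration, not part of the original example.

```shell
# Hypothetical helper: build a tabix region string (chrom:start-end).
region() {
    printf '%s:%s-%s\n' "$1" "$2" "$3"
}

# Placeholder values; replace with your own bucket and coordinates.
VCF_URL="s3://your_bucket/vcf.gz"
REGION="$(region chr1 10000 20000)"

# Only run the query when RUN_QUERY=1 is set and tabix is available,
# so that sourcing this snippet has no side effects.
if [ "${RUN_QUERY:-0}" = "1" ] && command -v tabix >/dev/null 2>&1; then
    # -h also prints the VCF header lines before the matching records.
    tabix -h "$VCF_URL" "$REGION"
fi
```

Run it with `RUN_QUERY=1` once credentials are configured; only the index and the compressed blocks covering the region should need to be fetched, not the whole file.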

However, this is not enabled by default. To enable it, htslib must be compiled with the --enable-libcurl option, which enables a variety of network protocols.

$ less htslib/INSTALL
...
--enable-libcurl
    Use libcurl (<http://curl.haxx.se/>) to implement network access to
    remote files via FTP, HTTP, HTTPS, etc.  By default, HTSlib uses its
    own simple networking code to provide access via FTP and HTTP only.
...
--enable-s3
    Implement network access to Amazon AWS S3.  By default or with
    --enable-s3=check, this is enabled when libcurl is enabled.
...

Install

At the time of writing, the latest release (1.9) did not fully support s3, but the develop branch included a fix for it. So:

  1. Fetch a recent develop branch.
$ git clone --shallow-since 2019-07-01 https://github.com/samtools/htslib.git
  2. Install the dependencies listed in INSTALL.
$ cd htslib
$ less INSTALL
...
RedHat / CentOS
---------------

sudo yum install autoconf automake make gcc perl-Data-Dumper zlib-devel bzip2 bzip2-devel xz-devel curl-devel openssl-devel
...

$ sudo yum install autoconf automake make gcc perl-Data-Dumper zlib-devel bzip2 bzip2-devel xz-devel curl-devel openssl-devel
  3. Then compile it with the --enable-libcurl option, which enables the s3:// protocol.
$ autoheader
$ autoconf
$ ./configure --enable-libcurl
$ make
$ sudo make install
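
Before running make, it can be worth confirming that configure actually picked up libcurl and S3. The sketch below inspects the generated config.h. It assumes configure records the features as the macros `HAVE_LIBCURL` and `ENABLE_S3`, and the helper name `check_htslib_config` is hypothetical, so treat this as a rough check rather than an official one.

```shell
# Hypothetical helper: report whether a configured htslib tree has
# libcurl and S3 support, based on macros assumed to be in config.h.
check_htslib_config() {
    config_file="${1:-config.h}"
    status=0
    for macro in HAVE_LIBCURL ENABLE_S3; do
        # Match lines such as "#define ENABLE_S3 1".
        if grep -Eq "^#define[[:space:]]+${macro}[[:space:]]+1" "$config_file" 2>/dev/null; then
            echo "${macro}: yes"
        else
            echo "${macro}: no"
            status=1
        fi
    done
    # Non-zero exit status if any feature is missing.
    return "$status"
}

# Usage, from inside the htslib source directory after ./configure:
#   check_htslib_config config.h
```

If either line reports "no", the usual cause is a missing curl-devel or openssl-devel package at configure time.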

Example

Now tabix can query a remote s3 file without downloading it. Make sure that both the bgzipped VCF (s3://your_bucket/vcf.gz) and its tabix index file (s3://your_bucket/vcf.gz.tbi) are in the same s3 location. At the time of writing, AWS instance profiles were not supported, so set AWS credentials through environment variables or ~/.aws/credentials.

$ export AWS_ACCESS_KEY_ID=XXX AWS_SECRET_ACCESS_KEY=XXX AWS_DEFAULT_REGION=us-west-2
$ tabix -l s3://your_bucket/vcf.gz
chr1
chr2
chr3
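
Since a missing credential tends to surface only as a failed open, a small preflight check can save some guesswork. The sketch below looks for the two credential sources mentioned above; the helper name `have_aws_credentials` is hypothetical, and it only checks that the credentials exist, not that they are valid.

```shell
# Hypothetical helper: succeed if AWS credentials are visible either as
# environment variables or as a ~/.aws/credentials file.
have_aws_credentials() {
    if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        echo "credentials: environment"
        return 0
    fi
    if [ -f "${HOME}/.aws/credentials" ]; then
        echo "credentials: ${HOME}/.aws/credentials"
        return 0
    fi
    echo "credentials: none found" >&2
    return 1
}

# Usage:
#   have_aws_credentials && tabix -l s3://your_bucket/vcf.gz
```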