## gistfile1.txt

SYNOPSIS
       bgzip [-cdhB] [-b virtualOffset] [-s size] [file]

       tabix  [-0lf]  [-p gff|bed|sam|vcf] [-s seqCol] [-b begCol] [-e endCol]
       [-S lineSkip] [-c metaChar] in.tab.bgz [region1 [region2 [...]]]

DESCRIPTION
       Tabix indexes a TAB-delimited genome position file in.tab.bgz and
       creates an index file in.tab.bgz.tbi when no region is given on the
       command line. The input data file must be sorted by position and
       compressed by bgzip, which has a gzip(1)-like interface. After
       indexing, tabix can quickly retrieve data lines overlapping regions
       specified in the format "chr:beginPos-endPos". Fast data retrieval
       also works over a network if a URI is given as the file name; in
       this case the index file is downloaded if it is not present locally.
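
       To make "overlapping" concrete, the sketch below reproduces the
       result of a region query with a plain linear scan over an
       uncompressed GFF-like file. It is not how tabix works internally
       (tabix consults its index instead of reading the whole file). The
       helper name overlap_scan and the sample records are made up for
       illustration; coordinates are assumed to be 1-based and inclusive,
       with the sequence name in column 1 and start/end in columns 4 and 5.

```shell
# overlap_scan FILE CHR BEG END
# Hypothetical helper: print the lines of a TAB-delimited file whose
# interval (columns 4 and 5) overlaps the region CHR:BEG-END.
overlap_scan() {
    awk -F'\t' -v chr="$2" -v beg="$3" -v end="$4" '
        /^#/ { next }                        # skip meta lines
        $1 == chr && $4 <= end && $5 >= beg  # closed intervals overlap
    ' "$1"
}

# Made-up sample data: two features on chr1, one on chr2.
tmp=$(mktemp)
printf 'chr1\tsrc\tgene\t100\t200\t.\t+\t.\tID=a\n' >  "$tmp"
printf 'chr1\tsrc\tgene\t300\t400\t.\t+\t.\tID=b\n' >> "$tmp"
printf 'chr2\tsrc\tgene\t150\t250\t.\t+\t.\tID=c\n' >> "$tmp"

# Query chr1:180-320; both chr1 features touch this region.
overlap_scan "$tmp" chr1 180 320 | cut -f 9
rm -f "$tmp"
```

       A feature is reported when it shares at least one base with the
       query region, so features that only partly fall inside the region
       are included as well.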

OPTIONS OF TABIX
       -p STR    Input  format  for indexing. Valid values are: gff, bed, sam,
                 vcf and psltab. This option should not  be  applied  together
                 with  any  of  -s, -b, -e, -c and -0; it is not used for data
                 retrieval because this setting is stored in the  index  file.
                 [gff]

       -s INT    Column of sequence name. Options -s, -b, -e, -S, -c and -0
                 are all stored in the index file and thus not used in data
                 retrieval. [1]

       -b INT    Column of start chromosomal position. [4]

       -e INT    Column of end chromosomal position. The end column can be the
                 same as the start column. [5]

       -S INT    Skip first INT lines in the data file. [0]

       -c CHAR   Skip lines starting with the character CHAR. [#]

       -0        Specify that the position in the data file is  0-based  (e.g.
                 UCSC files) rather than 1-based.

       -h        Print the header/meta lines.

       -B        The second argument is a BED file. When this option is in
                 use, the input file need not be sorted or indexed. The
                 entire input will be read sequentially. Nonetheless, with
                 this option, the format of the input must be specified
                 correctly on the command line.

       -f        Force overwriting of the index file if it is present.

       -l        List the sequence names stored in the index file.
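
       The difference that -0 declares can be shown with a small
       conversion. In the 0-based convention used by UCSC BED files the
       start is 0-based and the end is exclusive, so the same interval in
       1-based, inclusive coordinates has start + 1 and an unchanged end.
       The helper name bed_to_one_based and the sample record are
       hypothetical; this is only a sketch of the coordinate arithmetic,
       not something tabix requires you to run.

```shell
# bed_to_one_based: read 0-based, half-open intervals (chr, start, end)
# on standard input and print them as 1-based, closed intervals.
# Hypothetical helper for illustration only.
bed_to_one_based() {
    awk -F'\t' 'BEGIN { OFS = "\t" } { print $1, $2 + 1, $3 }'
}

# Made-up record: the first 100 bases of chr1.
printf 'chr1\t0\t100\n' | bed_to_one_based
```

       With -0 no such conversion is needed: assuming a three-column file
       of this kind compressed as in.bed.bgz (a made-up name), the options
       documented above would be combined as
       "tabix -s 1 -b 2 -e 3 -0 in.bed.bgz".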

EXAMPLE
       (grep ^"#" in.gff; grep -v ^"#" in.gff | sort -k1,1 -k4,4n) | bgzip > sorted.gff.gz;

       tabix -p gff sorted.gff.gz;

       tabix sorted.gff.gz chr1:10,000,000-20,000,000;
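
       The first command above keeps the meta lines at the top of the file
       and sorts the remaining lines by sequence name and then by numeric
       start position, which is the order tabix expects. The sketch below
       isolates that sorting step so it can be tried without bgzip or
       tabix. The helper name sort_gff and the sample records are made up;
       it also passes an explicit TAB field separator to sort, which the
       original command does not.

```shell
# sort_gff FILE
# Hypothetical helper: print meta lines (starting with "#") first, then
# the data lines sorted by column 1 (sequence name) and by column 4
# (start position, compared numerically).
sort_gff() {
    grep '^#' "$1"
    grep -v '^#' "$1" | sort -t "$(printf '\t')" -k1,1 -k4,4n
}

# Made-up, deliberately unsorted sample data.
tmp=$(mktemp)
printf '##gff-version 3\n'                          >  "$tmp"
printf 'chr2\tsrc\tgene\t50\t90\t.\t+\t.\tID=c\n'   >> "$tmp"
printf 'chr1\tsrc\tgene\t300\t400\t.\t+\t.\tID=b\n' >> "$tmp"
printf 'chr1\tsrc\tgene\t100\t200\t.\t+\t.\tID=a\n' >> "$tmp"

# Show sequence name and start of each line after sorting.
sort_gff "$tmp" | cut -f 1,4
rm -f "$tmp"
```

       The output of sort_gff can be piped to bgzip and indexed with
       "tabix -p gff" exactly as in the example above.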

NOTES
       It is straightforward to achieve overlap queries using the standard
       B-tree index (with or without binning) implemented in all SQL
       databases, or the R-tree index in PostgreSQL and Oracle. But there
       are still many reasons to use tabix. Firstly, tabix works directly
       with many widely used TAB-delimited formats such as GFF/GTF and BED.
       There is no need to design a database schema or a specialized binary
       format, nor to duplicate data in different formats. Secondly, tabix
       works on compressed data files while most SQL databases do not. The
       GenCode annotation GTF can be compressed down to 4% of its original
       size. Thirdly, tabix is fast. The same indexing algorithm is known
       to work efficiently for an alignment with a few billion short reads.
       SQL databases probably cannot easily handle data at this scale. Last
       but not least, tabix supports remote data retrieval. One can put the
       data file and the index on an FTP or HTTP server, and other users or
       even web services will be able to get a slice without downloading
       the entire file.

AUTHOR
       Tabix was written by Heng Li. The BGZF library was originally
       implemented by Bob Handsaker and modified by Heng Li for remote file
       access and in-memory caching.

SEE ALSO
       samtools(1)

tabix-0.2.0                       11 May 2010                         tabix(1)
