BCL files are binary base-call and quality-score files containing one (base, quality) pair for each successive cluster. The file is structured as follows:
Bytes 1-4: unsigned int numClusters
Bytes 5 to (numClusters + 4): one base/quality byte per cluster
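The layout above can be sketched in Python as follows. This is only a minimal illustration, not an official parser: the function name `split_bcl` is hypothetical, and the byte order of `numClusters` is not stated in the description, so little-endian is assumed here.

```python
import struct

def split_bcl(data: bytes):
    """Split raw BCL bytes into (num_clusters, payload).

    payload holds one base/quality byte per cluster.
    """
    if len(data) < 4:
        raise ValueError("BCL data is shorter than the 4-byte header")
    # Assumption: numClusters is a little-endian unsigned 32-bit integer.
    (num_clusters,) = struct.unpack_from("<I", data, 0)
    # The base/quality bytes start right after the 4-byte header.
    payload = data[4:4 + num_clusters]
    if len(payload) != num_clusters:
        raise ValueError("file ends before numClusters bytes were read")
    return num_clusters, payload
```

For an uncompressed file on disk, the bytes could be obtained with `open(path, "rb").read()` before calling the function.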
Each base/quality byte is organized as follows (with one exception, see below): the 2 rightmost bits (these are the LEAST significant bits) indicate the base, where A=00 (0x00), C=01 (0x01), G=10 (0x02), and T=11 (0x03).
The remaining 6 bits compose the quality score, which is an unsigned int.
EXCEPTION: If a byte is entirely 0 (i.e. byteRead == 0), it is a no-call: the base becomes '.' and the quality becomes 2, the default Illumina masking value.
E.g. if we read the binary value 10001011, it is transformed as follows:
Value read: 10001011(0x8B)
Quality | Base |
---|---|
100010 | 11 |
00100010 | 0x03 |
0x22 | T |
34 | T |
So the output base/quality will be (T/34).
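The per-byte rules and the worked example can be checked with a short Python sketch. The names `decode_bcl_byte`, `BASES`, and `NO_CALL` are hypothetical and chosen only for illustration; the logic follows the description above rather than any official implementation.

```python
BASES = "ACGT"       # indexed by the 2 least significant bits
NO_CALL = (".", 2)   # no-call base and the masking quality value

def decode_bcl_byte(value: int):
    """Decode one BCL byte into a (base, quality) pair."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    if value == 0:
        # Exception: an all-zero byte is a no-call.
        return NO_CALL
    base = BASES[value & 0b11]   # low 2 bits select the base
    quality = value >> 2         # remaining 6 bits are the quality
    return base, quality

print(decode_bcl_byte(0b10001011))  # ('T', 34)
```

Shifting right by 2 is what turns 100010 into the zero-padded 00100010 (0x22 = 34) in the table.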
I could not find a full description of the BCL file format. Is there an official document? One option may be to read the source code of the bcl2fastq program, which is still open source (last updated in 2017). The newer BCL Convert program is distributed only as a binary.