@mosmeh
Created April 3, 2020 11:27
A technique for reading nucleotide sequences

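In ASCII:

  • The lowercase letters a, c, g, t have the values 0x61, 0x63, 0x67, 0x74, so their low 4 bits are 1, 3, 7, 4, which are all distinct.
  • An uppercase letter and its lowercase form differ only by 0x20 (A, C, G, T are 0x41, 0x43, 0x47, 0x54), so both cases have the same low 4 bits.

→ Reading only the low 4 bits therefore decodes A, C, G, T case-insensitively.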
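The two observations above can be checked with a minimal sketch. The helper names `low_nibble` and `same_nibble_both_cases` are hypothetical and are not part of the original gist or of minialign.

```c
#include <stdbool.h>

/* Hypothetical helper: the low 4 bits of an ASCII character. */
static int low_nibble(char c) {
    return c & 0xf;
}

/* Hypothetical helper: true when an uppercase letter and its lowercase
 * form (which differs only in bit 0x20) share the same low 4 bits. */
static bool same_nibble_both_cases(char upper) {
    char lower = (char)(upper | 0x20);
    return low_nibble(upper) == low_nibble(lower);
}
```

Note that the low 4 bits do not identify a base uniquely among all letters: 'D' (0x44) shares its low nibble with 'T', and 'Q' (0x51) with 'A'. The trick assumes the input contains only A, C, G, T in either case.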

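// Map a base character (either case) to a 2-bit code: A=0, C=1, G=2, T=3.
int read_base(char c) {
    switch (c & 0xf) {
        case 'A' & 0xf: // 1
            return 0;
        case 'C' & 0xf: // 3
            return 1;
        case 'G' & 0xf: // 7
            return 2;
        case 'T' & 0xf: // 4
            return 3;
    }
    // Any other low nibble (for example 'N') falls back to 0, the same code as A.
    return 0;
}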

or

// Lookup table indexed by the low 4 bits; entries not listed are zero-initialized.
static const int conv[16] = {
    ['A' & 0xf] = 0, ['C' & 0xf] = 1, ['G' & 0xf] = 2, ['T' & 0xf] = 3
};

int read_base(char c) {
    return conv[c & 0xf];
}
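As a usage illustration, the table lookup can feed a 2-bit packing routine. This is only a sketch under stated assumptions: `pack_bases` is a hypothetical helper, the table element type is narrowed to `uint8_t` for the example, and the layout (first base in the lowest bits, at most 32 bases per word) is a choice made here, not something taken from the original code.

```c
#include <stddef.h>
#include <stdint.h>

/* Same mapping as the table above, indexed by the low 4 bits. */
static const uint8_t conv[16] = {
    ['A' & 0xf] = 0, ['C' & 0xf] = 1, ['G' & 0xf] = 2, ['T' & 0xf] = 3
};

/* Hypothetical helper: pack up to 32 bases into one 64-bit word,
 * 2 bits per base, with the first base in the lowest bits.
 * Bases beyond the 32nd are ignored. */
static uint64_t pack_bases(const char *seq, size_t len) {
    uint64_t packed = 0;
    for (size_t i = 0; i < len && i < 32; i++) {
        packed |= (uint64_t)conv[seq[i] & 0xf] << (2 * i);
    }
    return packed;
}
```

Because the lookup ignores case, `pack_bases("ACGT", 4)` and `pack_bases("acgt", 4)` yield the same word.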

Original: https://github.com/ocxtal/minialign/blob/5fd40a595488b15194f6e3c53fd83ddfb84fa4fd/minialign.c#L223-L227
