@maasha
Last active December 23, 2015 08:19
Hamming distance calculator that takes IUPAC ambiguity codes into account, for comparing two nucleotide sequences the snappy way.
#!/usr/bin/env ruby
require 'narray'
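# IUPAC nucleotide codes: http://en.wikipedia.org/wiki/Nucleic_acid_notation
# Each code maps to a 4-bit mask (A=8, C=4, G=2, T/U=1); an ambiguity code
# is the bitwise OR of the bases it can stand for, e.g. N = 0x0f.
nuc_str = "ACGTUWSMKRYBDHVNacgtuwsmkrybdhvn"
bin_str = "\x08\x04\x02\x01\x01\x09\x06\x0c\x03\x0a\x05\x07\x0b\x0d\x0e\x0f\x08\x04\x02\x01\x01\x09\x06\x0c\x03\x0a\x05\x07\x0b\x0d\x0e\x0f"
str1 = "ATCGatcg"
str2 = "ATCnatcg"
# Translate every residue to its bit mask.
bin1 = str1.tr(nuc_str, bin_str)
bin2 = str2.tr(nuc_str, bin_str)
# Load the masks as byte vectors.
na1 = NArray.to_na(bin1, 'byte')
na2 = NArray.to_na(bin2, 'byte')
# Positions whose masks share no bit are mismatches; count them.
puts((na1 & na2).eq(0).sum)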
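For comparison, the same bit-mask idea can be sketched in plain Ruby without narray. This is only an illustration, not part of the original gist: the names `IUPAC_MASK` and `iupac_hamming` are hypothetical, and the equal-length check is an added assumption about how the inputs should be handled. It will be slower than the vectorized version on long sequences.

```ruby
# Hypothetical names, introduced for illustration only.
# 4-bit masks: A=8, C=4, G=2, T/U=1; ambiguity codes OR these together.
IUPAC_MASK = {
  'A' => 0b1000, 'C' => 0b0100, 'G' => 0b0010, 'T' => 0b0001, 'U' => 0b0001,
  'W' => 0b1001, 'S' => 0b0110, 'M' => 0b1100, 'K' => 0b0011,
  'R' => 0b1010, 'Y' => 0b0101, 'B' => 0b0111, 'D' => 0b1011,
  'H' => 0b1101, 'V' => 0b1110, 'N' => 0b1111
}.freeze

# Count positions where the two residues have no base in common.
def iupac_hamming(seq1, seq2)
  # Assumption: sequences of unequal length are rejected rather than truncated.
  raise ArgumentError, 'sequences must have equal length' unless seq1.length == seq2.length

  pairs = seq1.upcase.chars.zip(seq2.upcase.chars)
  # fetch raises KeyError on any character that is not an IUPAC code.
  pairs.count { |a, b| (IUPAC_MASK.fetch(a) & IUPAC_MASK.fetch(b)).zero? }
end

puts iupac_hamming('ATCGatcg', 'ATCnatcg') # N overlaps G, so no mismatch is counted
puts iupac_hamming('ATCG', 'ATCY')         # Y (C or T) shares no bit with G
```

Two residues count as a match whenever their masks overlap, so an ambiguity code never adds to the distance as long as it could stand for the base on the other strand.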