@DenisVerkhoturov
Last active May 19, 2020 11:34
PCRE (Perl Compatible Regular Expressions) pattern for protein-level alteration notation, based on http://www.hgmd.cf.ac.uk/docs/mut_nom.html#protein
(?x)
^
(?<prefix>p\.)?
(?<alteration>
(?:
(?<substitution>
(?<substitution_old>A|R|N|D|B|C|E|Q|Z|G|H|I|L|K|M|F|P|S|T|W|Y|V|X)
(?<substitution_position>\d+)
(?<substitution_new>A|R|N|D|B|C|E|Q|Z|G|H|I|L|K|M|F|P|S|T|W|Y|V|X)
)
|
(?<deletion>
(?:
(?<single_deletion>
(?<deletion_nucleotide>A|R|N|D|B|C|E|Q|Z|G|H|I|L|K|M|F|P|S|T|W|Y|V|X)
(?<deletion_position>\d+)
)
|
(?<range_deletion>
(?<deletion_start_nucleotide>A|R|N|D|B|C|E|Q|Z|G|H|I|L|K|M|F|P|S|T|W|Y|V|X)
(?<deletion_start_position>\d+)
(?:_)
(?<deletion_end_nucleotide>A|R|N|D|B|C|E|Q|Z|G|H|I|L|K|M|F|P|S|T|W|Y|V|X)
(?<deletion_end_position>\d+)
)
)
(?:del)
)
|
(?<insertion>
(?<insertion_start_nucleotide>A|R|N|D|B|C|E|Q|Z|G|H|I|L|K|M|F|P|S|T|W|Y|V|X)
(?<insertion_start_position>\d+)
(?:_)
(?<insertion_end_nucleotide>A|R|N|D|B|C|E|Q|Z|G|H|I|L|K|M|F|P|S|T|W|Y|V|X)
(?<insertion_end_position>\d+)
(?:ins)
(?<insertion_sequence>(?:A|R|N|D|B|C|E|Q|Z|G|H|I|L|K|M|F|P|S|T|W|Y|V|X)+)
)
|
(?<indels>
(?:
(?<single_indels>
(?<indels_nucleotide>A|R|N|D|B|C|E|Q|Z|G|H|I|L|K|M|F|P|S|T|W|Y|V|X)
(?<indels_position>\d+)
)
|
(?<range_indels>
(?<indels_start_nucleotide>A|R|N|D|B|C|E|Q|Z|G|H|I|L|K|M|F|P|S|T|W|Y|V|X)
(?<indels_start_position>\d+)
(?:_)
(?<indels_end_nucleotide>A|R|N|D|B|C|E|Q|Z|G|H|I|L|K|M|F|P|S|T|W|Y|V|X)
(?<indels_end_position>\d+)
)
)
(?:delins)
(?<indels_sequence>(?:A|R|N|D|B|C|E|Q|Z|G|H|I|L|K|M|F|P|S|T|W|Y|V|X)+)
)
|
(?<duplications>
(?<duplications_start_nucleotide>A|R|N|D|B|C|E|Q|Z|G|H|I|L|K|M|F|P|S|T|W|Y|V|X)
(?<duplications_start_position>\d+)
(?:_)
(?<duplications_end_nucleotide>A|R|N|D|B|C|E|Q|Z|G|H|I|L|K|M|F|P|S|T|W|Y|V|X)
(?<duplications_end_position>\d+)
(?:dup)
)
|
(?<frame_shifting>
(?<frame_shifting_start_nucleotide>A|R|N|D|B|C|E|Q|Z|G|H|I|L|K|M|F|P|S|T|W|Y|V|X)
(?<frame_shifting_start_position>\d+)
(?:fs)
(?<frame_shifting_end_nucleotide>A|R|N|D|B|C|E|Q|Z|G|H|I|L|K|M|F|P|S|T|W|Y|V|X)
(?<frame_shifting_end_position>\d+)
)
)+
)
$
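A minimal usage sketch in Python, under stated assumptions. Python's `re` module spells named groups `(?P<name>...)`, so this is a reduced port covering only the substitution and deletion branches, not the full pattern above. The `RESIDUE` constant, the `parse_alteration` helper, and the `deletion_*_residue` group names are hypothetical and chosen for illustration; the input strings are illustrative examples, not taken from the referenced nomenclature page.

```python
import re

# One-letter residue codes accepted by the pattern above (including B, Z and X).
RESIDUE = "[ARNDBCEQZGHILKMFPSTWYVX]"

# Reduced port: substitution and (single or range) deletion only.
# Group names in the deletion branch are assumptions made for this sketch.
PATTERN = re.compile(
    rf"""
    ^
    (?P<prefix>p\.)?
    (?:
        (?P<substitution>
            (?P<substitution_old>{RESIDUE})
            (?P<substitution_position>\d+)
            (?P<substitution_new>{RESIDUE})
        )
      |
        (?P<deletion>
            (?P<deletion_start_residue>{RESIDUE})
            (?P<deletion_start_position>\d+)
            (?:
                _
                (?P<deletion_end_residue>{RESIDUE})
                (?P<deletion_end_position>\d+)
            )?
            del
        )
    )
    $
    """,
    re.VERBOSE,
)


def parse_alteration(text):
    """Return the named groups that matched, or None if the text does not match."""
    match = PATTERN.match(text)
    if match is None:
        return None
    # Drop groups from branches that did not take part in the match.
    return {name: value for name, value in match.groupdict().items() if value is not None}


if __name__ == "__main__":
    for example in ("p.R97W", "K29_M31del", "pXR97W"):
        print(example, parse_alteration(example))
```

The last input is rejected only because the dot in the prefix is escaped; with an unescaped `p.` the prefix would accept any character after `p`.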