@acvill
Created August 16, 2023 16:35
Given a DNA string containing ambiguous IUPAC characters, return all possible unambiguous DNA strings.
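library(tidyverse)  # tibble, tidyr, dplyr, stringr, purrr

make_unambiguous <- function(dna) {
  # IUPAC nucleotide codes and the bases each one can stand for
  iupac <- tibble(code = c("A", "C", "G", "T",
                           "R", "Y", "S", "W",
                           "K", "M", "B", "D",
                           "H", "V", "N"),
                  base = c("A", "C", "G", "T",
                           "AG", "CT", "GC", "AT",
                           "GT", "AC", "CGT", "AGT",
                           "ACT", "ACG", "ACGT"))
  tibble(dna) |>
    # one row per character: split at every boundary between two characters
    separate_rows(dna, sep = '(?<=.)(?=.)') |>
    # look up the candidate bases for each position
    left_join(iupac, by = c("dna" = "code")) |>
    pull(base) |>
    # list with one character vector of candidate bases per position
    str_split("") |>
    # every combination of one base per position (base R expand.grid accepts a list)
    expand.grid(stringsAsFactors = FALSE) |>
    # paste the positions of each combination back into a single string
    unite(col = sequence, sep = "") |>
    as_vector()
}

# 6 ambiguous positions (Y, N, Y, R, Y, R): 2 * 4 * 2 * 2 * 2 * 2 = 128 sequences
make_unambiguous(dna = "AAYGANAGYCARAGYAAR")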
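Because the number of unambiguous strings is the product of the number of bases each position can take, it grows quickly with every extra ambiguous character. The sketch below is not part of the original gist: `count_unambiguous` is a hypothetical base-R helper that computes that product without building the sequences, which may be useful as a check before running the full expansion. The uppercase conversion and the error on unrecognized characters are assumptions about the desired behavior.

```r
# Hypothetical helper (not in the original gist): count how many unambiguous
# sequences an IUPAC string expands to, without generating them.
count_unambiguous <- function(dna) {
  # number of bases each IUPAC code can stand for
  degeneracy <- c(A = 1, C = 1, G = 1, T = 1,
                  R = 2, Y = 2, S = 2, W = 2, K = 2, M = 2,
                  B = 3, D = 3, H = 3, V = 3, N = 4)

  # assumption: treat lowercase input as uppercase
  codes <- strsplit(toupper(dna), "")[[1]]

  # assumption: fail loudly on anything outside the IUPAC table
  unknown <- setdiff(codes, names(degeneracy))
  if (length(unknown) > 0) {
    stop("Unrecognized IUPAC code(s): ", paste(unknown, collapse = ", "))
  }

  # product of the per-position degeneracies
  prod(degeneracy[codes])
}

count_unambiguous("AAYGANAGYCARAGYAAR")
# [1] 128
```

A string of twenty `N`s would already expand to 4^20 (about 1.1e12) sequences, so checking the count first gives a cheap way to decide whether the full `expand.grid` step is feasible.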