Created
April 30, 2012 03:51
Convert RNA secondary structure (in dot-bracket notation) to forest in pgf/tikz using Haskell
Data for dot-bracket notation (a.k.a. Vienna notation)

> type Viennachar = Char
> type RNAchar = Char

Data for tree structure

> data Tree a = N a (Forest a) deriving (Show,Eq,Ord)
> type Forest a = [Tree a]

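A small sketch of how such a forest can be read back, assuming (as in build' in this file) that bases sit in leaf nodes and that every pair node is an inner node. The helper name basesOf is hypothetical and not part of the original listing; the Tree type is repeated so the block stands alone.

```haskell
-- Same shape as the Tree type of this file, repeated to keep the sketch self-contained.
data Tree a = N a [Tree a] deriving (Show, Eq)

-- Hypothetical helper: collect the leaf labels of a forest, left to right.
-- Inner nodes (the pair markers) are skipped; only childless nodes contribute.
basesOf :: [Tree a] -> [a]
basesOf = concatMap leaves
  where
    leaves (N x []) = [x]
    leaves (N _ ts) = concatMap leaves ts
```

Applied to a forest built from a sequence, this should give the sequence back, which makes it a cheap sanity check for the parser.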
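Parse dot-bracket notation and build an RNA forest. A matched pair becomes a
node labelled 'P' whose children are the opening base, the enclosed forest, and
the closing base.

> build' :: [(Viennachar, RNAchar)] -> Forest RNAchar
> build' [] = []
> build' [('.',a)] = [N a []]
> build' [(')',_)] = error "structure not well parenthesized"
> build' [('(',_)] = error "structure not well parenthesized"
> build' [( x ,_)] = error $ show x++" symbol not allowed in vienna notation"
> build' (('(',a1):(')',a2):ps) = (N 'P' (open:[close])) : build' ps
>   where (open, close) = ((N a1 []), (N a2 []))
>
> build' (('.',a):ps) = (N a []) : build' ps
> build' (('(',o):ps) = (N 'P' ((open:inbracketsrecursion)++[close])) : build' trest
>   where (open, close) = ((N o []), (N (snd hrest) []))
>         inbracketsrecursion = build' inbrackets
>         (hrest, trest) = (head rest, tail rest)
>         (inbrackets,rest) = if null l then error "closing bracket missing" else head l
>         -- all splits of ps at a position k where the prefix up to and including k
>         -- has negative depth; the first one is the matching closing bracket
>         l = [splitAt k ps| k<-[1..length ps], level (take (k+1) (map fst ps)) < 0 ]
>
> build' ((')',_):_) = error "no closing bracket expected here"
> build' (( x ,_):_) = error $ show x++" symbol not allowed in vienna notation"

The listing calls level without defining it. The definition below is an assumed
reconstruction: the nesting depth after reading a prefix, so a negative value
means the prefix closes a bracket that was opened before it.

> level :: [Viennachar] -> Int
> level = sum . map depth
>   where depth '(' =  1
>         depth ')' = -1
>         depth _   =  0   -- assumption: any other symbol leaves the depth unchanged
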
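The parser above reports malformed input through error and searches for the matching bracket at every opening bracket. As a point of comparison, here is a minimal sketch of a single left-to-right pass that returns errors as Left values. This is a swapped-in technique, not the gist's method; the names parseForest and buildEither are hypothetical, and the Tree type is repeated so the block stands alone.

```haskell
-- Same shape as the Tree type of this file, repeated to keep the sketch self-contained.
data Tree a = N a [Tree a] deriving (Show, Eq)

-- Hypothetical alternative to build': parse as many trees as possible and
-- return them together with the unconsumed input. Input is (structure, base) pairs.
parseForest :: [(Char, Char)] -> Either String ([Tree Char], [(Char, Char)])
parseForest (('.', b) : ps) = do
  (ts, rest) <- parseForest ps
  Right (N b [] : ts, rest)
parseForest (('(', o) : ps) = do
  (inner, afterInner) <- parseForest ps          -- forest enclosed by the pair
  case afterInner of
    ((')', c) : ps') -> do
      (ts, rest) <- parseForest ps'
      -- same node layout as build': opening base, enclosed forest, closing base
      Right (N 'P' (N o [] : inner ++ [N c []]) : ts, rest)
    _ -> Left "closing bracket missing"
parseForest ps@((')', _) : _) = Right ([], ps)   -- stop; the caller consumes the ')'
parseForest ((x, _) : _) = Left (show x ++ " symbol not allowed in Vienna notation")
parseForest [] = Right ([], [])

-- Hypothetical entry point: structure first, then bases.
buildEither :: String -> String -> Either String [Tree Char]
buildEither structure bases = do
  (forest, rest) <- parseForest (zip structure bases)
  case rest of
    [] -> Right forest
    _  -> Left "unexpected closing bracket"      -- a ')' was left at top level
```

For well-formed input this is intended to produce the same forest shape as build'; the main design difference is that callers can pattern match on the result instead of catching an exception.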
Print an RNA forest as a list of lines in pgf/tikz tree format

> pprint :: [Tree RNAchar] -> [String]
> pprint ((N x (a:as)):bs) = ("\\begin{scope}[xshift=1em]"):("\\node {"++ [x] ++ "}"):(pprint' (a:as)) ++ ["; "] ++ pprint bs ++ ["\\end{scope}"]
> pprint ((N x []):bs) = ("\\begin{scope}[xshift=1em]"):("\\node {"++ [x] ++ "};") : pprint bs ++ ["\\end{scope}"]
> pprint [] = []
> pprint' ((N x (a:as)):bs) = ("child { node {"++ [x] ++ "} "):(pprint' (a:as)) ++ ["} "] ++ pprint' bs
> pprint' ((N x []):bs) = ("child { node {"++ [x] ++ "} "):"} ": pprint' bs
> pprint' [] = []

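The lines produced by pprint are scopes and nodes only, so they need a surrounding picture before LaTeX can typeset them. A minimal sketch, assuming the output is meant to go inside a tikzpicture environment and that a standalone document is acceptable; the helper name wrapStandalone is hypothetical and not part of the original listing.

```haskell
-- Hypothetical helper: wrap a list of TikZ lines (such as pprint's output)
-- in a minimal standalone LaTeX document with a single tikzpicture.
wrapStandalone :: [String] -> [String]
wrapStandalone body =
  [ "\\documentclass{standalone}"
  , "\\usepackage{tikz}"
  , "\\begin{document}"
  , "\\begin{tikzpicture}"
  ] ++ body ++
  [ "\\end{tikzpicture}"
  , "\\end{document}"
  ]
```

With this in scope, something like `putStr (unlines (wrapStandalone (pprint forest)))` would print a complete document instead of a fragment.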
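Test data: (sequence, structure) pairs

> test0 = ("GGGGGGCCGCCGGGGAAAAA", "..((..)).((..)).....")
> test = ("GGGGGCGCCGGGG", "..(..).((..))")

Main function. The listing calls build without defining it; the wrapper below is
an assumed reconstruction that pairs each structure symbol with its base.

> build :: [Viennachar] -> [RNAchar] -> Forest RNAchar
> build structure bases = build' (zip structure bases)   -- assumption: not in the original listing
>
> dotBracketToTikz s = putStr $ unlines $ pprint (build (snd s) (fst s))

Example call in GHCi / hugs

dotBracketToTikz test0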