@maximvl
Created July 22, 2017 10:16
Toy HTML parser
Red []
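; Informal PEG grammar for the HTML subset handled below.
; It is a bare string literal: evaluated and discarded, so it acts as a comment.
{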
grammar HTML
document <- (doctype / text / tag)*
tag <- open_tag (text / tag)* close_tag
open_tag <- "<" [0-9a-zA-Z \"'=-]+ ">"
close_tag <- "</" [0-9a-zA-Z]+ ">"
doctype <- "<!DOCTYPE " [0-9a-zA-Z]+ ">"
text <- [^<]+
}
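; Character sets used by the parse rules below.
ws: charset reduce [newline space tab]
digits: charset {0123456789}
chars: union charset [#"a" - #"z"] charset [#"A" - #"Z"]
alphanum: union digits chars
; Roughly corresponds to the character class in the grammar's open_tag; not used by the rules below.
alphanum-with-specials: union ws union alphanum charset {"'=-}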
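; Stack of currently open tag names, innermost last.
tags-stack: copy []

; Push the tag name when an opening tag is recognised, then print the stack.
handle-open-tag: func [name] [
    append tags-stack name
    ;print ["open" name]
    print tags-stack
]

; Pop the innermost open tag when a closing tag is recognised, then print the stack.
; name is not compared with the popped tag, so mismatched tags go unnoticed.
handle-close-tag: func [name] [
    take/last tags-stack
    ;print ["close" name]
    print tags-stack
]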
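; Parse rules. Words are resolved when parse runs, so definition order does not matter.
; A document is any mix of tags, a doctype, and text.
document: [any [ahead "<" [tag | doctype] | text]]
; An element: opening tag, then text or nested elements, then a closing tag.
tag: [whitespace open-tag any [ahead not "<" text | tag] close-tag]
; handle-open-tag fires as soon as the name is read, before ">" is matched.
open-tag: ["<" copy name tag-name (handle-open-tag name) any tag-parameter ">"]
tag-name: [some alphanum]
; An attribute: a name, optionally followed by ="value"; the value may contain "<" and ">".
tag-parameter: [whitespace some alphanum opt ["=" "^"" some [not "^"" skip] "^""]]
close-tag: ["</" copy name tag-name (handle-close-tag name) ">"]
doctype: ["<!DOCTYPE " some alphanum ">"]
; text may match zero characters; it stops at the next "<".
text: [any [not "<" skip]]
whitespace: [any ws]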
; Sample input: the alt attributes contain tag-like text that must not be treated as tags.
html: {
<html>
<body>
<img src="picture1.jpg" alt="<title>"></img>this is definitely not a title<img src="picture2.jpg" alt="</title>"></img>
<img src="picture1.jpg" alt="<u>"></img>this is definitely not underlined<img src="picture2.jpg" alt="</u>"></img>
<u>and this is underlined</u>
</body>
</html>
}
probe parse html document
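
; A hedged usage sketch, not part of the original gist: one way the rules and
; tags-stack above might be combined into a well-formedness check.
; balanced? and its markup argument are hypothetical names introduced here.
; Assumption: the handlers run even when a rule later backtracks, so the stack
; is cleared before each run instead of being trusted to unwind on failure.
balanced?: func [markup [string!]] [
    clear tags-stack
    ; Truthy only if the whole input matched and every opened tag was closed.
    all [
        parse markup document
        empty? tags-stack
    ]
]

probe balanced? "<b>bold</b>"    ; every opening tag has a closing tag
probe balanced? "<b>bold"        ; the closing tag is missing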