@jurgenvinju
Last active March 16, 2022 14:30
Turn a block of Hypothesis document annotations into a plain-text list of detailed comments for a peer review
module HypothesisToReview

import lang::html::IO;
import IO;
import List;

// This is a simple HTML scraper against an untyped HTML document model (`node`).
// `hypothesisCards` is the location of the HTML that contains the annotation cards.
str detailedComments(loc hypothesisCards) {
  content = readHTMLFile(hypothesisCards);
  review = "";

  // find every annotation card, at any depth in the document
  for (/card:"li"(_,class="annotation-card") := content) {
    if (/"blockquote"(["text"(str quote)]) := card) {
      if (/"div"(["p"(["text"(str comment)])],class="annotation-card__text") := card) {
        // the quoted passage followed by the comment attached to it
        review += "\> <quote>
                  ' <comment>\n\n";
      }
      else {
        // a highlight without a comment
        review += "\> <quote> - highlighted\n\n";
      }
    }
    else {
      throw "no quote? <card>";
    }
  }

  return review;
}

jurgenvinju commented Mar 16, 2022

  • We use an untyped HTML model here, for no particular reason. The patterns match nodes such as "li" and "div" by name, without any ADT declaration for them.
  • The deep match operator `/` lets us skip all kinds of uninteresting structure, and matching class attributes with "keyword field matching" lets us anchor on the places where we want to pick up information again.
  • += concatenates string templates lazily for maximal efficiency (internally it uses a balanced tree to avoid stack overflow).
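
The first two points can be tried in isolation with a minimal sketch. It assumes a small hand-built tree in place of real scraped HTML; the module name `DeepMatchSketch`, the value `doc` and the function `cardTexts` are hypothetical names made up for this illustration and are not part of the gist.

```rascal
module DeepMatchSketch

// Hypothetical stand-in for a parsed HTML document: plain untyped nodes,
// with the HTML attributes stored as keyword fields.
node doc = "html"([
  "body"([
    "ul"([
      "li"(["text"("first")], class="annotation-card"),
      "li"(["text"("skip me")], class="other")
    ])
  ])
]);

// The deep match `/` finds "li" nodes at any depth, and the keyword field
// pattern `class="annotation-card"` keeps only the cards we care about.
list[str] cardTexts(node d)
  = [t | /"li"(["text"(str t)], class="annotation-card") := d];

// Only the first list item carries the anchoring class.
test bool findsOnlyAnnotationCards() = cardTexts(doc) == ["first"];
```

No data type for "html", "body", "ul" or "li" is declared anywhere; the patterns select nodes by name and keyword field alone, the same way the scraper above does.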


dwhly commented Mar 16, 2022

👍
