@mcsf
Created September 3, 2020 15:46
# Invoke with
#
# awk -f script.awk -v term=li blocktypes.tsv
#
# With blocktypes.tsv (columns: name, title, comma-separated keywords):
#
# list List ul,ol
# revue Revue list
# paragraph Paragraph text
# image Image image,picture
# gallery Gallery image,images
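# AWK implementations disagree on the syntax for word boundaries
# (e.g. "\<"), so match the start of a word manually: pat must sit at
# the start of str or right after a non-letter. The class is spelled
# A-Za-z because the range A-z also covers [ \ ] ^ _ and `.
function open_match(str, pat) {
    return str ~ "(^|[^A-Za-z])" pat
}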
# Exact match on one of the name fields ($1 or $2).
($1 == term || $2 == term) {
    exact_name[NR] = $1
}
# Exact match on one of the comma-separated keywords in $3.
($3 ~ term) {
    split($3, words, ",")
    for (i in words) {
        if (words[i] == term) {
            exact_keyw[NR] = $1
            break
        }
    }
}
# term starts a word in one of the name fields.
(open_match($1, term) || open_match($2, term)) {
    partial_name[NR] = $1
}
# term starts a word in the keyword field.
open_match($3, term) {
    partial_keyw[NR] = $1
}
# Print each record once, under the strongest category it matched.
# The arrays are keyed by NR, so "i in other_array" tests the same record.
END {
    for (i in exact_name) {
        print "(exact-name-match) ", exact_name[i]
    }
    for (i in exact_keyw) {
        if (i in exact_name) {
            continue
        }
        print "(exact-keyword-match) ", exact_keyw[i]
    }
    for (i in partial_name) {
        if (i in exact_name || i in exact_keyw) {
            continue
        }
        print "(partial-name-match) ", partial_name[i]
    }
    for (i in partial_keyw) {
        if (i in exact_name || i in exact_keyw || i in partial_name) {
            continue
        }
        print "(partial-keyword-match) ", partial_keyw[i]
    }
}
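
A minimal sketch of how the open_match helper in the script behaves, run as a shell one-liner around AWK. The sample words are made up for illustration, and the character class is written as [^A-Za-z] here (an assumption of this sketch) so that punctuation such as "_" counts as a word separator.

```shell
# Hypothetical sample words; only open_match itself comes from the script.
out=$(printf 'list\nbulleted-list\nordered_list\nunlisted\n' | awk '
# True when pat occurs at the start of str or right after a non-letter.
function open_match(str, pat) {
    return str ~ "(^|[^A-Za-z])" pat
}
{ print $0, (open_match($0, "li") ? "match" : "no match") }
')
printf '%s\n' "$out"
```

Of the four sample words, only "unlisted" is rejected, because its "li" sits in the middle of a word rather than at the start of one.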