Based on the WordNet corpus, omitting results containing:
- slashes (e.g. `lo/ovral`)
- spaces, which WordNet writes as underscores (e.g. `living_thing`)
- digits (e.g. `1900s`)

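Generated with the following Python script using NLTK:

```python
from pathlib import Path
from nltk.corpus import wordnet  # needs the WordNet data: python -m nltk.downloader wordnet

# Synset names look like "dog.n.01"; keep the part before the first dot.
nouns = (n.name().split(".")[0] for n in wordnet.all_synsets("n"))
# Drop entries containing slashes, underscores (spaces) or digits.
nouns = (n for n in nouns if all(c not in n for c in "/_0123456789"))
# Deduplicate, sort, and write one noun per line with a trailing newline.
Path("nouns.txt").write_text("\n".join((*sorted(set(nouns)), "")))
```
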
Every noun in the list fully matches the regex `[a-z'\-]+` (lowercase letters, apostrophes and hyphens only).
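
One way to check that claim against a generated `nouns.txt` is to full-match each line, since a plain search would succeed on any word containing a single letter. The sketch below is not part of the original script; the helper name `non_matching` is hypothetical, and it assumes `nouns.txt` sits in the current directory.

```python
import re
from pathlib import Path

# Pattern from the claim above: lowercase letters, apostrophes, hyphens.
NOUN_RE = re.compile(r"[a-z'\-]+")


def non_matching(words):
    """Return the entries that do not match NOUN_RE over their whole length."""
    return [w for w in words if not NOUN_RE.fullmatch(w)]


if __name__ == "__main__":
    path = Path("nouns.txt")
    if path.exists():
        # splitlines() drops the trailing newline, so no empty entry appears.
        bad = non_matching(path.read_text().splitlines())
        print(f"{len(bad)} non-matching entries")
```

Using `fullmatch` rather than `search` or `match` is what makes the check meaningful: `match` anchors only at the start, so `1900s` would fail but `dog1` would pass.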