@O1O1O1O
Last active May 4, 2018 05:37
regex to find possible entity names in a sentence

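Here is a regular expression I created to identify Capitalized and ALLCAPS alphanumeric words that are not at the start of a sentence, which may indicate that they are proper names (proper nouns). It uses a lookbehind, (?<=...), to avoid matching words at the start of a sentence: the character before the whitespace must be a word character or one of , : ; - so a word that follows sentence-ending punctuation such as a period is skipped. The candidate word itself is captured in group 1.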

(?<=[\w,:;-])\s+([A-Z0-9]+[A-Za-z][A-Za-z0-9]*)\b

See it working at the handy website Regex101 - https://regex101.com/r/5haqtp/2
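
For trying the pattern outside Regex101, here is a minimal Python sketch. The function name find_entity_candidates and the sample sentence are made up for illustration; only the pattern itself comes from this gist.

```python
import re

# The pattern from this gist, unchanged. Group 1 holds the candidate word.
ENTITY_PATTERN = re.compile(r"(?<=[\w,:;-])\s+([A-Z0-9]+[A-Za-z][A-Za-z0-9]*)\b")


def find_entity_candidates(text):
    """Return the words captured by group 1 (hypothetical helper name)."""
    # With exactly one capture group, findall returns a list of that group's strings.
    return ENTITY_PATTERN.findall(text)


sample = "Yesterday, Alice met Bob at NASA headquarters. They left early."
print(find_entity_candidates(sample))  # ['Alice', 'Bob', 'NASA']
```

"Yesterday" and "They" are skipped because neither is preceded by whitespace that follows a word character or one of , : ; -. Since the leading class [A-Z0-9] also allows digits, a token such as "3rd" is captured too, so the results are best treated as candidates to filter rather than confirmed names.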
