@donbrae
Last active April 8, 2024 10:58
Some regular expressions for finding patterns in text.
  • Find all references to filenames ending in .js: [^ ]+\.js
  • Find all blank lines: ^\r?\n\r?
  • Wildcard (any character, including newlines): (.|\n)+?
    • E.g. find all <td>s, regardless of content: <td>(.|\n)+?</td>
      • Include attribute(s): <td(.|\n)+?>(.|\n)+?</td>
  • Convert getComputedStyle(foot1).left to foot1.getBoundingClientRect().x: find getComputedStyle\((.+?)\)\.left and replace it with $1.getBoundingClientRect().x
  • Remove all <a> tags, leaving just the anchor tag text: find <a[^>]*>(.*?)</a> and replace with $1
    • Only links that start with the string /foo: <a href="/foo[^>]*>(.*?)</a>
  • Find all URLs (requires the case-insensitive flag, as the character classes list only A-Z): \b(https?):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]
    • All URLs ending in .jpg: \b(https?):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|](\.jpg)
    • All URLs within href attributes: (href=["'])(\b[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]). URL available in $2
  • Return lines where whole-word strings foo or bar baz occur in the line: ^.*\b(foo|bar baz)\b.*\n?
    • Or where foo or bar baz strings occur anywhere in the line: ^.*(foo|bar baz).*\n?
  • Match every line in a simple CSV file where the second column is exactly foo, but not if another column also begins with foo (requires multiline and global flags): ^(?:(?!foo)[^,]*,){1}foo(?:(?:,(?!foo)[^,]*))*$
  • Match URL where the domain is not jazzkeys.fyi: ^(?!https?:\/\/(.*\.)?jazzkeys\.fyi)[^ ]+$
  • Match .ac, .edu and .gov domains only (double each backslash if the pattern is written inside a string literal): (\.ac\.[a-zA-Z]{2}$)|(\.edu(\.[a-zA-Z]{2})?$)|(\.?gov(\.[a-zA-Z]{1,4})?$)
    • Will match domains like unimelb.edu.au, blogs.gov.scot and gla.ac.uk
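
The list does not name a regex flavour, but the $1 replacement syntax and the getComputedStyle example suggest JavaScript. The following is a minimal sketch of a few of the patterns above in JavaScript. The sample strings and variable names are made up for illustration, and the regex literals use single backslashes with forward slashes escaped.

```javascript
// Sample inputs below are invented; only the patterns come from the list.

// Rewrite getComputedStyle(el).left to el.getBoundingClientRect().x.
// The dot before "left" is escaped so it matches only a literal dot.
const source = "const x = getComputedStyle(foot1).left;";
const rewritten = source.replace(
  /getComputedStyle\((.+?)\)\.left/g,
  "$1.getBoundingClientRect().x"
);
console.log(rewritten); // const x = foot1.getBoundingClientRect().x;

// Remove <a> tags but keep the link text ($1 is the first capture group).
const html = 'Read <a href="/foo/docs">the docs</a> or <a href="/bar">ask</a>.';
const plain = html.replace(/<a[^>]*>(.*?)<\/a>/g, "$1");
console.log(plain); // Read the docs or ask.

// Return lines containing the whole words "foo" or "bar baz".
// The m flag lets ^ match at each line start; g returns every match.
const text = "foo one\nfood two\nthree bar baz\nfour";
const matchedLines = (text.match(/^.*\b(foo|bar baz)\b.*\n?/gm) || []).map(
  (line) => line.trimEnd() // drop the trailing newline the pattern captures
);
console.log(matchedLines); // [ 'foo one', 'three bar baz' ]

// Accept a URL only when its domain is not jazzkeys.fyi.
const notJazzkeys = /^(?!https?:\/\/(.*\.)?jazzkeys\.fyi)[^ ]+$/;
console.log(notJazzkeys.test("https://example.com/page")); // true
console.log(notJazzkeys.test("https://www.jazzkeys.fyi/")); // false

// Match .ac, .edu and .gov domains only.
const academicOrGov =
  /(\.ac\.[a-zA-Z]{2}$)|(\.edu(\.[a-zA-Z]{2})?$)|(\.?gov(\.[a-zA-Z]{1,4})?$)/;
const domains = ["unimelb.edu.au", "blogs.gov.scot", "gla.ac.uk", "example.com"];
const matchedDomains = domains.filter((d) => academicOrGov.test(d));
console.log(matchedDomains); // [ 'unimelb.edu.au', 'blogs.gov.scot', 'gla.ac.uk' ]
```

None of the test() patterns carry the g flag, so each call starts from the beginning of the string and the same regex object can be reused safely.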