@IamNaN
Created November 27, 2012 23:12
Regex to parse HTML attribute/value pairs.
This regex separates attribute names from their values, such as those in HTML elements. It returns
the attribute names and their values even when the quotes are escaped, nested, or omitted.
The following are examples of attribute/value pairs that are properly divided:
a="a" b="b b" c='c' d=1 e="escaped \" quotes" f="'nested quotes'" g = 'gaps' h="multiple spaces"
Counting the capture groups from zero, the attribute name will be in match position 0, while the
value will be in either position 3 or 5 depending on whether or not the value is quoted.
For unquoted values (such as attribute d above) match position 3 will be blank and the value will be
in position 5. Otherwise, the value will be in position 3. This could be normalized with some
additional work, but that would make the expression more complicated than my needs require.
(\w*) *= *((['"])?((\\\3|[^\3])*?)\3|(\w+))
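As a rough illustration of the positions described above, here is a minimal sketch that runs the expression over the sample pairs. It assumes Python's `re` module, which is not something the gist specifies; Python numbers groups from 1, so positions 0, 3 and 5 become groups 1, 4 and 6. The names `ATTR_RE`, `SAMPLE` and `parse_attributes` are made up for this example. Other regex engines may treat the `\3` backreference differently when no quote was matched, so results for unquoted values can vary by engine.

```python
import re

# The gist's expression, unchanged. Python counts groups from 1, so the
# gist's positions 0, 3 and 5 are groups 1, 4 and 6 here.
ATTR_RE = re.compile(r"""(\w*) *= *((['"])?((\\\3|[^\3])*?)\3|(\w+))""")

# The sample attribute/value pairs from the description (hypothetical name).
SAMPLE = r'''a="a" b="b b" c='c' d=1 e="escaped \" quotes" f="'nested quotes'" g = 'gaps' h="multiple spaces"'''


def parse_attributes(text):
    """Return (name, value) tuples, normalizing quoted and unquoted values."""
    pairs = []
    for m in ATTR_RE.finditer(text):
        name = m.group(1)
        # Group 6 is only set when the unquoted branch (\w+) matched;
        # otherwise the text between the quotes is in group 4.
        value = m.group(6) if m.group(6) is not None else m.group(4)
        pairs.append((name, value))
    return pairs


for name, value in parse_attributes(SAMPLE):
    print(f"{name} -> {value}")
```

Picking the value in code like this is one way to do the normalization mentioned above without making the expression itself any longer. Note that the escaped quote in `e` comes back with its backslash still in place; unescaping would be a separate step.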
@chiemekailo

I thought you would have shown an example result, for instance assuming “a” through “h” are all in one tag.

@oodavid

oodavid commented Nov 15, 2023

Here's the regex at play:

https://www.regextester.com/?fam=132501
