@ninmonkey
Last active September 27, 2022 01:19
Automatic Objects from Regex Groups.md

I put part 3 before part 2, because part 3 simplifies the code.

(Table of contents links don't work correctly in a gist)

regex interactive tests:

Viewing the pattern on Regex101 adds syntax coloring, which helps readability a lot, and dynamic tooltips that explain each part.

The magic

comes down to this part: creating a [pscustomobject] from the hashtable $matches

$Text -match $SomeRegex
$matches.remove(0)
[pscustomobject]$matches
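
As a minimal, self-contained sketch of that idea (the sample text and the group names `Key` and `Val` are made up for illustration, they are not from the regex used later):

```powershell
# Hypothetical input and pattern, only for illustration
$Text      = 'Color is red'
$SomeRegex = '^(?<Key>\w+) is (?<Val>\w+)$'

if ($Text -match $SomeRegex) {
    # $Matches holds the whole match under key 0, plus one key per named group
    $Matches.Remove(0)
    # casting the remaining hashtable turns every named group into a property
    $obj = [pscustomobject]$Matches
}

$obj.Key   # Color
$obj.Val   # red
```

Removing key `0` first matters: otherwise the object would also get a property named `0` holding the full matched text.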

Note: Lazy regex is your friend

  • Part 3 uses the lazy modifier on the groups, which forces the greedy \s* around them to collect all the whitespace
  • the actual groups didn't change: I only made them lazy and added \s*
Pattern  Description
.*       greedy: anything, 0-to-many times
.*?      lazy: anything, 0-to-many times
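
A small sketch of the difference, using a made-up line with extra spaces (`$line`, `$greedy` and `$lazy` are illustration-only names):

```powershell
$line = 'Color   is   red'

# Greedy: .* grabs as much as it can, so the spaces before 'is' stay inside the group
$null   = $line -match '^(?<Name>.*)\s*is'
$greedy = $Matches.Name      # 'Color   ' (three trailing spaces)

# Lazy: .*? stops as early as it can, which leaves the spaces for \s* to consume
$null = $line -match '^(?<Name>.*?)\s*is'
$lazy = $Matches.Name        # 'Color'
```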

Sample data

First, I pasted the sample data from the regex test into a here-string

$sampleLines = @'
sample:
    Name is object

Color is red
Location = c:/foo/bar
'@ -split '\r?\n'
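
One caveat when splitting a here-string into lines: if the script file is saved with Windows (CRLF) line endings, splitting on `\n` alone leaves a trailing `\r` on every line, and `.*` will happily capture it. A rough sketch with a hand-built CRLF string (the variable names are only for illustration):

```powershell
# Simulate text that uses CRLF line endings
$crlf = "Color is red`r`nName is object"

$naive = $crlf -split '\n'       # keeps the carriage return on the first line
$safe  = $crlf -split '\r?\n'    # handles both LF and CRLF

$naive[0].Length   # 13, because of the trailing `r
$safe[0].Length    # 12
```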

Then I pasted my regex, using (?x) to enable verbose mode

  • lets you split your regex into multiple lines formatting with whitespace: tab, newline, spaces, anything
  • supports comments using #
  • in some places you have to explicitly match whitespace using \s* or \s+ , because a space in the pattern is not a literal anymore. Everything else is the same.
$RegexPair = @'
(?x)
    ^
    (?<Name>.*)
    
    is

    (?<Value>.*)
    $
'@
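
To see what verbose mode changes, here is a tiny sketch with throwaway patterns (not the ones used in this gist). Whitespace inside the pattern is ignored, `#` starts a comment, and a real space has to be matched explicitly:

```powershell
# The spaces in the pattern are ignored, so this is the same as the pattern 'ab'
$ignoresSpaces = 'ab' -match '(?x) a b   # matches a, then b'

# For the same reason, a literal space in the input is NOT matched by a space in the pattern
$literalSpace = 'a b' -match '(?x) a b'

# Whitespace has to be requested explicitly, for example with \s
$explicitSpace = 'a b' -match '(?x) a \s b'

$ignoresSpaces, $literalSpace, $explicitSpace   # True, False, True
```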

Part1 : parse every line

function parseLines1 {
  # simplest auto-object creation (ie: before fancy pants)
  $pairs = $sampleLines | ForEach-Object {
      $Line = $_
      if ($line -match $RegexPair) {
          $matches.remove(0)
          [pscustomobject]$matches
          return # note: return here is used as control flow
      }
      Write-Verbose "Match failed Line = '$_'"
  }
  $pairs
}
Pwsh> parseLines1 | Format-Table

Name      Value
----      -----
    Name   object
Color      red
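
About the `return` used above: inside a `ForEach-Object` script block it only ends the current iteration, so it behaves much like `continue` would in a regular loop. A minimal sketch with numbers instead of regex matches (`$result` is an illustration-only name):

```powershell
$result = 1..4 | ForEach-Object {
    if ($_ % 2 -eq 0) {
        $_ * 10
        return   # leaves this iteration only; the pipeline moves on to the next item
    }
    # only reached when the if-branch did not return
    "odd: $_"
}

$result   # 'odd: 1', 20, 'odd: 3', 40
```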

Regex2 : trim whitespace

$regexPair2 = @'
(?xsm-i)
    ^
    \s*
    # make the group lazy, so the greedy \s
    # will capture all the whitespace for us
    # using the ? operator after a quantifier makes it lazy
    # .*   means greedy anything 0-to-many times
    # .*?  means lazy   anything 0-to-many times
    (?<Name>    
        .*? 
    )   

    \s*   
    is
    \s*
 
    (?<Value>
      .*?
    )
    \s*
    $
'@

Iter3 : Let the lazy regex trim for us

function parseLines3 {
    # clean up whitespace, done in the regex instead of in Pwsh
    [CmdletBinding()] #allows -ea Ignore
    param()

    $pairs = $sampleLines | ForEach-Object {
        $Line = $_
        if ($Line -match $RegexPair2) {
            $matches.remove(0)
            [pscustomobject]$matches
            # note: return here is used as control flow
            return
        }
        # failed matches continue here
        Write-Error "Match failed Line = '$_'"
    }

    $pairs
     
}
Pwsh> parseLines3 -ea ignore | Ft -Wrap
Name  Value
----  -----
Name  object
Color red

Iter2: Add trim outside of the regex + errors

function parseLines2 {
  $pairs = $sampleLines | ForEach-Object {
      $Line = $_
      if ($Line -match $RegexPair) {
          $matches.remove(0)
          $yourMatch = $matches

          $yourMatch.Name = $yourMatch.Name.Trim()
          $yourMatch.Value = $yourMatch.Value.Trim()
          [pscustomobject]$yourMatch
          return
      }

      Write-Verbose "Match failed Line = '$_'"
  }
  $pairs
}
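
If you want to reuse the pattern, the same steps could be wrapped in a small pipeline function. This is only a sketch: `ConvertFrom-NamedGroup` and its parameters are hypothetical names, not something the gist or PowerShell ships with.

```powershell
function ConvertFrom-NamedGroup {
    # Hypothetical helper: emits one object per input line that matches -Pattern
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [AllowEmptyString()]
        [string]$InputText,

        [Parameter(Mandatory)]
        [string]$Pattern
    )
    process {
        if ($InputText -match $Pattern) {
            $Matches.Remove(0)          # drop the full match, keep the named groups
            [pscustomobject]$Matches
        }
        else {
            Write-Verbose "Match failed Line = '$InputText'"
        }
    }
}

# Sample lines made up for this sketch
$objects = 'Color is red', 'sample:', '  Name   is object ' |
    ConvertFrom-NamedGroup -Pattern '(?x) ^ \s* (?<Name>.*?) \s* is \s* (?<Value>.*?) \s* $'

$objects | Format-Table
```

Lines that don't match are skipped silently unless you pass `-Verbose`.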