I put part 3 before part 2, because part 3 simplifies the code
(Table of contents links don't work correctly in a gist)
- regex interactive tests:
- The magic
- Note: Lazy regex is your friend
- Sample data
- Part1 : parse every line
- Regex2 : trim whitespace
- Iter3 : Let the lazy regex trim for us
- Iter2: Add trim outside of the regex + errors
Viewing the patterns on Regex101 adds syntax coloring, which helps readability a lot, and shows explanations as dynamic tooltips
- main regex, for part 1 and 2 https://regex101.com/r/JQTHeL/1
- regex using `lazy`, doing whitespace trimming for us https://regex101.com/r/nqtNHg/1
It comes down to this part: creating a `[pscustomobject]` from the hashtable `$matches`
$Text -match $SomeRegex
$matches.remove(0)
[pscustomobject]$matches
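As a minimal, self-contained sketch of the same three steps (the `$text` and `$pattern` values below are made up for illustration, they are not the ones used later):

```powershell
# Hypothetical input and pattern, only for illustration
$text    = 'Color is red'
$pattern = '^(?<Name>\w+) is (?<Value>\w+)$'

# -match returns $true/$false and, on success, fills the automatic $Matches hashtable
if ($text -match $pattern) {
    # key 0 holds the whole match; removing it leaves only the named groups
    $Matches.Remove(0)
    $pair = [pscustomobject]$Matches
}

$pair.Name   # Color
$pair.Value  # red
```

Checking the result of `-match` first matters: when a match fails, `$Matches` still holds whatever the previous successful match left behind.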
- Part 3 uses the lazy modifier, which leaves the greedy `\s*` around each group to collect all the whitespace
- the actual groups didn't change: just make them lazy, and add `\s*` around them
Pattern | Description |
---|---|
`.*` | means greedy anything 0-to-many times |
`.*?` | means lazy anything 0-to-many times |
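A small sketch of the difference, using a made-up line with extra spaces before `is`. The brackets in the output are only there to make the captured whitespace visible:

```powershell
$text = 'Color   is red'   # hypothetical line with extra spaces before "is"

# Greedy: .* grabs as much as it can, so the spaces end up inside the group
$null   = $text -match '^(?<Name>.*)\s*is'
$greedy = $Matches.Name

# Lazy: .*? stops as early as it can, so the following \s* takes the spaces
$null = $text -match '^(?<Name>.*?)\s*is'
$lazy = $Matches.Name

"[$greedy]"   # [Color   ]
"[$lazy]"     # [Color]
```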
First, I pasted sample data from the regex test, using a here-string
$sampleLines = @'
sample:
Name is object
Color is red
Location = c:/foo/bar
'@ -split '\r?\n' # \r? also handles Windows (CRLF) line endings
Then I pasted my regex, using `(?x)` to enable verbose mode
- lets you split your regex across multiple lines, formatting it with whitespace: tabs, newlines, spaces, anything
- supports comments using `#`
- in some places you have to explicitly add whitespace using `\s*` or `\s+`, because a space in the pattern is not a literal anymore. Everything else is the same.
$RegexPair = @'
(?x)
^
(?<Name>.*)
is
(?<Value>.*)
$
'@
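To see what verbose mode changes, here is a small sketch comparing a one-line pattern with a verbose version of it, followed by three checks on how a plain space is treated. The patterns and variable names are hypothetical and simpler than `$RegexPair`:

```powershell
# One-line version of a similar pattern (hypothetical, for comparison only)
$compact = '^(?<Name>\w+)\s+is\s+(?<Value>\w+)$'

# The same pattern in verbose mode: layout and comments are ignored by the engine
$verbose = @'
(?x)
^
(?<Name> \w+ )   # text before the separator
\s+ is \s+       # whitespace must be spelled out with \s
(?<Value> \w+ )  # text after the separator
$
'@

$sameResult = ('Color is red' -match $compact) -and ('Color is red' -match $verbose)

# A plain space in the pattern is dropped in verbose mode
$spaceIgnored = 'ab'  -match '(?x)^a b$'      # True: pattern is effectively ^ab$
$spaceLiteral = 'a b' -match '(?x)^a b$'      # False: the space never matches
$spaceEscaped = 'a b' -match '(?x)^a \s b$'   # True: \s matches the space
```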
function parseLines1 {
    # simplest auto-object creation (ie: before fancy pants)
    $pairs = $sampleLines | ForEach-Object {
        $Line = $_
        if ($Line -match $RegexPair) {
            $matches.remove(0) # drop group 0 (the full match), keep the named groups
            [pscustomobject]$matches
            return # note: inside ForEach-Object, return skips to the next line
        }
        Write-Verbose "Match failed Line = '$_'"
    }
    $pairs
}
Pwsh> parseLines1 | Format-Table
Name Value
---- -----
Name object
Color red
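The table hides one detail: with the greedy groups, the spaces around `is` end up inside the captured values. A quick sketch that makes this visible, assuming a one-line equivalent of `$RegexPair`:

```powershell
# Assumed one-line equivalent of $RegexPair (verbose mode drops the layout whitespace)
$pattern = '^(?<Name>.*)is(?<Value>.*)$'

$null  = 'Color is red' -match $pattern
$name  = "[$($Matches.Name)]"
$value = "[$($Matches.Value)]"

$name    # [Color ]
$value   # [ red]
```

That leftover whitespace is what the next two versions remove, either in the regex or afterwards with `Trim()`.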
$regexPair2 = @'
(?xsm-i)
^
\s*
# make the group lazy, so the greedy \s
# will capture all the whitespace for us
# using the ? operator after a quantifier makes it lazy
# .* means greedy anything 0-to-many times
# .*? means lazy anything 0-to-many times
(?<Name>
.*?
)
\s*
is
\s*
(?<Value>
.*?
)
\s*
$
'@
function parseLines3 {
    # clean up whitespace, done in the regex instead of Pwsh
    [CmdletBinding()] # allows -ea Ignore
    param()
    $pairs = $sampleLines | ForEach-Object {
        $Line = $_
        if ($Line -match $RegexPair2) {
            $matches.remove(0)
            [pscustomobject]$matches
            # note: return here is used as control flow
            return
        }
        # failed matches continue here
        Write-Error "Match failed Line = '$_'"
    }
    $pairs
}
Pwsh> parseLines3 -ea ignore | Ft -Wrap
Name Value
---- -----
Name object
Color red
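Because the function uses `[CmdletBinding()]`, the caller decides what happens to the `Write-Error` messages. As a rough sketch, the failed lines can be collected instead of discarded by using the common parameters `-ErrorAction SilentlyContinue` and `-ErrorVariable`. The function name `ConvertTo-PairObject`, its `-Line` parameter, and the one-line pattern are hypothetical stand-ins, not part of the code above:

```powershell
# Hypothetical helper, similar in spirit to parseLines3 but taking lines as a parameter
function ConvertTo-PairObject {
    [CmdletBinding()]
    param([string[]]$Line)

    foreach ($item in $Line) {
        if ($item -match '^\s*(?<Name>.*?)\s*is\s*(?<Value>.*?)\s*$') {
            $Matches.Remove(0)
            [pscustomobject]$Matches
            continue
        }
        # Non-terminating error: the caller chooses whether to show, hide, or collect it
        Write-Error "Match failed Line = '$item'"
    }
}

# Keep the console quiet, but collect the failures in $failed
$parsed = ConvertTo-PairObject -Line 'Color is red', 'Location = c:/foo/bar' `
    -ErrorAction SilentlyContinue -ErrorVariable failed

$parsed.Name                   # Color
$failed[0].Exception.Message   # the message passed to Write-Error for the bad line
```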
function parseLines2 {
    # clean up whitespace, done in Pwsh instead of regex
    $pairs = $sampleLines | ForEach-Object {
        $Line = $_
        if ($Line -match $RegexPair) {
            $matches.remove(0)
            $yourMatch = $matches
            # the greedy groups still contain the spaces around "is", so trim them here
            $yourMatch.Name = $yourMatch.Name.Trim()
            $yourMatch.Value = $yourMatch.Value.Trim()
            [pscustomobject]$yourMatch
            return
        }
        Write-Verbose "Match failed Line = '$_'"
    }
    $pairs
}
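If a pattern has more than two named groups, trimming each one by hand gets repetitive. One possible variation, sketched here with an assumed one-line equivalent of `$RegexPair`, is to loop over the keys of a copy of `$Matches`:

```powershell
# Assumed one-line equivalent of $RegexPair
$null = 'Color is red' -match '^(?<Name>.*)is(?<Value>.*)$'

# Work on a copy so the automatic $Matches variable is left alone
$groups = $Matches.Clone()
$groups.Remove(0)

# Snapshot the keys with @(): changing a hashtable while enumerating its Keys can throw
foreach ($key in @($groups.Keys)) {
    $groups[$key] = $groups[$key].Trim()
}

$pair = [pscustomobject]$groups
$pair.Name    # Color
$pair.Value   # red
```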