Last active December 14, 2015 18:09
XML Shallow Parsing using Greppy API
<?php
// [^<]+
$textScanningExpression =
    p()->not('<')->more;
// [^-]*-
$untilHyphen =
    p()->not('-')->until->literal('-');
// $untilHyphen(?:[^-]$untilHyphen)*-
$untilTwoHyphens =
    $untilHyphen .
    p()->silent(
        p()->not('-') .
        $untilHyphen
    ) . p()->until->literal('-');
// $untilTwoHyphens>?
$commentCompletionExpression =
    $untilTwoHyphens .
    p()->literal('>')->optional;
// [^\]]*]([^\]]+])*]+
$rightSquareBrackets =
    p()->not(']')->until->capture(
        p()->not(']')->literal(']')->more
    ) . p()->any->literal(']')->more;
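To see what these builder expressions are meant to match, the plain regexes from the comments can be tried directly with PHP's PCRE functions. This is only a rough sketch: it does not use Greppy at all, and the sample strings and result variable names are made up for illustration.

```php
<?php
// Plain-PCRE sketch of the patterns quoted in the comments above.
// Greppy is not used; the sample inputs are illustrative assumptions.

// [^<]+ : a run of text up to (not including) the next '<'
preg_match('/[^<]+/', 'hello <b>world</b>', $text);

// [^-]*- : everything up to and including the first hyphen
$untilHyphen = '[^-]*-';
preg_match('/' . $untilHyphen . '/', 'abc-def', $first);

// $untilHyphen(?:[^-]$untilHyphen)*- : up to and including the first "--"
$untilTwoHyphens = $untilHyphen . '(?:[^-]' . $untilHyphen . ')*-';
preg_match('/' . $untilTwoHyphens . '/', ' a - b -->rest', $twoHyphens);

// $untilTwoHyphens>? : the same, plus an optional closing '>'
preg_match('/' . $untilTwoHyphens . '>?/', ' a - b -->rest', $comment);

var_dump($text[0], $first[0], $twoHyphens[0], $comment[0]);
```

Single hyphens inside the comment body are skipped over by the `(?:[^-]$untilHyphen)*` group, so the match only ends at a doubled hyphen.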
PS: Don't give up on p()! We could have $p() with pure PSR-0 compatibility, or Greppy::autodefine('function_name') based on the autodefine RFC: https://wiki.php.net/rfc/autodefine
Thanks for the contribution. The gist is updated.
About the p() function, yes, I do want to provide that.
Got it. It's mixed up. Outside a set, "^" means start of line; as the first character inside a [set], it negates the set.
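The two meanings of the caret can be checked with plain PCRE. A minimal sketch, with made-up sample strings and variable names:

```php
<?php
// Outside a set, ^ anchors the match to the start of the subject
// (or of each line when the /m modifier is used).
$anchored = preg_match('/^b/', 'abc');        // 'b' is not at the start

// As the first character inside a set, ^ negates the set.
$negated = preg_match('/[^b]/', 'abc', $m);   // first character that is not 'b'

// Anywhere else inside a set, ^ is just a literal caret.
$literal = preg_match('/[b^]/', 'x^y', $n);

var_dump($anchored, $m[0], $n[0]);
```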
A set can only contain literals and ranges. Perhaps
->set('a', 'b', '0-9')
would be short while still meaningful, and we could remove the closeSet
instruction. Groups could work the same way: every group has another expression inside, so we could have something like ...->capture(p()->set('/'));
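One way the set() and capture() idea could look in code is sketched below. This is purely hypothetical: SetPattern and its methods are illustrative names, not Greppy's real classes, and the sketch assumes PHP 7.4 or later and '/' as the regex delimiter.

```php
<?php
// Hypothetical sketch only: SetPattern is an illustrative name,
// not part of Greppy.
final class SetPattern
{
    private string $regex = '';

    // Append a character set built from literals and ranges,
    // e.g. set('a', 'b', '0-9') appends [ab0-9].
    public function set(string ...$items): self
    {
        $body = '';
        foreach ($items as $item) {
            if (preg_match('/^(.)-(.)$/su', $item, $m)) {
                // "x-y" is treated as a range; both endpoints are quoted.
                $body .= preg_quote($m[1], '/') . '-' . preg_quote($m[2], '/');
            } else {
                // Anything else is treated as literal characters.
                $body .= preg_quote($item, '/');
            }
        }
        $this->regex .= '[' . $body . ']';
        return $this;
    }

    // Append a capturing group that wraps another expression.
    public function capture(SetPattern $inner): self
    {
        $this->regex .= '(' . $inner . ')';
        return $this;
    }

    public function __toString(): string
    {
        return $this->regex;
    }
}

$alnumish = (new SetPattern())->set('a', 'b', '0-9');
$captured = (new SetPattern())->capture((new SetPattern())->set('/'));

preg_match('/' . $alnumish . '+/', 'xx7b9', $run);
var_dump((string) $alnumish, (string) $captured, $run[0]);
```

Because set() receives the whole set in one call and capture() receives a complete inner expression, no separate closing instruction is needed in this sketch.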
Negated sets could simply be named not. The last regex written as suggested: