@Whateverable
Created January 19, 2018 07:41
quotable6
«regexp»

¦ doc: “Merge pull request #1706 from fluca1978/fix-comment-regexp”
¦ doc: It seems the comment working on regexp was referencing the similar feature of
¦ doc: “Merge pull request #1704 from fluca1978/fix-comment-regexp”
how is the regexp :ignoremark semantic actually defined?
none of the regexp engines i've tried has got that one right, and only Ruby (oniguruma) has got the previous examples right too
okay, even ICU's regexp engine does not bother to try to match (?i:(XS|Y)S) against xß 'correctly'
comborico1611: it isn't, it's from the fact that in ed, to print out all matches from a file for a given regexp, you type :g/re/p
comborico1611: 'g' being global, 'p' being print, and 're' being a placeholder for whatever regexp you use
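The ed command described above, g/re/p, prints every line that matches a regexp. A minimal Python sketch of the same idea follows; the function name `global_regexp_print` and the sample lines are made up for illustration.

```python
import re

def global_regexp_print(pattern, lines):
    """Return every line containing a match for pattern, like ed's g/re/p."""
    compiled = re.compile(pattern)
    return [line for line in lines if compiled.search(line)]

# Hypothetical sample input.
sample = ["perl6 regex", "plain text", "p5 regexp"]
for line in global_regexp_print("reg", sample):
    print(line)
```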
Hi. The code examples on the perl6 website (eg http://examples.perl6.org/categories/best-of-rosettacode/24-game.html) are buggy. All instances where a bare word is in angle brackets (<exp> in a regexp) do not show up in the webpage, since the HTML in the browser eats them up.
is there any way to define alternation in regexp with backtracking ?
zengargoyle: I mean I can change mi6 to take care of regexp in MANIFEST.skip
my code that i made this extract from, does a few regexp matching, and string replacement on a string.
It's the most common regexp use, and I think giving a simple example is nice.
need a little help with regexp
I thought they were word boundary inside subst regexp
Is there a way to force a substitution regexp to ignore whitespace in the substituting part?
nicq20: I've always liked https://swtch.com/~rsc/regexp/regexp1.html for its clarity and simplicity. if you're curious for more details, I recommend it.
how to create a regexp expression from given string?
but this is, well, "compiled" regexp
I have an input string with "capture" blocks '(' ')', and I want to compile a regexp from this string and access the captured data; these are unnamed captures ...
Ok, Zoffix: thanks a lot. I guess named capture for interpolated regexp string works for me:
how can I create a regexp from a string? given a string $foo = '\d+' I want to get a regexp /\d+/
Another question on regexp captures. How to get all captured data list ?
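The questions above ask how to build a regexp from a string at runtime and list all of its unnamed captures. As a rough illustration in Python (not Perl 6), assuming a made-up pattern string and input:

```python
import re

# Hypothetical input: a pattern that only exists as a string at runtime.
pattern_source = r"(\d+)-(\w+)"
compiled = re.compile(pattern_source)

match = compiled.search("order 42-apples today")
# groups() returns all positional (unnamed) captures as a tuple.
captures = list(match.groups()) if match else []
print(captures)
```

The same shape applies in most engines: compile the string once, match, then read the positional captures from the match object.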
melezhik_: because that problem is better solved by not using that regexp
why do you, specifically, have to use a regexp with a capture
go-to reference: https://swtch.com/~rsc/regexp/regexp1.html
moritz: yes, I know how to do what I need in primitive way, using plain regexp, I wanted to have it via grammars
let's have a simple case, I want to express generators: expression and regexp: expression in a single perl6 grammar rules
I know how to do this for every type of entry (regexp: and generator:) but when I need my grammar to match any of them, I fail
like <generator> | <regexp> does not work
so in order for that to work it has to find generator at the beginning and it has to reach the end. alternatively, regexp has to come at the beginning and go to the end
and vice versa: if regexp: entries come first, then the grammar finds them and stops (ignoring generator entries)
it's confusing that the parts of the regexp are named "regexp" and something else :)
I have many like - generators, code, regexp, plain strings, asserts, text blocks ( you may take a look at the doc link I shared )
I could not find a proper explanation of ~ in perl6 regexp, but probably (I might be wrong) it is something for looking inside some sub area?
you use the right matcher. If you are looking for a mix of numbers and chars use a regexp
Like I said the other day, the only more practical code I wrote is slowed down by IO.basename, and that one waits on regexp search/replace. I suspect that p6 web frameworks that use grammars for routing will probably wait on the parsers to finish
also, is there a p5 to p6 regexp converter?
timotimo: that lets you use a p5 regexp in p6, right?
https://swtch.com/~rsc/regexp/regexp1.html <-- Russ Cox reference
Hmm, is there a way to make a regexp from string at runtime?
I'd like each individual element of @keywords to be treated as literal, but the whole string (with ".*" separating each element) to be treated as a regexp and I'm not sure how to do it.
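One way to get "literal keywords, regexp glue" is to escape each keyword before joining. The sketch below is in Python rather than Perl 6, and the helper name `keywords_in_order` and the sample keywords are assumptions for illustration.

```python
import re

def keywords_in_order(keywords):
    # re.escape makes each keyword literal; the ".*" between them stays a regexp.
    return re.compile(".*".join(re.escape(k) for k in keywords))

# Hypothetical keywords containing regexp metacharacters.
pattern = keywords_in_order(["a.b", "c+d"])
print(bool(pattern.search("x a.b y c+d z")))
print(bool(pattern.search("x aXb y c+d z")))
```

The second input does not match because the dot in "a.b" is treated as a literal dot, not as "any character".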
and, yeah, breaking this up into an array of tokens or just crawling along with a while regexp munching a token at a time is the answer
Your regexp isn't in that paste.
OK, so how do I specify a ";" in a regexp?
skids, You might be able to use the regexp directly to insert the dashes rather than having to call .join()
Can you refer to a previously matched pattern in the same regexp?
Could be the regexp? But why work at 500, and not 5k?
modules.perl6.org: fix regexp
Is there any way I can form a substitute regexp without having to quote angle brackets?
I'm going through regexp examples and that one confused me until I tried it here. Now I grok.
thanks, but what if $r uses regexp syntax?
atweiden, actually, you can probably find something useful in https://github.com/MadcapJake/language-perl6fe/blob/master/grammars/p6-regexp.cson
hmm it probably shouldn't even call out to |fmt .. there was a regexp to do that anyway
Zoffix: that would require the regexp to be recompiled
Is it possible to write a regexp in p6 that will match on lines that contain all of a set of substrings, but in any order along the line?
I can't do it in PCREs, wondering if p6 can do it now, as its regexp support is better.
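For what it's worth, a chain of lookaheads is one common way to express "all of these substrings, in any order" in Perl-compatible engines. A minimal Python sketch, with a hypothetical helper name and sample substrings:

```python
import re

def all_substrings_pattern(substrings):
    # One lookahead per substring; each one scans from the start of the line.
    lookaheads = "".join(f"(?=.*{re.escape(s)})" for s in substrings)
    return re.compile("^" + lookaheads)

# Hypothetical substrings to require.
pattern = all_substrings_pattern(["foo", "bar", "baz"])
print(bool(pattern.search("baz then foo then bar")))
print(bool(pattern.search("foo and bar only")))
```

Because "." does not cross newlines by default, the check applies to one line at a time.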
chansen_: so I figured regexp-as-heuristic was a reasonable compromise
hankache: the regexp part is not whitespace sensitive
Could someone clarify why a Grammar returns a Match object with keys corresponding to tokens when a regexp match return a Match with one key (0) set to Nil? why is that key there?
not writing regexp, though, generating them rather
except regexp
jnthn: but 'use v6' as a thing to regexp for seems useful for tooling to know which highlighter to use
aha, the shebang regexp has been tweaked?
'regexp'
ZoffixW: I think our vims are outputting slightly different markup and a better test would be a less matchy regexp
grep -P, --perl-regexp PATTERN is a Perl regular expression
lizmat: ignoremark, same as regexp
moritz: thanks! heredocs was just a nifty use of a really cool feature in node-oniguruma (js regexp engine)
"Type check failed in binding <anon>; expected Match but got Nil" has anyone seen an error like that in a simple regexp match?
what does a single | do now in a regexp .. I know the || is alternation?
Is there a nice way to get a list of matches of a regexp using a single statement? I'm having trouble with types:
TimToady jnthn : wrt using a regexp as a line delimiter, we have that functionality in IO::Handle.split already, so should be pluggable into .get and .lines
the recent regexp signature change one
rakudo/nom: I would argue as correct in the first place - a bare regexp that matches a
3 Oct 2015 16:40Z <Ven> lichtkind: shouldn't that be a regexp? https://github.com/perl6/problem_solver_tutorial/blob/master/chapter_4/bn4.pl6#L25
.ask lichtkind shouldn't that be a regexp? https://github.com/perl6/problem_solver_tutorial/blob/master/chapter_4/bn4.pl6#L25
(the thread is about a bug in the Perl 5 regexp compiler’s attempt to parse patterns for a POSIX regexp construct solely to warn that it isn’t supported)
perl5 has a rule for regexp questions where we ask for input, expected output, and what code you have so far
is anyone interested in reviewing https://github.com/rakudo/rakudo/pull/440 ? particularly my attempt at a proper perl6 regexp rather than cheating with :P5 :)
https://swtch.com/~rsc/regexp/regexp1.html
22 May 2015 21:51Z <tony-o_> japhb: i have this dimming dalek and camelia pretty well - /hilight -regexp -line -color %K ^rakudo-.*?\s[\d\w]{6}:
22 May 2015 21:54Z <tony-o_> japhb: i have this dimming dalek and camelia pretty well - /hilight -regexp -line -color %K ^(m|r|p):
japhb: /hilight -regexp -line -color %K ^rakudo-.*?\s[\d\w]{6}:
.tell japhb i have this dimming dalek and camelia pretty well - /hilight -regexp -line -color %K ^rakudo-.*?\s[\d\w]{6}:
.tell japhb i have this dimming dalek and camelia pretty well - /hilight -regexp -line -color %K ^(m|r|p):
I guess it's the ".setting" regexp?
I didn't get "<message string> doesn't match /regexp thing/" but some totally weird other thing that made it unclear that I'd even got the syntax of the test right, let alone the semantics.
timotimo: I find that sentence to have the perfect balance: "TimToady improved [...] for a regexp match."
I think I am going to work on quotes next, like regexp and heredocs
ya, it's not recognizing any of the regexp language yet either
[Coke]: also are you using vim 7.4? I opened a file in vim 7.3 and it barfed on some of the perl6 syntax regexp so I think it must require vim 7.4+
http://swtch.com/~rsc/regexp/regexp1.html
moritz: however, still hangs on regexp.pod
it feels like a relatively straightforward regexp problem
what if you *wanted* them to expand to regexp metachars? ;-)
Can I get the original regexp from $r?
yes, I want to define the regexp in the longwinded way and then get an oneliner from it
m: my $regexp = / ^ '[' $<sub>=<-[ ']' ]> ']' /;
rakudo-moar 3bbf7b: OUTPUT«===SORRY!=== Error while compiling /tmp/LOw9BNkAa4␤Unable to parse expression in metachar:sym<assert>; couldn't find final '>' ␤at /tmp/LOw9BNkAa4:1␤------> my $regexp = / ^ '[' $<sub>=<-[ ']⏏' ]> ']' /;␤ e…»
reduce the amount of exact match, and figure out what text is the minimum, do a regexp?
TimToady: hee: so, with slurp and this regexp, 19s:
with your regexp, just over 1s:
(in the regexp, that is.)
perl6-debug seems to get lost in the myriad of regexp and whatnot
I think the grammars and regexp stuff would warrant by itself another tutorial
ingy: hahaha just realized your regexp couldnt fail :(
Can anyone explain to me what <foo('bar')> is supposed to do in a regexp? I mean, it calls a function/method, but then what? How do I usefully use that?
ingy: re := regexp.MustCompile("^([^aeiou]*)(.*)$")
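A pattern shaped like the Go one above matches any single-line input, because both groups are allowed to match the empty string. The sketch below translates it to Python to show the split it produces; the sample words are made up.

```python
import re

# Same pattern as in the Go snippet above; both groups may match the empty string.
pattern = re.compile(r"^([^aeiou]*)(.*)$")

# Hypothetical sample words, including the empty string.
for word in ["string", "apple", ""]:
    match = pattern.match(word)
    print(repr(word), match.groups())
```

The first group takes the leading consonants and the second group takes the rest, so the match only tells you where the first vowel is, never whether the input was valid.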
I mean @<foo> inside of a regexp
regexp.h: regnode program[1]; /* Unwarranted chumminess with compiler. */
timotimo: re pessimal regex, were you talking about the one in http://swtch.com/~rsc/regexp/regexp1.html or something else?
(regexp-modifier-like syntax)
I'm going to parse a file that'll contain directives (like `= foo`), and keep everything else intact, and I wonder if I should match the directive regexp and replace in the string, or if I should make a grammar and keep bits I didn't parse
wasn't sure, that's why. What'd it take to get the regexp in "non-greedy" mode ? (where it'd match ** 1 everytime) ?
Apologies for the perl FIVE regexp ;-).
vendethiel: the regexp is essentially an .index operation, looking for a substring.
Rant #2: Identifiers. This is maybe a little early in development to work on this, and you'll probably tell me a lot of it is waiting on developments in the regexp engine, but it's worth looking at anyway.
still not sure about `||` vs `|` (in regexp context, not `,` vs `;` interp), isn't it just for precedence ?
not that there's anything wrong with traditional regexp matches :)
Heh, sometimes I used [option1|option2] in P5 regexp by accident.
the new vim release features a faster regexp engine. i wonder if the perl6 syntax rules can make use of that?
FROGGS: actually from http://swtch.com/~rsc/regexp/regexp2.html
throws in http://swtch.com/~rsc/regexp/regexp1.html as a reference
masak: did you test this regex with http://swtch.com/~rsc/regexp/regexp1.html ?
masak: http://perlgeek.de/blog-en/perl-tips/in-search-of-an-exponetial-regexp.html :-)
Util++ wow thanks, I'll make sure I get on with it ;) I don't even recognise those perl5 regexp modules
mathw: also, Perl 6 people consistently say "regex", whereas many Perl 5 people say "regexp".
just tried the motivating example with the horrible graph in http://swtch.com/~rsc/regexp/regexp1.html
arnsholt: http://swtch.com/~rsc/regexp/regexp1.html has some sweet diagrams and nice, clear code.
i'm slightly surprised the mu repo refers to the article http://swtch.com/~rsc/regexp/regexp1.html "Regular Expression Matching Can Be Simple And Fast", yet in rakudo the pathological regex is still pathological
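The pathological case in that article is n optional a's followed by n required a's, matched against a string of n a's. A small Python sketch that builds the pattern (the helper name is an assumption, and n is kept small on purpose):

```python
import re

def pathological_pattern(n):
    # n optional 'a's followed by n required 'a's, as in the article's example.
    return re.compile("a?" * n + "a" * n)

n = 12  # kept small: backtracking engines can take exponential time as n grows
print(bool(pathological_pattern(n).fullmatch("a" * n)))
```

The match always succeeds; what changes with n is how long a backtracking engine needs to find it, which is the point the article makes.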
looking forward to the new vim version with the alleged new and much faster regexp engine for highlighting and such
jnthn: the outer is supposed to be set once when a sub/method/regexp is defined and never again
jeff_s1: regexp-pcre doesn't seem to be listed in the pugs repo. how does it get pulled in?
as opposed to if you had a single regexp.
lizmat: only regexp matching gives more information for successes, AFAIK.
hmm long ago when IRC was young there was regexp creation race channel
I guess mad regexp skills are so pre y2k ...
r: say ~/regexp/
rn: my $regexp = /abc/; 'dabce' ~~ / d $regexp e / # I know, you should use <$regexp>
rn: my $regexp = /abc/; say 'dabce' ~~ / d $regexp e / ?? True !! False # I know, you should use <$regexp>
rn: my $regexp = /abc/; say 'dabce' ~~ / d <$regexp> e / ?? True !! False # I know, you should use <$regexp>
GlitchMr: so... why don't you use <$regexp>...?
I’m having trouble trying to interpolate a variable to replace the hard-coded 3 in the regexp
use eval to create the regexp :D
I suspect you could mine some gold in the regexp space to get that higher.
i mean... i have a regexp like: s/\x02|\x09|\x13|\x0f|\x15|\x1f|\x16|\x12|\x03(?:\d{1,2}(?:,\d{1,2})?)?//
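The substitution above strips IRC formatting codes, including colour codes of the form \x03NN,NN. Here is a rough Python equivalent using the same pattern; note the original s/// has no /g flag, so replacing every occurrence (as `sub` does here) is an assumption about the intent.

```python
import re

# Pattern copied from the substitution above.
CONTROL_CODES = re.compile(
    r"\x02|\x09|\x13|\x0f|\x15|\x1f|\x16|\x12|\x03(?:\d{1,2}(?:,\d{1,2})?)?"
)

def strip_control_codes(text):
    # Assumption: remove every occurrence, not just the first one.
    return CONTROL_CODES.sub("", text)

# Hypothetical message with bold, colour and reset codes.
print(strip_control_codes("\x02bold\x02 \x0304,01red\x0f"))
```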
diakopter to check if string contains something I need regexp?
Q: Is there a simple way to turn a string into a regexp object? Or is there even a need for that, or I can just use /$mystring/ as if it were a regexp?
hm, it seems I can use a $var directly without casting it into a regexp... only wondering if that has any performance drawbacks
Does anyone have a quick idea of the P6 equivalent of the regexp ['"] (in P5) ?
diakopter: http://swtch.com/~rsc/regexp/regexp1.html - Regular Expression Matching Can Be Simple And Fast
FROGGS: if you haven't come upon http://swtch.com/~rsc/regexp/regexp1.html, I can heartily recommend it. it's enlightening.
is above "m/{ say &?ROUTINE.WHAT } a/" a bug? previously &?ROUTINE was available in regexp blocks and returning Nil
and the regexp matching operator: //
Just make the regexp greedy then...
Abcxyz: "Just make the regexp greedy" is more like the problem, not the solution.
The regexp is a big surprise to me, since before I was using C++ to treat streams...
If you are going to tell me that grep can use --perl-regexp option, this is GNU option, not UNIX.
GNU grep depends on PCRE in only one specific case - when using --perl-regexp option.
Plan9 probably forked 'regexp' meaning or at least is trying to make a step forward from the *nix world... http://swtch.com/~rsc/regexp/regexp1.html but maybe this link is so 2007's ?
Woodi_: I don't understand -- are you saying http://swtch.com/~rsc/regexp/regexp1.html is related to Plan9 somehow?
std: use v5; print/regexp/;
std a8bc48f: OUTPUT«===SORRY!===␤Bogus term at /tmp/d866jMaCoj line 1:␤------> use v5; print/regexp/⏏;␤ expecting any of:␤ prefix or term␤ whitespace␤Parse failed␤FAILED 00:01 52m␤»
sorear: does Niecza support :bytes in regexp?
but that's not :bytes *in* regexp :D
I'm debugging bugs in my nqp-js regexp implementation, and I thought it might be useful to hook that to a regexp debugger at some point
diakopter: I'm thinking of using \d in a javascript regexp
I'm convinced now that the only sane way to do an emacs major mode for P6 is for the major mode to interface with a P6 script that does the real work (getting an emacs regexp to handle multiline comments is near-insane)
How do you specify a regexp like /(pattern)*/ in an NQP token?
Should I make "explain regexp" tool?
I originally wanted to name it "explain-regexp" but word games are nice ;-).
This is regexp to match literal characters
Let me guess, it's attempt at copying grep /regexp/, @array; syntax from Perl?
Why regexp?
Just wondering, I have Perl 6 regexp - how can I parse it?
is there a syntax to create a token within the regexp ? like perl5 named capture ?
i'll talk about perl6 regexp during my pycon talk :)
ok: how do I match "not char x" in a regexp?
Like - if (something) /regexp/.method()
sorear: ERROR: syntax error at (re_eval 21) line 2, near ""foo" @ ;" Compilation failed in regexp at (eval 20) line 1.
yeah... I did p5 regexp for so long there's a lot of inertia
. o O ( java.util.org.cpan.regexp.common )
I would like to create P5 regexp to P6 thingy, but I wonder how to convert $ (without /m) or \Z
I want to make P5 regexp to P6 regexp converter
moritz: http://perlgeek.de/blog-en/perl-tips/in-search-of-an-exponetial-regexp.writeback
there is some funky syntax to turn a string into a regexp object (and as such a subrule) on the fly
Shouldn't interpolating @array in regexp make alternations dynamically?
is there any reason why you stick that stuff into a regexp?
Just wondering, if my array would look like (/./, /../) would it match one or two characters when inserted to regexp?
how should "\ " (escaped space) behave in regexp? match literally?
TimToady: 02 Jul 16:46Z <bbkr> tell TimToady that / $. / regexp parsing using STD has some internal error "Use of uninitialized value $x in concatenation in line 66623" leaking and shown before "Unsupported use of $. variable" message (meant to be seen by user). RT 77550 shows that it did not happen in the past, so this is a regression in awesomeness :)
phenny: tell TimToady that / $. / regexp parsing using STD has some internal error "Use of uninitialized value $x in concatenation in line 66623" leaking and shown before "Unsupported use of $. variable" message (meant to be seen by user). RT 77550 shows that it did not happen in the past, so this is a regression in awesomeness :)
* New regexp routines derived from Henry Spencer's.
i don't get it. "abc" is matched against regexp that returns a list and is matched again against this list?
Have You guys read http://swtch.com/~rsc/regexp/regexp1.html ?
does not understand this "but" magic in this regexp
moritz: how efficient is that, will it read all contents first and then apply the regexp or is it smarter then that?
ValueError: Broken regexp: '*FACE*' (file "/home/sbp/Dropbox/inamidst.com/phenny/modules/codepoints.py", line 58, in codepoint_extended)
http://perlgeek.de/blog-en/perl-tips/in-search-of-an-exponetial-regexp.html
Slightly OT: Is rakudo's <{fail}> in regexp formally considered a cheat? It does what I consider to be the right thing and propagates something that explodes down to the root of the returned result, but that's despite the fact that it is tested for truthfulness during the regexp match, which should mark it as "handled" (?)
How does one fail a regexp from inside a closure without failing the interpreter?
Is it possible to provide decimal char range in regexp? for example I can write in hex <[\x20]>, but <[\d32]> is interpreted as digit,3,2.
more readable version: http://swtch.com/~rsc/regexp/regexp1.html
jnthn++ for fixing regexp make and $/ in block. That was a blocker for my dabblings.
moritz: Use of uninitialized value $1 in regexp compilation at (eval 20) line 1. fo
hmh. everything unicode by default and regexp.. instantly brings to my mind locales :(
nqp: Merge pull request #23 from ruz/regexp-fixes
when you submit that, you can golf it down to just the 2nd regexp
even if you use them where you'd use a regexp
knowing that a regex is a sub under the covers, I'm not sure how you'd get the original regexp back out.
:p5 regexp modifier is NYI?
maybe the "subset Person of Hash" test could be made more robust by doing a regexp on the Exception in the dies_ok case?
Hmm, the ones I saw pretty much looked the same as the usual extended regexp I used to use in Perl 5.
rakudo: / a ** / / / # parse bug here, should detect empty regexp first like STD does?
rakudo: say so "" ~~ m/<[a..z]-[x]>/ # known bug? seems that exclude "-[x]" in char class fails when regexp containing such class is matched on empty string
rakudo: say so "a" ~~ /<[\x00..\xFF]>/ # what is the proper way to give hex ranges in regexp? this doesn't seem to work
moritz: time for workarounds then. luckily this works perfectly: m/^<[!..~]-[:]>+$/ (RFC2822 header name regexp for incoming Email::MIME)
jnthn: thanks. I wanted to avoid split/comb, but since \xHH is not yet implemented in regexp classes looks like I have no choice :)
is there in P6 any smart equivalent of p5 code "defined $x and length $x" (except regexp match)?
if you use "use charnames ':full'" and \N{...} in a regexp
Anyway, next stop: regexp :)
I never required that "use strict" in my small test/practice scripts that I often create to try something. e.g. I want to try some class function or test regexp etc. I create a small script for side work.
try some logic, or regexp or anything
moritz_ regexp ... cool
hmm, in which part of S05 is this regexp tilde trick described?
slavik: that you can write regexp like / [ '<h1>' ~ '</h1>' <header_text> / instead of / [ '<h1>' <header_text> '</h1>' /
std: \/ a / # bug? i'm not sure if escaping is allowed in bare code, like in this case for regexp beginning
"Rakudo allows postfix<:> in regexp(?) but STD doesnt"
i read somewhere that it's regexp is too diff.
rakudo: say "mmmmmmmmmany" ~~ /m ** 1..0/ # a bug? inverted ranges (correct according to STD) matches infinite amount in regexp repetition ** op
bbkr: I don't think the limitations will be hidden if everywhere we talk about ranges in regexp, we use normal ranges as an analogue rather than being the same.
I wrote an old-school regexp parser :)
lol, rewriting my regexp for v6 in this case would be crazy: "m/\<(.+?)(\s.*?)?\>(.+?)\<\/\1\>/sig"
replace regexp :)
:-D I caught a nice bug with recursion, but it's already solved. If you use vars obtained from a regexp in a while loop, it falls into endless recursion :)
tylercurtis :) and a bit of trouble with regexp :) in v5, ? means non-greedy; rakudo just ignores it :)
Is there any way to restrict class attribute with regexp? something like class Address { has $.zip where /^\d\d-\d\d\d$/ } ?
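The idea in the question, rejecting attribute values that do not match a pattern, can be sketched outside Perl 6 as well. A minimal Python version, assuming a hypothetical `Address` class with a `zip_code` field checked at construction time:

```python
import re
from dataclasses import dataclass

# Same shape as the pattern in the question: two digits, a dash, three digits.
ZIP_PATTERN = re.compile(r"\d\d-\d\d\d")

@dataclass
class Address:
    zip_code: str

    def __post_init__(self):
        # Reject values that do not match the whole pattern.
        if not ZIP_PATTERN.fullmatch(self.zip_code):
            raise ValueError(f"invalid zip: {self.zip_code!r}")

print(Address("12-345"))
```

This only validates at construction; later assignments to `zip_code` are not checked in this sketch.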
fwiw, the program I used at http://diotalevi.isa-geek.net/~josh/100720/timing-rakudo2.txt really spends pretty much "all" its time in the rakudo regexp engine and while I haven't traced it down, some inputs were phenomenally slower to execute against.
so far I'll have to regexp-extract it?
so is it true that rakudo has no access to a regexp engine?
which is a stretch to call regexp
but I'm just asking if rakudo can invoke a "normal" regexp engine
that must be why I wrote \: in the original regexp
The one 'TODO' I didn't try undoing was I'd originally tried a regexp back reference which didn't work. I tried all the others though.
tylercurtis: I've written a parser grammar system built on regexp that is curiously close to p6rules
so my other stuff takes little regexp bits and combinates them into bigger rules
http://github.com/ingydotnet/cdent/blob/master/src/grammar/atoms.yaml is the regexp atoms for cdent
in theory, the whole grammar could be combinated into one big (and fast!) regexp
ingy: are you familiar with this page? http://swtch.com/~rsc/regexp/regexp1.html
needs regexp help.
zomfg, regexp::grammars is awesome
how close is regexp::grammars to perl6 ?
I even seen that metadata works better with regexp or matches lately, or am I dreaming already? It is late here.
diakopter: sounds a lot like http://swtch.com/~rsc/regexp/regexp1.html
most notable accomplishment there so far is managing to get macports 5.8.9's regexp engine to dump core by using (?{}) to simulate some p6-isms
isBEKaml: I think the regexp tests are already setup like that. (at least the parrot ones are.)
isn't the " " split token also just shorthand for a special regexp
well "xy" is also a regexp right?
I'm pretty sure http://gist.github.com/393761 worked in an older version of p6. Any ideas why it now claims that a regexp is not closed with an angle bracket? And the version in the comment doesn't work either?
although test can be corrected, because now it says '# TODO: Check for a good error message, not just the absence of a bad one'. can regexp for rakudo error message be placed in PUGS repo?
jnthn: cool one :) i'm thinking about some smart regexp using lookahead
sorear, like using the result of a regexp as the arguments to the routine
moritz_++ # http://perlgeek.de/blog-en/perl-tips/in-search-of-an-exponetial-regexp.html
wonders if there is some regexp wizard here
I don't know if the parser would like it, but r:op has pleasant connotations with regexp modifiers
look, a new one! http://swtch.com/~rsc/regexp/regexp3.html
solving sudoku with regexp
i recall reading some medical study that linked that with chronic usage of regexp
cool. I read http://swtch.com/~rsc/regexp/regexp1.html today for lunch.
what is the rationale behind using ^^ and $$ instead of ^ and $ in regexp?
masak: my $regexp = RegexContainer.new( Concat.new( Quant.new( 0, Inf, GREEDY, Quant.new( 1, Inf, FRUGAL, Literal.new('a') ) ), Literal.new('a'), Literal.new('b') ) ); # input: aaaaaaaaaaaaaaa
but for just a small subset of regexp node types.
If the thread on reddit or whereveritwas hadn't been so Slashdot-at-its-worst, I might have commented that regexp syntax in the languages with PCRE is _not_ the same as in Perl. And for the PHP-friendlies, that PHP messed up the conditional ternary operator, and really has a screwed up reputation for security vulnerabilities, to the extent that ... oh. I'm ranting again, am I not?
moritz_: I mean, not anything more than string regexp's
moritz_: I'm not sure what theoretical foundation you could get for numerical regexp's that goes beyond character regexp's
and then use a regexp to separate them
why put second in regexp ?
It's got a kick-ass awesome regexp language to go with it
sorry for asking a perl 5 question here: I am using Regexp::Grammars, which has problems with lookahead. As a workaround I want to use a zero-width assertion, but I don't know how to access the string on which the match operates within the regexp
renormalist: hey, do you know why cperl-mode.el in util/ doesn't workforme on emacs 23? I get 'File mode specification error: (void-function compilation-build-compilation-error-regexp-alist)'
r28009 | renormalist++ | cperl-mode.el: cherry picked compilation-error-regexp-alist fix from 5.24
r27691 | moritz++ | [t/spec] remove old regexp <sp>, colomon++
gimme5 uses a hackish, regexp-based parser to translate 6 to 5
Matt-W: you can definitely create a sub which returns a regexp based on a parameter.
In http://gist.github.com/141232, I'm getting a null PMC exception in get_string() while running a regexp. What should I be looking at to debug this?
they might. PGE (the regexp engine) might not handle it well though.
asking --target=pir is highly uninformative for regexp contents.
because it's the compiled code for the regexp
wait, are you expecting to see the regexp source in the --target=pir output?
can more reasonably find the regexp components by using something more striking like 'X'.
It is ... strange to read my regexp turned into "assembly."
p5 regexp, or ... ?
I am however, interested in why C<< regex git_repo { '/.git' }; '...' ~~ git_repo >> throws an arity error. S06 goes to effort to hint that regexps are really just functions while ~~ ought to notice I've passed a regexp. Instead, I'm told I passed 0 args while 2 were expected.
pmurias: using the current regexp replacement im using in quick fixes wont help us in this situation; we could guess and add a ; after }
masak: one thing I didn't perfect is the regexp for a perl 6 identifier
parsers are another good use, since the regexp stuff works OK.
Since +"xx" == +"0" == 0, how should I find the difference .. I'm trying to validate user input without using my own regexp for that.
If I use anything with regexp inside action method
maybe a threaded jit javascript with regexp vm instructions might match it (webkit), but otherwise.
so for now, it's back to the "what is the goal" thing. If you feel the need to write non-small p6 now, backends give you a way to deal with "I really need to use a p6 regex engine faster than p5's regexp engine". or whatever. and long term, you are going to want the thought that went into the assorted runtimes anyway.
that'll do nicely for a regexp engine
has even integrated the regexp vm ops into the full vm ops. so get jit and all.
..elf 25485: OUTPUT«Use of uninitialized value $_[1] in regexp compilation at ./elf_h line 121.␤Use of uninitialized value $sep in join or string at ./elf_h line 187.␤cba␤»
because evaluating the regexp really doesn't matter, as that's not where the match would happen
r24775 | putter++ | [STD.pm] Kludges to work around some p5 regexp parse failures.
moritz_: Hmmm. It seems the t/regex/ p5 regexp tests have been deleted? What's up? My fuzzy impression is that rx:P5/foo/ has been respeced as rx/:P5 foo/, but other than that, those should be valid tests. A thousand of them.
literal: convert perl6.vim to perl regexp
pmurias: re the problem, err, there are a great many things which aren't good enough to base a running STD.pm on. STD.pm is very not-small and non-trivial. so the task is to identify subsets of it which can be developed and tested well. core IR and runtime regex/regexp is one such.
moritz_: given that perl6.vim is all regexp, it can be processed by a perl5 library to produce highlighting...
pmurias: no i mean interpreting vim regexp...
pmurias: vim regexp interpreter
moritz_: you mean something like to compile perl6.vim -> regexp and then run the perl5 code
perl5 regexp's (for a while it could only parse p6 regex)
things im going to work on: regexp highlighting
r21893 | fglock++ | [misc/pX/fglock/nfa-perl.pl] a perl5 implementation of http://swtch.com/~rsc/regexp/regexp1.html
:) when doing a "mutate the implementation while intending to maintain behavior invariance" exercise, yes. it means I don't really understand the changes I just made. perhaps with excellent test coverage, one might disregard understanding, and simply use suite passing. but a good test suite for regex/regexp regrettably doesn't exist.
pmichaud: last time I looked (long ago), my fuzzy recollection it was largely derived from the same p5 regexp.t? which is very partial.
spinclad: re scheme, on my todo list after 'full bootstrap with STD_red translated into p6 and absorbed into elf, and elf running on at least two backends', is to start pushing on frontends. First a js, but then a scheme. Thinking of bigloo, just because it has a regexp-like reader design.
maybe the regexp engine?
re "explicitly targeting other backends", indeed. which makes it harder to draw such "what is the language" lines. targetting multiple backends makes the choice clearer. eg, perl compatible regexp engines are out (everyone has them), p6 regex engines are in.
re see Prolog, there is *also* a combined p6/p5 regex/regexp engine written in prolog. misc/Grammars/snapshot_of_prolog_engine.pl . but that's unrelated abandonware.
think of all the thousands of regexp tests. their task is to catch the "oops, thought we handled that specific case correctly but didn't".
re Henry's regexp, lol
r20320 | putter++ | [elf_e] expand_backtrack_macros() now works. Tweaked p5 prelude regexp helpers.
r20293 | putter++ | The regexp and regex parsers are temporarily retained, as some of the action logic will need to end up in IRx1_FromAST or elsewhere.
grabbing bits from http://svn.pugscode.org/pugs/misc/winter_jig/STD/STD_extract (careful parse of STD), plus http://svn.pugscode.org/pugs/src/perl6/cheat and http://svn.pugscode.org/pugs/src/perl6/gimme5 (for the regexp pattern parse tree). plus some oo classes to represent the assorted rules, and answer questions about them.
are you part of a tree? gimme your tree to a p5 regexp. list your subrules. do you have any code? do any of your subrules take arguments? do you use any of the environment variables? which? etc.
if rule() is intended as a simple collection of regexp() modifiers, intended to match some user expectation or use case, then...
"regexp" is an old name, don't use it
r19961 | gwern++ | Remove a regexp from util/build_pugs.pl because we can't have backwards compatibility with those old Cabals if we want 6.8.x as well, so it merely messes up the generated Cabal file.
re buried alternations... yeah, while the regexp engine was a pain, redsix's "I'm just going to do full backtracking, and worry about ratchet stuff being required for a correct parse later" was somewhat more restful than the current "I think I can get by with fudging the backtracking like so... but I have a really really bad track record of making such calls correctly".
Ie, get a regexp engine, and then fight to make the code work.
Specifically, transliterate STD to ruby or p5 _code_, built on a simple scanner (StringScanner in ruby), and using exceptions to backtrack. Only use the native regexp engine, and that only for low level lexing.
so far, Javascript is pretty slow on regexp and has about 10-15 secs of browser-timed execution time
im using syntaxhighlighter regexp...
javascript regexp...
keywords are matched like regexp's so p6 has lots of them... thus it is slower than ruby/python...
is there a way to do tentative regexp matching in p6?
prior to 5.mumble, if a regexp has a code block which tries to use a regexp, that second regexp doesn't work.
it doesn't matter what the regexp's were, simply that call stack hits two of them.
rephrased, for clarity, prior to the p5 regexp being modified to be reentrant, if your call stack has two regexp calls, the second one isn't working. Hmm, I don't recall at the moment whether the first one gets smashed too. I think not, but I'm fuzzy.
re 272, that's just inlining a regexp into a regexp. that works, though obviously if the regex is self recursive, that has problems. but if instead of eval(), that said eval() if "a" =~ /a/ (to use a trivial example of the code block _itself trying to use a regex_), then it wouldnt work.
a version of p5 which has recursive regexp support or not. if yes, life is simple, and you should be able to just inline code blocks, and have the right thing happen. But if not, if you want it to work on 5.8, then when you encounter a subrule or code block, you can't use a real regular expression anywhere above it in the regexp ast tree. Ie, given (b*<foo>)+
but the second can't be done unless you either have a regexp engine capable of reentrancy, or you can inline not just the pattern, but the surrounding code as well. (ie, p5 op codes, which then looks a lot like having a regexp engine capable of reentrancy;)
has anyone gone through and looked to see what p6 regexp concepts -don't- easily map to the new stuff? to create a verb shopping list? :)
re capture to array, yeah, something like that might be the gotcha. some subtle difference in how repetition or somesuch was handled, which forced one to bail out entirely. or at least for some class of regexp's, which still means paying the cost of doing a fullish implementation. :/
and even if it turns out one can't for some reason throw exceptions between blocks of a regexp, one could fake it. and you can "succeed" even if the actual regexp doesn't, so you can assign to the 'string being matched', even if p5 doesn't let you. and, at least on 5.10, you can do continuations down through subrule if you need to.
so a lot hinges on the regexp engine actually behaving correctly, in the face of this unusual exercise it's being given. historically that's been problematic. eh, but that's what 5.10.1 is for.
so the leverage comes from having control of the code that's embedded (by virtue of emitting from p6, rather than accepting arbitrary p5), and doing infrastructure with (?{}) blocks, rather than external code, which lets you use (??{}) for inclusion. and 5.10 is available for when a code block needs to itself use a regexp.
there was a php3? feel free to expand the regexp :)
if some time you would like to take a pass through the yet_another...whatever it was called, regexp engine, explaining/documenting and cleaning, just let me know. maybe we could find a collaborative editing website to work on the file simultaneously.
If you're working on regexp emitters I'd love to help with the p5 one
oh, and http://swtch.com/~rsc/regexp/regexp1.html :)
couldnt we do regexp on a string without a var?
but since we can do OO style VStr.op, shouldnt VStr.regexp be allowed?
could someone help me with a small regexp problem?
Other than the objects for the results and the regexp in there, it looks like it should be ok
But if you wanted to make the serialize/deserialize flatten those objects/regexp to struct equivalents, and then convert them back on deserialize, it would handle it
is <foobar> in p6 regex always a method call basically? Well, that or CODE that the regexp language executes itself..
dmq: I was thinking about perlapi functions to deal with regexp stuff
you will be happy or you will be used as regexp buffer?
i like much regexp too :)
as best I can tell, the current interface that re.pm uses makes rather a lot of assumptions that your regexp bytecode is the same as the core's
nwc10: re::engine is actually independent on the p5 regexp bytecode; it only depends on the regex API
its fairly straight forward. the regexp engine stores a hash that contains a dualvar that holds an array of u32.
ok, assuming that that is not going to be easy, would it be ok if i gave you back a copy of the internal hash that the regexp engine uses and you can decode it as necessary?
or alternatively you could design PCR to use a custom variant of the regexp engine that DOES track this type of stuff internally.
i just dont want to burden every regexp structure with data that will be used only by one module.
all regexp results variables are ties.
I have trouble with a multi-line regexp
geoffb: the regexp tests do take a while
one note to p6 implementors, putting match results in the regexp object is not a good plan.
TimToady: Do you recall when in perl history you brought in Henry Spencer's regexp code? It's for an evolving joke in re::engine::POSIX docs
What are some of the posix regexp implementations? glibc, ..?
After utterly wasting 1/2+ hour, putter is busy writing 1000 times on the blackboard "The Perl 5 regexp engine is not reentrant, and bizarre bugs occur when you forget that." :/
tired happy me. need a p6 regexp grammar. going to look at STD.
Anyone else read this regexp article? http://swtch.com/~rsc/regexp/regexp1.html
I have to trust that it's not just flattening the array into a string and then doing a regexp match against that ... :-)
what about regexp statement_control { ... } ? no simple longest token.
perhaps make regexp categ:foo {...} an optional way to specify a longname?
can one do regexp statement_control { foo+ ... }
Title: Perl regexp matching is slow??
Anyone know the status of bleadperl's attempt to make the regexp engine overridable?
and if they store capture data in the regexp struct, well, things go boom.
timtoady: dmq pointed me to http://swtch.com/~rsc/regexp/regexp1.html
re PITA, yes. most of my own regexp engine hacks have included an old (damian?) idea that the Match's in the match tree should remember what rule created them. completely non-spec I think. but it makes using the match tree so much easier...
I'd offer a p5-regexp-plus-rules, but it was never wrapped up for easy use. :/
re creating a new emitter for the regexp compiler, you might look at misc/pX/Common/Regexp-Engine-Reentrant/ , especially Backtrack.pm, or perhaps at misc/pX/Common/Regexp-Parser-ReentrantEngine/ .
it is the regexp compiler's task to recognize DFA, LL, LR, packrat, etc, etc, subparts of the requested regexp/grammar, and optimize them accordingly.
so, be the regexp compiler for a sec, and do foo
the pos f sees at call time must define a substring (pos at foo start to pos at f call) for which f would return a regexp which succeeds. (I'm dealing with the no-<null> case). can one do that without analysis of f?
"on pugs" is like "on mp6", but with pugs rather than mp6. if pugs match objects are odd, or you can't hook into pugs's regexp handling, well, that's not the objective. there just has to be some usable subdialect of pugs which is firm enough to build on.
the runtime behavior tends to be atrocious. perl and pcre note it as a pathological case; openbsd's standard regexp implementation can be easily made to take infinite time to match, and glibc's to take infinite space
perl6 calls the regexp match variable $/, just to be different. ($~ in ruby)
Almost no one used the regexp variables in perl 5: @- and @+.
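In Perl 5, `@-` and `@+` hold the start and end offsets of the last match and of each capture group, so `$1` and friends can be recovered as substrings of the matched string. The same relationship can be sketched in Python, where a match object exposes the offsets through `span()`. The helper name `captures_from_offsets` is made up for this illustration.

```python
import re


def captures_from_offsets(text, match):
    """Rebuild group 0..n purely from (start, end) offsets, as @- and @+ allow in Perl 5."""
    spans = [match.span(i) for i in range(match.re.groups + 1)]
    # Python reports (-1, -1) for a group that did not take part in the match.
    return [None if start == -1 else text[start:end] for start, end in spans]


text = "id 12-345"
m = re.search(r"(\d+)-(\d+)(x)?", text)
groups = captures_from_offsets(text, m)
```

Here `groups[1]` and `groups[2]` are the two digit runs, and the optional third group, which did not participate, comes back as `None`.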
can do regexp foo in C, that doesn't mean C is really appropriate for it
were I in C and wanting to use a regexp, I'd include a library for that like pcre, boost if I was lucky to be in C++ or "just" perl.
Its just that to implement a regexp engine you have to do c.
dmq - you mean to do a p5 plug-in regexp engine you have to do c?
its mostly a question of how the regexp engine / perl core api is written.
l~r it essentially comes down to this: a regexp engine has to be able to create a regexp_engine structure, and the compile callback defined therein has to be able to return a regexp structure. which means some perl/c glue interface that can be written by pretty much anybody, doesnt need me directly.
the association was regexp -> interpreter -> engine instead of interpreter -> regexp -> engine
=~/!~ in p5 is just plain old regexp matching.
what's the regexp for "as, provided 'method' is somewhere within 20 chars of it"?
r14533 | putter++ | Regexp-Parser-ReentrantEngine - created. A reentrant Perl 5 regexp engine.
r14533 | putter++ | This is an initial cleanup and module-ization of February's regexp engine spike.
fglock: any thoughts on creating a unified p5 regexp testing module? are we still doing the game of every p5 engine implementation grabs a copy of re_tests, writes their own driver for it, ignores most of the other p5 regex/ test files, etc? ;) ConvertToSix needs re_tests, but I'm trying to resist making yet another copy...
r14454 | putter++ | Common/Regexp-Engine-Reentrant - created. first step in repackaging the regexp engine spike as a module.
well, it's certainly the nature of perl5 (and other regexp engines) backtracking, which was the actual subject there
I thought a regexp may be more expensive, and I didn't see an Encode function
the regexp I came up with was: m/^ [0x00-0x7F]* $/x
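A caution on the pattern above, assuming it is read with Perl 5 rules: inside a bracketed class, `0x00-0x7F` is not a code point range but a set of the literal characters `0`, `x`, `7` and `F`. The range needs escapes, as in `[\x00-\x7F]`. Python's `re` module treats the two spellings the same way, so the difference can be sketched there. The names `ASCII_ONLY`, `LITERAL_CLASS` and `is_ascii` are made up for illustration.

```python
import re

# Code point range U+0000..U+007F, written with \x escapes.
ASCII_ONLY = re.compile(r"[\x00-\x7F]*")

# The class as typed in the quote: only the characters '0', 'x', '7', 'F'.
LITERAL_CLASS = re.compile(r"[0x00-0x7F]*")


def is_ascii(text):
    """True when every character of text is in the 7-bit ASCII range."""
    return ASCII_ONLY.fullmatch(text) is not None
```

With these definitions, `is_ascii("plain text")` holds, while `LITERAL_CLASS.fullmatch("hello")` finds no match because none of the letters are in the literal set.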
r14188 | putter++ | Converts perl5 regexp patterns and literals to perl6. //,qr,m,s.
the regexp spike, whose syntax was p5 regexps plus <foo>, just converted the <foo args> into (?{dorule( C ,'foo','args')}), where C was the continuation representing the rest of the regexp. so backtracking worked properly.
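The continuation idea in the line above can be sketched outside Perl. In this hypothetical Python version, every matcher takes the text, a position, and a continuation `k` standing for "the rest of the regexp". A subrule call passes `k` along, so a failure later in the pattern makes the subrule try its next alternative. None of the names (`lit`, `seq`, `alt`, `star`, `rule`, `match`) come from the spike itself.

```python
def lit(s):
    """Match the literal string s, then hand the new position to the continuation."""
    def m(text, pos, k):
        return k(pos + len(s)) if text.startswith(s, pos) else None
    return m


def seq(*parts):
    """Match parts in order; each part's continuation runs the remaining parts."""
    def m(text, pos, k):
        def run(i, p):
            if i == len(parts):
                return k(p)
            return parts[i](text, p, lambda q: run(i + 1, q))
        return run(0, pos)
    return m


def alt(*choices):
    """Try each choice with the same continuation; a None result means backtrack."""
    def m(text, pos, k):
        for choice in choices:
            result = choice(text, pos, k)
            if result is not None:
                return result
        return None
    return m


def star(sub):
    """Greedy repetition: try one more iteration first, then fall back to k."""
    def m(text, pos, k):
        def more(q):
            return None if q == pos else m(text, q, k)   # refuse empty iterations
        result = sub(text, pos, more)
        return result if result is not None else k(pos)
    return m


def rule(table, name):
    """Late-bound subrule call, standing in for <name> in the pattern."""
    def m(text, pos, k):
        return table[name](text, pos, k)
    return m


def match(m, text):
    """Whole-string match; returns the end position or None."""
    return m(text, 0, lambda end: end if end == len(text) else None)


rules = {"foo": alt(lit("ab"), lit("a"))}
pattern = seq(star(lit("x")), rule(rules, "foo"), lit("b"))
```

On the input `xxab`, the subrule `foo` first consumes `ab`, the trailing `b` then fails, and the continuation's failure sends control back into `foo`, which succeeds with `a` instead.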
am I correct in believing we still don't have a p5 regexp to p6 regex converter yet?
yipes! P5_to_P6_Translation isn't p5 regexp to p6 regex translation... P5_to_P6_Translation is p5 to p6 translation!!!
just now playing with perl5 embedded in pugs, so can eventually use new regexp toys
and pugs is currently using p5 for regexp (p6-flavor) support thanks to fglock++ et al.
since that perl expr only has a regexp, I gather the answer to "YES/NO's "loops" are created by quantifiers?" is yes?
na, there are lots of people who are interested in the regex stuff. if we were doing just p5 engine internals maybe. but this is more "what's nice regexp vocabulary" and how does this p5 regexp vocabulary correspond to p6 vocabulary
so there are two concepts. one is making a normally backtracking quantifier sometimes turn non-backtracking, depending on input. the other is jumping, from one point in the regexp to another, in this case to beyond quantifier or to "beginning".
putter: going back to your last regexp comment: that is basically the idea.
wouldn't the problem with it be that the regexp compiler would have to know of these :s and other flags for the quote ops?
each regexp structure now points at a vtable for its methods.
with respect to the regexp engine being cleaner, and more cleanly integrated into the rest of p5?
its pretty simple, each regexp struct holds a pointer to a shared vtable structure, when perl works with a regexp it dispatches through the vtable compiled in.
what is the state of the regexp analysis? has tries been integrated with something flexible?
loading a whole regexp library at the beginning of the regexp
its like a regexp switch/case statement.
I wondered if we could embed a DFA regexp (say, posix regexp) to a perl regexp
like, /foo(?d:bar)baz/ would match foo, then match the posix regexp bar atomically, then baz
I think regexp substitution is going to be a pain in the ass to write.
a regexp match returns a Rule?!? if so, that's startling.
before your work, the state was... what? individual leaf nodes like \d get their own regexp call (or implementation)? so the change is combining adjacent ones?
A wrapper may or may not be faster, depending on how complex the underlying regexp is.
s/shrank it/shrank it and ran the regexp again/
oh, before I forget, my one-liner on the p5 regexp engine performance is "C; some things are blazing *pc++; some good analysis (trie's); dumb - but otherwise the analysis/optimization is not very aggressive". so the ideal is to do our own aggressive optimization, and then farm out the low level to p5.
thinks it's finally time for him to sync with all this newfangled :) regexp stuff.
so, big picture. where do we stand on the old concept of rule/regexp unification with methods?
so what was rule foo {...} was simply renamed regexp foo {...}, and the current rule foo is simply regexp foo :blah :blah {...}?
so the old rule foo {} still exists, simply renamed to regexp foo {} ? yes?
so the old, pre-ratchet rule foo {} is now regexp foo :sigspace {} ? yes?
it's "regex" not "regexp"
TimToady: i just heard something from a friend who's recently immersed herself in Perl, and i think it will either amuse or disturb you: "did I mention that I had a weird dream last night? I dreamed that in the hands/voice of some person, perl regexp were The Word made manifest, and when spoken would change the world."
Eg, basically there is a regexp over identifiers which controls face. font, italics, etc. based on length, capitalization, underscores, etc.
(aside, as I collect nm's MO description, re "joining rule terms into p5 rx doesn't give much gain", the key is to avoid needlessly breaking up or taking over something which the p5 regexp engine handles wizzily. eg, scanning for start of match, sequence of quants, backtracking in general.)
subexpression... (|todo add this is a good time to address long-standing regexp vocabulary problems - create a reasonable standard, and the world will adopt it)...
oh, right. actually (bar) is ok too, you just need to postprocess it. that's what some of the early attempts at a p5 regexp engine did.
r13062 | renormalist++ | cperl-mode.el: handle regexp modifiers :Perl5 :perl5 :P5 more explicitly
irssi help... why does '/last -regexp audreyt\>\s 5' not pull up anything? i have several days buffered and can see audreyt's last
cj: i can't make any backslashed item work. makes me wonder if -regexp is there more as a hope than as a feature
wolverian: it just doesn't like backslashes. "/last -regexp audreyt>.*utterly" works, "/last -regexp audreyt>.*\sutterly" doesn't
*regexp
That regexp involves much backtracking right?
luqui: can you do a wordy explanation of the regexp?
Programming in perl is like oracle programming, and the oracle is some regexp deity.
r11759 | clkao++ | * Make ~~ matching passed in (perl5-compiled) regexp work.
also, I emailed Ingy about the left shift thing. Right now parsing is a simple regexp. I suggested PPI and apparently there was some discussion between him and adamk about it and decided not to use PPI.
I did not realize that the regexp may not be just a rule
(and then we can remove the function call operator, as a function call is a smart match, and a regexp matching is a function call :)
one for inside a regexp and one for outside
nnunley: which unicode chars can you use for regexp quoting?
probably the best thing would be a regexp generate and test it in a QuickCheck style
The strawman lasted for all of a day, before the focus shifted to bootstrapping on p5. First as a p5 regexp and parser spike (p5 rule syntax, non-self hosting) which showed it could be done. And then another project which developed into PCR. So the collaborative layout has outlasted the strawman for which it was an illustration, and "pX" now refers to the p6 on p5 effort. ;)
One idea came to mind - we could drop the p5 t/op/{regexp,split,etc}.t tests in re::override, add a dependency on re::override::pcre, and run the tests against that. so people can see how well it is/isnt working and go from there.
but part of that was not looking forward to getting the regexp-engine sane. hmm. maybe most of the "put it off one milestone further" was. ;) maybe we can use override::pcre for that?
(nice to see regexp-spike performance against big input tested though:) (surprise,not - it's slow)
re harness, that works for re_tests (one regexp test per line, etc). not for the other files (lots of external (??{}) variable dependencies, etc, etc).
not really a useful test i think. the regexp engine and the rest of perl core are so intimate,
eg, some parts of core try to rewrite the regexp op codes depending on the context in which they are used. if those op codes are actually real, that works, and you never find out you've ceased to manage to dodge the rewriter's attention.
eg, \G support is only provided when the regexp ask for it. the hook has to support this (doesnt yet). a p5re engine would just ask itself, and everything would work, but the hook isn't doing it.
r9536 | putter++ | move regexp/p6parser spike into a subdir of Common
r9537 | putter++ | rename Common/regexp-spike Common/regexp_and_parser_spike
Overriding with regexp-spike derivative, it still has a segfault 30~ tests into split. but if that's commented out, its a 65% pass. regexp.t is 82%. pat.t is problematic. aside from a lot of (??{}), scoping problems mean the utf8 _implementation_ uses the override, which the spike is too slow for.
on the up side, regexp.t, p5's re_test wrapper, just did a 80% pass. :)
yes but... the regexp struct on which this all hangs is really pawed at all over the place. this is a legacy system reverse engineering project. we really may get to the very end, everything looking like we're winning, and discover/come-to-fully-appreciate constraint X which says "you just can't do it without a patched perl source". And all of this is assuming this api-free reach deep into perl guts becomes so socially important that it
pdcawley: the re::override "perl literal regexp compilation and execution is handled with perl sub callbacks" is a very very crude development snapshot. it's just a hook, for connecting to some engine. audreyt can tell you about the PCRE engine, and fglock about the lrep engine.
out-of-band recap - Regexp-ReplaceEngine.pm has no checked in tests. just a couple of demos so anyone who perl -w'ed the file would get some positive feedback for their curiosity. To test it, I use mutant versions of the regexp engine spike and of some p5 t/op/ test files. all of which has been too unstable to be worth checking in.
folks might want to call pge via a regexp compilation hook, or via source filters, or just using normal method calls.
In fact, it was his magic which actually got me motivated to try the hook, by demonstrating there was more magic around than I thought, and that there was a potential fallback position for transparent regexp overriding on p5.
putter: so the src filter needs to construct a regexp that has the correct parens count
hmm... dont remember the caveats. started coding up a bsb-esque regexp generator, looking for it...
fglock, could I call gramar1::gramar a "huge regexp"?
r9350 | putter++ | It now also permits, hopefully reliably, a regexp implementation to eval() code in the lexical environment in which the regexp is being defined! This should allow proper implementation of (??{...}), and eventually, of p6 variable interpolation.
gaal: yes, and I haven't tested it to be sure, but I suspect the Parser.hs games, which provide for use/require parsing in the absence of a macro "is parsed(/some regexp or other/)", also get you compile time. maybe.
TimToady: could I borrow you for a reality check? I have the PL_regcompp / execp regexp engine intercept hooked into an alternate re engine written in p5 subs. And it's happily working/failing/segfaulting its way through the p5 t/op/ test files. Next step is to get (??{...}) working. The only way of doing that which I know of is what the current
audreyt: re just plug PCRE in - PCRE would be a rather harder nut than the regexp-spike engine. in the spike, we go back to p5, and only use the real engine for the leaves. in PCRE we would have to deal with callbacks etc.
regexp compilation /foo/ and execution "a" =~ ... are callbacked to p5 subs, rather than being handled by the usual p5 engine.
and Regexp::... i dont know, compile hook... what are // et al called? regexp literals?
yeah, but compiling is what some regexp engine will do. this just gives you a callback which is called when an operator is defined/compiled.
who's the contact for the pluggable regexp engine for P5 somebody was hacking on ?
(as long as its a /x p5 regexp :)
r9334 | putter++ | Regexp-ReplaceEngine.pm - Can create qr//'s whose results are managed by a perl sub{} callback. And regexp creation /foo/ can be intercepted, in a scoped way (ie, a sub "regexp_compile" will be used, if one is visible). Next step is cleanup and stress testing, using a version of the regexp-spike. Should be able to run the whole p5 regexp test suite.
putter, where are the docs for the pluggable regexp engine in Perl 5?
r9297 | putter++ | Use of regexp struct is now much more principled. Exec hook return value now determines match success/failure. $1 support is still a work in progress.
$<digit> are extracted from offsets stored in the current lexical latest regexp match
fglock: re $1, it's not pretty. you set up a C struct named regexp, which lots of parts of perl assume they know the layout of parts of, and try to match their expectations sufficiently for them to do their usual (eg, set $1) thing. $1 is actually created, at regexp compile time, by something that looks at theregexp->nparens and adds that many $digit's. ;)
gaal: re Regexp-ReplaceEngine.pm, perl5 has C hooks to control which C func gets called when a regexp is compiled, and when executed. so you can swap in callbacks to p5 subs, in place of the original p5 regexp engine. Those subs can do something like the regexp-spike. Since the hooks themselves are reentrant... hmm...
they better be... something to check... one could transparently swap p5's normal engine for the regexp-spike, gaining reentrancy, and p5 subrules. It also provides a way to work around the longstanding p5 problem of not being able to create a QR object, which "overloads" =~ and is usable on split, etc. It's a mechanism by which p6 code can pretend to be a native regexp. That,
expand Prelude.pm, p5 backend for Parse-YAML, Lexer/Parser.hs -> rules, pure p5 regexp-hook extendability, iterator_engine, (pause)
wandered off and played with some of perl5's hooks to intercept regexp creation/use, which together with source filters and the rest, might just barely be sufficient to pull off p6-like regexs, transparently, on p5-written-by-humans. but not quite I think.
(used in the "I really should put a copy in common" somewhat cleaned up spike backtracking regexp engine thing - makes syntax errors in evals much clearer. )
ok, so is a correct one-liner "iterator is like the regexp/parser spike, but bootstrapped to self hosting, and using p6 rather than p5 syntax regexps"? other key differences?
Regarding using an alternate regexp engine in perl5, I did a design space grovel - it would take something like four pages to write up - isn't going to happen unless someone _really_ wants it.
Oh, and the tech for playing with regexp's that I know about is, well, source filters of course, bsb's "regexp which calls matcher, gets capture offsets, and correctly sets up capture vars", the use re 'debug'-ish regcompp and regexecp hooks, and... I dont know of any other tools to get leverage around here, short of playing with opcodes (p5, and regex) or patch p5.
I should mention the hooks are basically providing a way to fake up regexp struct's, both as an api into the perl matchvar machinery, and as a way to do qr// which _is_ reentrant (the hooks are reentrant, its just the engine which isnt - thus the one downside of bsb's device).
though there are some low hanging fruit, like a reentrant p5 regexp engine with rules. that one could do now as an XS module for cpan. but that's a bit off-perl6-topic.
PerlJam: the regexp engine and the perl compiler/runtime unfortunately know each other quite well.
One of the key values of the hooks, and their ability to create "struct regexp"'s, is you finally have a way to twiddle some of those internals. But I'm afraid it may not go as far as "please, can my $1 be an SV?". :(
PerlJam: doing p5 guts modifications to better support p6-like stuff... is something which could be added to perltodo. ;) volunteers welcome. making the regexp engine reentrant is listed there. in the "hard" section (though I'm not clear on why it's hard exactly. but I dont see perlguts hacking in my future).
surfaces from perlguts, attempting to use the regexp hooks bsb++ turned up to swap in an alternate engine. that part (sort of) works. but setting up the regexp struct so $1,etc get set, even if you ignore leaking, blech. But ingy++ for Inline::C.
re crazy, it should actually be straightforward. two hooks, regcomp and regexec. comp creates a regexp struct which wont get optimized away (// works for my old 5.8), and overwrite a non-essential field in it with the pointer you wish to pass from comp to exec. in exec, do whatever, and set up the capture offsets. only the captures arent showing up. :(
re bytecode, it's basically replacing the C function which does the regexp search, and (with the unclear but possible assistance of other parts of perl:( ), sets up the match values. So it should simply be a matter of copying out that functionality. But this is perlguts, which makes the simple incomprehensible. ;)
r9226 | putter++ | misc/pX/Common/Regexp-ReplaceEngine.pm created. Perl5 has hooks to permit interception of regexp compilation and execution. This is an attempt to use them. No-one really has yet. The objective is perl -we 'qr/a normal regexp/; {use Regexp::Perl6; qr/<ws>?<word>/}'. Extensive comments included. Callbacks to &regexp_hook_exec,etc subs work. Next big step is getting $1,etc to work. Volunteers encouraged. It is written in Inline:
r9229 | putter++ | Variable interpolation happens before the regexp compiler hook gets the pattern.So while this hook approach is fine for creating say a reentrant p5 engine, even one with rules, it can't handle p6 regexps with variables correctly.
basically Regexp::Parser+continuation-backtracker (cleaned up version of spike) + opparser +p5/6-regexp-op-grammar provides p5/6-regexp -> match/parse-tree, tweak to regexp-ast, then regexp-ast -> rules-using-continuation-backtracker. currently have or know how to easily do all the bits but the ooparser.
oh, objective is to bootstrap a self-hosting "correct"/dynamic p5+p6 regexp engine.
in order of deprecation: rules regex regexp regular-expression
in that context, the question is "what to do with pos?". you don't want a one-to-one relationship with string objects. you'd like to be able to choose here I'm using this pos, and there I'm using some other. simultaneously, without interference effects. you could do that in the... "high sugar" regexp approach, because in $_.pos, the $_ neednt really be the string, but rather a parser state wrapper. but that
should have just cleaned up the regexp spike, rather than aiming for clean&correct. a bridge too far.
re p5 regexp engine performance... finally getting some sane numbers now (ie, slow enough to seem plausible;), after taking into account Regexp::Parser optimizations. still looks adequate.
:foo is a "normal" modifier. its argument defaults to 1. so :foo is equiv :foo(1). in a regexp, you have to figure out where the modifier ends. eg, in /:perl5aa/, is that a :perl5aa modifier and no content, a :perl5a modifier on "a", a... etc. so you can either have the explicit argument list, or use :: which at the beginning of a group is a no-op, and can't be misinterpreted as part of the modifier.
random other thought: do each p5 thread get its own regexp engine? if so, one could hack around non-reentrancy. ;)
ok... so what grammars for p5 or p6 regexp do folks know of...? putter is boiling up a stew...
did anyone ever consider swapping out the regexp engine? or adding a second one (pragma'ed)?
in other amusements, an experimental p5 regexp engine, doing backtracking recursive descent parsing, built from Regexp::Parser, was passing much of the p5 re_tests. and supported <subrules>. so a throw-away toy p6 parser was built on it.
Unless I hit the dreaded yawning pit of yacks (not to be confused with the easily bypassed benign pit of yawning yacks), I hope to have the regexp-engine-become-parser architecture-definition spike wrapped up today. I hope.
So, the purpose of the "regexp engine built using recursive-descent p5 subs" spike, which has now been lengthened to a "and a p6 parser engine built on it, using p6's dynamic statement/expr/operator/token scheme", is to allay doubts that we could go this path and have acceptable time/space performance.
for the regexp engine, any p5 regexp (no lookbehind) and input string. for the parser... well, maybe that one's mine...
what's nice is that it skips past optimisations in the perl5 regexp engine :) So it really is testing backtracking on alternations. And P::RD sucks at it ;)
fglock: p6 source examples which require backtracking... not that I can think of. I've been approaching the parser part of the spike as an exercise in architecture, considering the backtracking aspect of things a "been there, done that, it's demonstrated by the underlying regexp backtracking tests". hmm...
my idea was - hand-write a parser that can read the regexp syntax, and then write the parser using itself - I'm seeing how far I can go
r9010 | putter++ | misc/pX/Common/regexp_try.pl added to let folks easily exercise the regexp engine. perl ./regexp_try.pl REGEXP [ STRING | --file FILENAME ]
fglock: you saw Regexp::Parser? it's a "hand-write a parser that can read the regexp syntax"
fglock: given /<a>+<b>/ and rule a :p5 { (?{local $x=34;}) } and rule b :p5 { (?{ say $x }) }... 34 should get say()ed. _if_ you are trying to do a fully p5/6 compatible regexp engine. but of course, one can do specialized engines which only handle a subset of the regexp constructs. eg, no embedded code. or mutant embedded code with different semantics.
the "say $x" is actually in the scope of the "local $x = 34". "down", even though it syntactically kind of looks like one has come back up from the "scope" containing the local() (the "(?{...})") before getting to the say$x. maybe the key idea is the brackets in (?{ }) are just part of the regexp token syntax. they are not a block.
fglock: so the pieces are... input stream, find :)... a cache of the current values of all the regexp nodes?? given iterators, why isnt the entire state either on in the iterators or the stack? sub ab { while($a=a()){while($b=b()){ yield combine($a,$b) }} } ?
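The nested-loop `sub ab` in the question above maps closely onto generators: each matcher lazily yields every end position it can reach, and a sequence is two nested loops, so the backtracking state lives in the suspended generators. A small Python sketch of that shape follows. The names `lit`, `alt`, `seq` and `first_full_match` are illustrative and not taken from the iterator engine.

```python
def lit(s):
    """Yield the single end position if the literal s matches at pos."""
    def gen(text, pos):
        if text.startswith(s, pos):
            yield pos + len(s)
    return gen


def alt(*choices):
    """Yield every end position of every choice, in order."""
    def gen(text, pos):
        for choice in choices:
            yield from choice(text, pos)
    return gen


def seq(a, b):
    """The nested loops from the question: for each way a matches, try each way b matches."""
    def gen(text, pos):
        for mid in a(text, pos):
            for end in b(text, mid):
                yield end
    return gen


def first_full_match(matcher, text):
    """Pull results until one consumes the whole text; None if the generator runs dry."""
    return next((end for end in matcher(text, 0) if end == len(text)), None)


a = alt(lit("ab"), lit("a"))
b = alt(lit("c"), lit("bc"))
ab = seq(a, b)
```

Asking the outer generator for another result resumes the inner loops where they left off, which is the "try again" step of backtracking without any explicit cache of node state.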
I think I'm going to punt on non-infix fixities. So modulo any architecturally-interesting bugs, both the regexp engine spike and parser, err, spike-extension, are now feature complete. Next there's the "minor matter" of groveling around for insight. And after that, back down to the bottom I think - clean backtracking api, then self-hosting regexp engine, then a new parser spike.
Any and all comments, criticisms, feedback, remarks, thoughts, observations, musings, etc, etc, on the regexp or parser files in misc/pX/Common, would be most welcome. ;)
putter: re "a cache of the current values of all the regexp nodes" - the hash works like a "namespace" for rules
bsb: neat. i like "(?!)" as fail. one possibility which comes to mind is using Regexp::Genex for testing. rather than just testing one regexp/string pair, one could ask p5 to give a whole bunch of strings which are all supposed to match some regexp, and try them all. :)
bsb: oh, and I'm *very* impressed you managed a clean PASS, after two years, relying on both spec and unspec p5 regexp behavior. ;)
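For readers unfamiliar with the "(?!)" trick mentioned above: an empty negative lookahead can never succeed, so it acts as an unconditional fail that forces the engine to backtrack. A small sketch using Python's `re` module (a different engine from p5's, used here only as an illustration; the patterns are made up):

```python
import re

FAIL = r"(?!)"  # empty negative lookahead: never matches anywhere

# On its own, the pattern fails against any input, even the empty string.
no_match = re.search(FAIL, "anything at all")
no_match_empty = re.search(FAIL, "")

# Inside an alternation it kills one branch, so the engine must backtrack
# and succeed through the other branch instead.
forced = re.fullmatch(r"a(?:b" + FAIL + r"|bc)", "abc")
```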
bsb: being able to do so would finally allow you to use a non-standard p5 regexp engine, without affecting client code. (or being silly with source filters)
putter: sorry, I'm so not a parser/regexp guy, I have no idea what you are talking about :)
(obviously, is currently just a p5 regexp engine using a rec-descent backtracking idiom...)
is looking for a heavily backtracking, large target string, regexp. any thoughts?
If someone was looking for a little CPAN module to write, using Regexp::Parser one can easily write a p5 to p6 regexp converter. just decorate the Objects.pm nodes with a conversion method. where conversion method is, well, quite trivial. the hardest things to do are figuring out RP, and wrapping it for cpan. ;)
that's the implementation of the regexp "a"
ayrnieu: continuations give you two things. one is later stuff in the regexp can fail, "pushing back". ie, saying, nope, I'm not happy, feel free to try again. you could get that in other ways, though they make the implementation more complex. but the second thing it gives you is composability. you can separately compile regexes, then
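A rough sketch of the two properties described above, in continuation-passing style. This is Python, not the engine under discussion, and `lit`, `seq`, `alt`, `star` and `full_match` are hypothetical names. Each node is compiled separately into a function taking "the rest of the match" as a continuation `k`; if `k` returns False, the node is free to try something else, which is the "pushing back" part.

```python
from typing import Callable

Cont = Callable[[int], bool]             # continuation: position -> overall success
Node = Callable[[str, int, Cont], bool]  # compiled regexp node

def lit(s: str) -> Node:
    def node(text: str, pos: int, k: Cont) -> bool:
        return text.startswith(s, pos) and k(pos + len(s))
    return node

def seq(a: Node, b: Node) -> Node:
    # Composability: a and b were compiled independently; seq only wires
    # b (followed by k) in as a's continuation.
    def node(text: str, pos: int, k: Cont) -> bool:
        return a(text, pos, lambda mid: b(text, mid, k))
    return node

def alt(a: Node, b: Node) -> Node:
    def node(text: str, pos: int, k: Cont) -> bool:
        # If later stuff (k) rejects a's match, fall through to b.
        return a(text, pos, k) or b(text, pos, k)
    return node

def star(a: Node) -> Node:
    def node(text: str, pos: int, k: Cont) -> bool:
        # Greedy: try one more repetition first (requiring progress, to avoid
        # looping on zero-width matches), then fall back to stopping here.
        return a(text, pos, lambda mid: mid > pos and node(text, mid, k)) or k(pos)
    return node

def full_match(node: Node, text: str) -> bool:
    return node(text, 0, lambda end: end == len(text))

# Roughly a*ab : the star must give back one "a" for the trailing "ab" to match.
rx = seq(star(lit("a")), lit("ab"))
```

With this shape, `full_match(rx, "aaab")` succeeds only because the continuation of `star` fails at first and the star retries with one fewer repetition.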
putter: are you planning on writing a rules compiler in p5 using regexp::parser?
r8969 | putter++ | misc/pX/Common/: Dropped off the pieces for the creation of a p5 regexp test suite. Volunteers? :)
r8972 | putter++ | regexp test suite fiddling
Anyway, the regexp current test status is 695/961 subtests failed, 27.68% okay. prove regexp_engine_demo.t.
r8974 | putter++ | [MINOR] Fixed regexp test suite. Implemented (?{...}) - no $/ or pos() inside yet, but local() vars should work correctly.
r8975 | putter++ | regexp engine progresses.
hopefully the visibility (eg, "what do you mean you didn't see luqui's draft regexp engine core in ext/Perl6/something/mumble/frotz?") and folks' disinclination to duplicate work would keep waste down.
I mean, if you're matching a very backtracking regexp against a 1k string...
In a regexp engine, very possibly if depth equates to backtracking
slow - the regexp engine should use exceptions extensively. in p5... ouch. rb is quite bad enough. (that's another cl add, but oh well). but actually, this is all just rationalization. the _real_ motivator is...
I guess I'm also thinking of a more bottom-up than top-down design approach. we know what, say, the declaration of \d for regexps should look like, it's in the spec. so writing a little macro and fleshing out all of the \X's can be done, without having any idea how that will get integrated into the regexp engine.
The regexp engine can just assume it's given a list of such things. and the final rxmumble:<X> macro can glue the two together.
I just found out that in perl 6 regexp's the [...] will not define a group of characters anymore. How can I do this (i.e.: define a group of chars to be matched) in Perl 6?
ie, under perl5 and its regexp engine, or under perl6 and its pcre engine? (the syntax suggests we're not dealing with the p6 PGE rules engine)
japhy: as a side note, depending on how intensive your need is, both of the lisp engines mentioned are somewhat p5 regexp compatible, cl-ppcre more than the other, and claim to be significantly faster than p5 on many flavors of patterns.
japhy: yes. p6 regexp optimization is in its infancy. hmm, or hasn't really been born quite yet. lots of fertile ground. perhaps in a month or so we'll have enough foundation working that you could help with it! :)
notes p6's try/fail is actually not that slow. so as soon as we have a regexp parse tree, (optimization is another todo item), some "directly emit p6" code might be interesting.
$ in regexp-speak :)
not that this is a regexp
luqui: we should chat sometime about regexp engine designs.
aufrank: -optc and -optl are ghc "meta"?options which determine which compilation phase the attached option gets applied to. -optc-foo gives the C compiler -foo, -optl-foo gives the linker -foo. The modification regexp apparently assumes "-"'s are not in the middle of things. I'll make it more picky. Thanks for the catch.
tewk: if you run "a=b\nc=d\n" ~~ / $<matches> := [ (\w) = \N+ ]* / you get a match object which doesnt have a <matches> entry. (So, <matches> isnt magic, just something set above in the regexp). That obviously isnt what the code is expecting. And the mess of ),),), make me wonder if this is another pairs issue...
wants to try again building a try/fail based p6 regexp engine in p6 real soon now...
re syntax highlighting... having anything but p6 generate the text map info looks basically a lost cause. the usual regexp approach... for p6!?!;)
For instance, consider goto. Now gone from p6(?). The JavaScript spec is written with intra-method control flow described as goto step N. One can regexp massage the spec into p6 code with goto's. But with p6 not having goto's, you now have to massage, by hand, most(?) of the methods.
Basically its a quick path to getting _full_ p6&p5 regexp support... and good performance... and utterly undeployable (requires swi prolog compiled with non-default arguments), so it's only good for grammar development.
clkao: re context, one trick I used to pass variables to prolog was to do an aggressive scan of the regexp string, noting anything that looked like a variable, and then pass a hash of variable name to... well, that doesnt help
but if you need prototype info, a (we dont have it yet) parser regexp modification, etc, to get the parse right, that's just not going to happen. simple cases work ok though, as long as the pil generator continues to be tolerant. it seems an inherent problem with an external pil generator.
it doesn't use a parse tree because php would be too slow for that, it's just a regexp hack
I don't think regexp will be dead even once we have rules. :-)
Su-Shee: hmm, IMHO, rule is much more powerful than regexp. But regexp is more expressive in simple case.
actually maybe you can help me by just porting a regexp:
still sounds cool. a lot different than just straight old regexp
Juerd: you can give a string to split (in p5); it is interpreted as a regexp.
I'm playing regexp games with my new p5 backend spike. but only in a controlled environment (a complete prelude)
one should regexp the code before handing it to pugs, add informational annotations on stuff that gets lost in something which doesnt (ie, method _hint_does_xxencodedclassxx(){}) and
the pil for 'class A{does B}' doesnt mention does'ing or B. so given p6 code, one regexp's the does'es, and adds an extra "method _im_a_hint_does_B(){}" which will make it out in the pil, and can be interpreted as "oh, there's a does() going by".
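The preprocessing trick described above can be sketched as a plain textual rewrite. This is Python and purely illustrative: the function name `add_does_hints`, the pattern, and the choice to place the hint method right after the `does` clause are all assumptions; only the hint method's name comes from the message.

```python
import re

# Assumed shape of a role composition clause: the word "does" plus a simple name.
DOES = re.compile(r"\bdoes\s+(\w+)")

def add_does_hints(src: str) -> str:
    """After every 'does X', append a marker method that survives into the PIL."""
    def repl(m: re.Match) -> str:
        role = m.group(1)
        return f"{m.group(0)}; method _im_a_hint_does_{role}(){{}}"
    return DOES.sub(repl, src)

hinted = add_does_hints("class A{does B}")
```

A backend reading the resulting PIL could then treat any method whose name starts with `_im_a_hint_does_` as "there's a does() going by". A regexp this naive would also fire on the word "does" inside strings or comments, which is the usual cost of this kind of hack.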
does not delight in having to CPS convert, and thus notably uglify, the prolog regexp engine so local() in repeated embedded code clauses works correctly. :/
ah, you mean for user extensions of the regexp grammar. yes.
and just to wrap up, lib/PIL/Run/Match.pm is the Match objects returned by regexp matching. It probably belongs in P6::Value.
then click the "use regexp" checkbox
use regexp is useful for non trivial things
regarding other things on pause, the regexp stuff in pugs is waiting on objects returned by functions not being mis-typed.
but not to worry about this stuff. p5 will soon have a p6 rules engine, and iblech is adding objects, and piljs runs fine linked with perl5, so there is now a path for the regexp stuff to progress...
I hope /ignore -regexp -pattern 'evalbot_[:alnum:]+' JOINS will work..
and http://www.cs.sfu.ca/people/Faculty/cameron/Teaching/384/99-3/regexp-plg.html Perl Style Regular Expressions in Prolog
my question is probably obvious, but: is *parrot* required for regexps in pugs? (rules are implemented via PGE)
I remember the first time I saw a regexp
whomever: status update on rules Parser hook - now passes all tests but for trans.t. The regexp in trans's Prelude.pm my sub expand never fires. Something for another day.
brentdax: re PCRE errors with p6 regexp - do you have a test case? (I ask hopefully:) It is my current obstacle on improving pugs rule support. But I've only seen it in mutant versions of pugs. I'd love a replicable example on unmodified pugs... Thanks! :)
r6096 | putter++ | lib/pugs/hack.pod: Added notes on testing regexp engines.
pos and the target string should really be accessible during the regexp (perhaps via $/) after (pos...) and perhaps before. during the regexp, the logical place is in the, err, whatever the regexp run state object was called, and could be accessed via $/. pos on string could be used elsewhen as an api, but blech. backwards compatibility. :( last time I looked, seemed need for more speccedness here.
non-parrot. definitely not hacking PGE, except now and then a bit around the edges. though PGE is now working well enough that it can run a grammar for regexp. so one can chuck the whole fake-match-trees-to-bootstrap pain.
a pcre backend provides a stable alternative when pge is having difficulties, and a prototype for backends to other languages (eg p5) which have a regexp library with something like named-subexpressions. second, an entirely p6 regexp engine helps languages (eg javascript) which lack even that, and also allows doing the full regexp spec in which
is just waiting for a :inline adverb to regexps, which allows subsequent code to backtrack back into the regexp. eg, rx:inline/foo/; bar; is rx/foo { bar }/ or something (I always have to recheck which code embedding is which;)
ie, you can modify the strings matched within the regexp.
Aankhen``: while code in regexp remains unsupported, another option is a rule munge_this { ... } rule foo { ... <munge_this> } and then just crawl over the match tree reassembling the string, but for nodes reached via $m<munge_this>, munging it.
ok. :) I noticed the js regexp model is regrettably limited (no embedded code or named backrefs). A modified pcre backend will transliterate and unpack, but it looks like full rules support will have to wait for the next step, a pure p6 engine...
r5801, putter++ | In Parser.hs, created a new quoting level QB_Minimal between QB_No and QB_Single. Unlike No, it recognizes backslashed escapes of backslash and qfProtectedChar. Unlike Single, it doesn't reduce them (ie, to a backslash or to the qfProtectedChar). This appears to be what one wants for regexp literals. Previously, a parsed bs bs sp and bs sp were indistinguishable. The correct handling of odd quotes, eg rx?...\?...? , is less clear,
It's an incomplete grammar for p6 regexps. Once it runs, PGE will give us a regexp parse tree in the form of a Match tree.
keeps forgetting how breathtakingly _large_ the regexp language has become...
thinks a single regexp with full backtracking would work better
r5527, cdpruden++ | Typo fix; missing pipe in regexp
I've been hoping someone will flesh out the regexp grammar. There is a fuzziness currently, with incomplete implementations of a spec spread across documents and emails. I'm hoping having something concrete, a canonical grammar{}, will seed progress.
Hmm, it doesn't have any documentation, does it. Sigh. The current Match class has some properties which get in the way of implementing regexp engines. MatchX is intended as a temporary alternative. Being p6 rather than a primitive, it is easier to tweak. But as type coercion, stringification, etc, don't work too well, neither does that aspect of MatchX.
it's easy to create a string containing the code for a couple of rules, and yank them out with a regexp, generating whatever form you need. that's how I dealt with grammars not yet existing.
is cleaning up a regexp grammar...
r4824, putter++ | modules/Grammars/rx_emit_examples.pl: some examples of doing regexp engine generation from rx grammar :parsetrees.
aside from the regexp support, the Perl 5 is optional as I recall
as the regexp is provided by PCRE
autrijus, chip: has anyone started work on a grammar{} for p6 regexp? Ie, the "regexp" rule of grammar Perl {}?
r4574, putter++ | Parse array and hash captures in regexp. Just because they are not implemented doesnt mean we have to parsefail. Untodoed tests.
the pieces I've banged on are: a regexp grammar{} (or rather a bunch of them;); p6 rx in p6 on pcre; p6 rx in p5 on p5 qr; and assorted tree and bootstrap cruft.
It isn't actually very hard to bootstrap a regexp parser. You just use a match tree (:parsetree or normal) as the parse tree, write the tree to whatever emitter, fake up an initial match tree, and you're done.
putter: aye. but then p6rules isn't really a regexp.
two, a p6/pcre, or p5/qr, or p5/Parse::Regex, or PGE bootstrap, are possible ways to get to a bootstrapped regexp grammar{} and parser.
Which leaves me tempted to simply roll a regexp grammar{}, using the limited vocabulary PGE currently understands, fake up a match tree, and do a p6-based tree-to-pir emitter. Then the existing pir parser can be tossed.
But for reference purposes, it really does turn out to be quite easy to bootstrap a regexp parser. The match-tree-as-parser-ast is quite nice. And regexps aren't really that complex, from a parser point of view.
## The clever regexp bit is from cgi-lib.pl
all characters in regexp should be treated literally
as opposed to regexp-based highlighting
dude, you've got ^b's in your regexp... :)
autrijus: not really. you don't have to worry about distinguishing valid from invalid regexp, and
ruby's new regexp engine is also nice. (name escapes me at the moment)
ah, here's the ruby regexp engine http://www.geocities.jp/kosako3/oniguruma/
anyone know where there's a Tk perl regexp gui thing? (i found one a while back but lost the url)
r3168 (ninereasons++) -- better regexp for '=for COMMENT'
Can somebody tell me how I can test rules with Pugs? It seems that everybody can test it but me. :) I have the latest Pugs and the latest Parrot, and get an error if I run an easy regexp test...
I'm trying to understand a regexp test case m:words/.../. Has anyone heard of a ! ("shriek") modifying the interpretation of adjacent whitespace?
but because the regexp dialect was very outdated (unexpected:). Perhaps t/rules/Disabled should have been done first. Or brand new tests written.
does that regexp work?
it calls them perl/regexp
rather, !perl/regexp
but don't tell skew i used a regexp for that ;)
Maybe regexp? Can somebody check it? Revision 1962 is OK.
perhaps, PRD is good for regexp conversions?
hi .. is there a reason why a regexp does not capture anything with the x flag, and captures something when I just remove it?
err, in qq:to/END/, is END a regexp? can one actually do qq:to/^ END/ ?
heredoc, regex (not regexp, not regular expression), sub (not subroutine)
pugs - 1455 - adding tests for split(<regexp>, "")
does \0 mean NUL in a regexp like it did in p5?
Corion: regexp support is spotty
regexp support is very limited right now
autrijus: any way to make split support regexp quickly?
ok, hopefully I can make the regexp Voodoo work in pugs :)
kungfuftr: my regexp fu is only so-so
Alias_: we have basic regexp matches with rx:perl5{}
and we have subst regexp s:perl5{}{}
the only part of read_string that we could not support is the split with the regexp
Config::Tiny is probably quite a nice little regexp test
for the regexp
autrijus: any timeline on the s/// perl5 regexp support?
if you can just grok how regexp are made that might help
any regexp people out there?
"regexp people"?
would this regexp !/^\.{1,2}\Z(?!\n)/s
hmm, seems we have regexp matching, but not substitution
the regexp stuff is kinda buggy
yes, why not make the regexp part capturing?
yes, but if you are going to use a regexp in there please use the double quotes
maybe the regexp can just be a part of the section
nothingmuch: you should look at Pod's X<> format. It is an alternative to your regexp fu.
nothingmuch: i have checked matching regexp in t/, but find nothing about it
tiw: regexp are not implemented
the multi-file regexp search and replace in my BBEdit text editor came in handy
Khisanth: you use ~~ for regexp matching
