@ravbell
Last active January 11, 2019 18:44
use v6;
#use Grammar::Tracer;
grammar invoice {
    token ws { \h*};
    token super-word {\S+};
    token super-phrase { <super-word> [\h <super-word>]*}
    token line {^^ \h* [ <super-word> \h+]* <super-word>* \n};
    token invoice-prelude-start {^^'Invoice Summary'\n}
    token invoice-prelude-end {<line> <?before 'Start Invoice Details'\n>};
    rule invoice-prelude {
        <invoice-prelude-start>
        <line>*?
        <invoice-prelude-end>
        <line>
    }
}
multi sub MAIN(){
    my $t = q :to/EOQ/;
    Invoice Summary
    asd fasdf
    asdfasdf
    asd 123-fasdf $1234.00
    qwe {rq} [we-r_q] we
    Start Invoice Details
    EOQ
    say $t;
    say invoice.parse($t,:rule<invoice-prelude>);
}
multi sub MAIN('test'){
    use Test;
    ok invoice.parse('Invoice Summary' ~ "\n", rule => <invoice-prelude-start>);
    ok invoice.parse('asdfa {sf} asd-[fasdf] #werwerw'~"\n", rule => <line>);
    ok invoice.parse('asdfawerwerw'~"\n", rule => <line>);
    ok invoice.subparse('fasdff;kjaf asdf asderwret'~"\n"~'Start Invoice Details'~"\n",rule => <invoice-prelude-end>);
    ok invoice.parse('fasdff;kjaf asdf asderwret'~"\n"~'Start Invoice Details'~"\n",rule => <invoice-prelude-end>);
    done-testing;
}

ravbell commented Jan 11, 2019

The parse on line 37 of the gist (the invoice.parse call in the no-argument MAIN) returns Nil, and I don't understand why. Any ideas? All the individual tests for the tokens pass when you run MAIN with the 'test' argument, so I'm not sure what I'm missing.


b2gills commented Jan 11, 2019

TL;DR: The issue is that the test input line containing Start Invoice Details ends with horizontal whitespace that you aren't dealing with.

There are two ways to deal with it (other than changing the input):

# Explicitly:                                                       vvv
token invoice-prelude-end { <line> <?before 'Start Invoice Details' \h* \n>}

# Implicitly:
rule  invoice-prelude-end { <line><?before 'Start Invoice Details' \n>}
# ^ must be a rule                      and there must be a space ^
# (uses the fact that you wrote your own <ws> token)
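
To see the two fixes side by side, here is a minimal stand-alone sketch. The grammar `Demo` and its token names are hypothetical and exist only for this illustration; they assume the same custom `ws` token (`\h*`) as the gist.

```raku
# Hypothetical stand-alone grammar; only `ws` mirrors the gist.
grammar Demo {
    token ws           { \h* }
    token strict-end   { 'Start Invoice Details' \n }      # no room for trailing blanks
    token explicit-end { 'Start Invoice Details' \h* \n }  # explicit fix
    rule  implicit-end { 'Start Invoice Details' \n }      # the space becomes <.ws>
}

my $clean    = "Start Invoice Details\n";
my $trailing = "Start Invoice Details  \n";   # two trailing spaces

say so Demo.parse($clean,    :rule<strict-end>);     # True
say so Demo.parse($trailing, :rule<strict-end>);     # False
say so Demo.parse($trailing, :rule<explicit-end>);   # True
say so Demo.parse($trailing, :rule<implicit-end>);   # True
```

The rule variant only tolerates the trailing blanks because `ws` was overridden to `\h*`. With the default `ws`, the whitespace match would also swallow the newline, and the following `\n` would then fail.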

Here are some more things that I think would be helpful.

I would have used the “separated by” feature % in line and super-phrase.

token super-phrase { <super-word>+ % \h } # single % doesn't capture trailing separator

token line {
  ^^ \h*
  <super-word>* %% \h+ # double %% can capture optional trailing separator
  \n
}

Those are [almost] exactly equivalent to what you wrote.
(What you wrote has to fail to match <super-word> twice in <line>, but this only has to fail once.)
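
The difference between `%` and `%%` is easiest to see on a tiny input. The sketch below uses a hypothetical grammar `Words` whose token names are made up for this illustration.

```raku
# Hypothetical grammar contrasting `%` and `%%`.
grammar Words {
    token word          { \S+ }
    token no-trailing   { <word>+ %  \h }   # separators only between words
    token with-trailing { <word>+ %% \h }   # also allows one trailing separator
}

say so Words.parse('a b c',  :rule<no-trailing>);     # True
say so Words.parse('a b c ', :rule<no-trailing>);     # False: the trailing blank is left over
say so Words.parse('a b c ', :rule<with-trailing>);   # True

# Each <word> is still captured individually.
say Words.parse('a b c', :rule<no-trailing>)<word>.elems;   # 3
```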


I would have used the surround feature ~ in invoice-prelude.

token invoice-prelude {
    # zero or more <line>s surrounded by <invoice-prelude-start> and <invoice-prelude-end>
    <invoice-prelude-start> ~ <invoice-prelude-end> <line>*?

    <line> # I assume this is here for debugging
}

Note that invoice-prelude didn't actually gain anything by being a rule, because all of the horizontal whitespace is already handled by the rest of the code.
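
For readers who have not met `~` before, this is a minimal sketch of the operator with literal delimiters instead of the invoice tokens. `A ~ B C` matches `A`, then `C`, then `B`, so the opening and closing delimiters sit next to each other in the source.

```raku
# '(' ~ ')' (\w+)  matches:  '('  then  \w+  then  ')'
my $m = '(abc)' ~~ / '(' ~ ')' (\w+) /;

say ~$m;      # (abc)
say ~$m[0];   # abc
```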


I don't think that the last line of the invoice prelude is special, so remove <line> from invoice-prelude-end.
(<line>*? in invoice-prelude will capture it instead.)

token invoice-prelude-end {<?before 'Start Invoice Details' \h* \n>}

The only regexes that could benefit from being rules are invoice-prelude-start and invoice-prelude-end.

rule  invoice-prelude-start {^^ Invoice Summary \n}
# `^^` is needed  so the space ^ will match <.ws>

rule  invoice-prelude-end {<?before ^^ Start Invoice Details $$>}

That would only work if you are fine with it matching something like      Invoice    Summary    ␤.

Note that invoice-prelude-start needs to use \n so that it consumes the newline, but invoice-prelude-end can use $$ instead because it is a lookahead and doesn't consume the \n anyway.
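
A small sketch of that difference, using plain regexes rather than the grammar (the variable names are hypothetical): `$$` asserts the end of a line without consuming anything, whereas `\n` consumes the newline.

```raku
my $text = "Start Invoice Details\nnext line\n";

my $with-eol     = $text ~~ / ^^ 'Start Invoice Details' $$ /;
my $with-newline = $text ~~ / ^^ 'Start Invoice Details' \n /;

say $with-eol.to;       # 21  (stops just before the newline)
say $with-newline.to;   # 22  (newline consumed)
```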


If you change super-word to something other than \S+, then you may also want to change ws to something like \h+ | <.wb> (where <.wb> is a word boundary).
