@jddurand
Created December 20, 2013 05:55
Marpa literal() and UTF8 flag
#!/usr/bin/env perl
use strict;
use diagnostics;
use Marpa::R2;
use Devel::Peek;
# The SLIF grammar is read from the __DATA__ section below.
my $grammar_source = do {local $/; <DATA>};
# A single character above U+00FF, so Perl stores it with the UTF8 flag on.
my $input = "\x{2665}";
print "INPUT:\n--------------------\n";
Dump($input);
my $grammar = Marpa::R2::Scanless::G->new({ source => \$grammar_source});
my $re = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
$re->read(\$input);
# Fetch the matched text via literal() and dump its internal representation.
my ($start, $length) = $re->g1_location_to_span($re->current_g1_location());
my $char = $re->literal($start, $length);
print "recce->literal\n--------------------\n";
Dump($char);
# Fetch the same text via range_to_string() for comparison.
print "recce->range_to_string\n--------------------\n";
my $test = $re->range_to_string($re->last_completed_range('test'));
Dump($test);
__DATA__
:start ::= test
test ::= char
char ~ [\x{2665}]
:lexeme ~ <char> pause => after event => 'char'
@jddurand
Author

Output:

INPUT:
--------------------
SV = PV(0x851f770) at 0x8631700
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0x85ae240 "\342\231\245"\0 [UTF8 "\x{2665}"]
  CUR = 3
  LEN = 12
recce->literal
--------------------
SV = PV(0x8b12718) at 0x85386d4
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK)
  PV = 0x8db4738 "\342\231\245"\0
  CUR = 3
  LEN = 12
recce->range_to_string
--------------------
SV = PV(0x8b2c1a0) at 0x85c688c
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0x8df59e8 "\342\231\245"\0 [UTF8 "\x{2665}"]
  CUR = 3
  LEN = 12
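
To show why the missing flag in the `recce->literal` dump matters, here is a minimal sketch that uses only core Perl and no Marpa. It assumes the returned scalar holds the same three octets as the input, just without the UTF8 flag, which is what the dump above suggests. The variable names are illustrative, and the `utf8::decode` step is only one possible caller-side workaround, not a fix in Marpa itself.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# The same three octets, once as a character string and once as raw bytes.
my $chars = "\x{2665}";        # one character, UTF8 flag on
my $bytes = "\xE2\x99\xA5";    # three octets, UTF8 flag off

# utf8::is_utf8() reports the internal flag that Devel::Peek shows as UTF8.
printf "chars: flag=%d length=%d\n", utf8::is_utf8($chars) ? 1 : 0, length $chars;
printf "bytes: flag=%d length=%d\n", utf8::is_utf8($bytes) ? 1 : 0, length $bytes;

# Without the flag, Perl treats the octets as three Latin-1 characters,
# so the two strings do not compare equal.
print "equal before decode: ", ( $chars eq $bytes ? "yes" : "no" ), "\n";

# Possible workaround (an assumption, not part of Marpa): decode a copy in place.
my $fixed = $bytes;
utf8::decode($fixed);
print "equal after decode:  ", ( $chars eq $fixed ? "yes" : "no" ), "\n";
```

The flagged string has length 1 and the unflagged one has length 3, so any code that compares, measures, or pattern-matches the value from `literal()` against the original input would see a mismatch.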

@jeffreykegler

That was definitely a bug and may well explain all the other issues. I think I've found the fix. Here's the output from my latest developer's version.

INPUT:
--------------------
SV = PV(0xa094c00) at 0xa1c6a10
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0x9fe4de0 "\342\231\245"\0 [UTF8 "\x{2665}"]
  CUR = 3
  LEN = 4
recce->literal
--------------------
SV = PV(0xa6622b8) at 0xa0022e0
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0xa558fe8 "\342\231\245"\0 [UTF8 "\x{2665}"]
  CUR = 3
  LEN = 4
recce->range_to_string
--------------------
SV = PV(0xa6622d8) at 0xa1d8190
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0xa6553b0 "\342\231\245"\0 [UTF8 "\x{2665}"]
  CUR = 3
  LEN = 4

@jddurand
Copy link
Author

Thanks!
I suppose a test checking the flag with utf8::is_utf8() could be added to the test suite.
Thanks / JD.
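
A rough sketch of what such a regression test might look like, reusing the grammar and the calls from the script at the top of this gist. The test names and the skip-if-missing guard are my own assumptions, and this is not taken from Marpa's actual test suite.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Test::More;

# Skip cleanly when Marpa::R2 is not installed.
eval { require Marpa::R2; 1 }
    or plan skip_all => 'Marpa::R2 is required for this test';

# Same grammar as in the gist, inlined instead of read from __DATA__.
my $grammar_source = <<'END_OF_GRAMMAR';
:start ::= test
test ::= char
char ~ [\x{2665}]
:lexeme ~ <char> pause => after event => 'char'
END_OF_GRAMMAR

my $input   = "\x{2665}";
my $grammar = Marpa::R2::Scanless::G->new( { source => \$grammar_source } );
my $recce   = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
$recce->read( \$input );

# Retrieve the matched text through both accessors used in the gist.
my ( $start, $length ) =
    $recce->g1_location_to_span( $recce->current_g1_location() );
my $literal = $recce->literal( $start, $length );
my $range   = $recce->range_to_string( $recce->last_completed_range('test') );

# Check the internal flag and the character-level content.
ok( utf8::is_utf8($literal), 'literal() result has the UTF8 flag' );
ok( utf8::is_utf8($range),   'range_to_string() result has the UTF8 flag' );
is( $literal, $input, 'literal() round-trips the input character' );
is( $range,   $input, 'range_to_string() round-trips the input character' );

done_testing();
```

Against the version that produced the first output above, the first `ok` would be expected to fail; against the developer's version it should pass.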
