Skip to content

Instantly share code, notes, and snippets.

@meijeru
Created June 25, 2011 10:21
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save meijeru/1046347 to your computer and use it in GitHub Desktop.
Save meijeru/1046347 to your computer and use it in GitHub Desktop.
Reds lexer (drives separator + grammar)
REBOL [
Title: "Red/System lexical analysis"
Date: 1-Jul-2011
Name: "Reds lexer"
Type: none
Version: 1.0.0
File: %/G/Projects/Common/RED/red-system/sources/reds-lexer/reds-lexer.r
Home: http://users.telenet.be/rwmeijer
Author: "Rudolf W. Meijer"
Rights: "Copyright (C) 2011 Rudolf W. Meijer. All Rights Reserved"
History: [
0.0.0 [19-Jun-2011 {Start of project} "RM"]
0.5.0 [24-Jun-2011 {First working version} "RM"]
0.7.0 [27-Jun-2011 {Added file! and tuple! literals} "RM"]
0.8.0 [27-Jun-2011 {Simplified the separator} "RM"]
0.9.0 [29-Jun-2011 {
Separator reduced to stripping comments only,
Grammar takes care of whitespace
} "RM" ]
1.0.0 [1-Jul-2011 {Grammar takes care of comments also} "RM"]
]
]
;---|----1----|----2----|----3----|----4----|----5----|----6----|----7----|-
do %reds-lex-grammar.r
reds-lexer: func [
inp [file! url! binary! string!]
][
unless string? inp [
unless binary? inp [inp: read/binary inp]
inp: to-string inp
]
either empty? inp
[
print "nothing to analyse: empty input"
][
print "start"
; diagnostic
print
["parse" dt [
parse/all inp lex-grammar/program
]]
; parsed source is in lex-grammar/source
]
]
test-text:
%../../tests/source/units/exit-test.reds
print "call"
reds-lexer
;copy
test-text
; diagnostic
print mold/all head lex-grammar/source
ask ""
halt
@dockimbel
Copy link

Can it already parse existing Red/System sources, like %hello.reds, %common.reds, and all the unit test scripts?

Nice work, by the way!

@meijeru
Copy link
Author

meijeru commented Jun 27, 2011

I discovered I will have to put in files and tuples because they occur in headers; have found a few bugs also. Next version to come soon.

@meijeru
Copy link
Author

meijeru commented Jun 27, 2011

This (0.7.0) is now the version with tuples and files in, and it has parsed all unit test scripts. My only worry is that it is rather slow on the two long auto-test files (6000 and 10000 lines, respectively) where it took between 20 and 50 seconds. I fear it is O(n2) or worse (exponential?). Still, it may serve as basis for a better written lexer.

@meijeru
Copy link
Author

meijeru commented Jun 28, 2011

Most of the above-mentioned time is spent in the separator; the parse is fast (2.5 seconds for the longest file). Unfortunately, when Red is introducing more of the datatypes (and their literals) that we appreciate in REBOL, the separation process will become even more intricate. I did it for REBOL 3, and the code is twice as long as the one presented here for Red/System.

@meijeru
Copy link
Author

meijeru commented Jun 28, 2011

I have now signifiantly simplified the code for the Red/System case and replaced REBOL append and take by their native equivalents. the saving is huge! byte-auto-test.reds now separates in less than 5 seconds. The new versions 0.8.0 are up!

@dockimbel
Copy link

Nice progress!

@meijeru
Copy link
Author

meijeru commented Jun 29, 2011

I added the % operator.

@dockimbel
Copy link

I will review and test your lexer in depth, in a few days, to see if I can replace the current LOAD native with it. It would certainly open new options for better syntactic constructions of some literal datatypes (that would not be allowed by REBOL's scanner), so I am very interested in it, both for Red/System and Red (at least for the bootstrapping phase, once Red up and running, we would port the lexer to Red).

@meijeru
Copy link
Author

meijeru commented Jun 30, 2011

Please ignore any version prior to today 30-Jun-2011. I just simplified the "separator" into a comment-stripper and let the grammar take care of the whitespace. Previously, I did not think this could be done, that is why before today I chose the more complicated route which takes much longer unnecessarily..

@meijeru
Copy link
Author

meijeru commented Jul 1, 2011

I have now managed to eliminate even the comment-stripping: REBOL's parse is just more powerful than I suspected. So the grammar takes care of everything. See version 1.0.0 of 1-Jul.

@dockimbel
Copy link

Lexer and grammar files downloaded, having just a quick look now, will do all the testing tomorrow. Hope it could be used as a drop-in replacement to LOAD (or at least would not require too much work to do so). It looks very exciting anyway. :-)

@meijeru
Copy link
Author

meijeru commented Jul 2, 2011

I realize it is not geared to #include and #define, so that part will have to be (re-)done anyway.

@dockimbel
Copy link

A few comments from my first review session:

  1. It is sometime up to 50x slower than LOAD (using %tests/source/units/auto-tests/byte-auto-test.reds for example, I get 11ms with LOAD and 504ms with reds-lexer). It seems that a lot of time is spent in decode-string, could it be rewritten using parse rules instead? This is not a big issue at this point, it just shows how slow REBOL is...

  2. Using %tests/source/units/auto-tests/integer-auto-test.reds as input, I get the following error:

    ** Math Error: Math or number overflow
    ** Where: store-integer
    ** Near: pow: pow * 10
    
  3. The previous point makes me realize that reds-lexer is currently lacking error catching support, including proper reporting of the input position where the scanning failed (required to be able to generate accurate syntax error messages in Red/System). This point it really important for using it as a LOAD replacement.

  4. It will require significant work (maybe a day or two) to integrate reds-lexer in compiler. Mainly re-wiring properly, removing or replacing deferred syntax checking in compiler and extensive testing/fixing for regressions.

  5. Related to this work, I would be very interested in a formal Red/System syntax grammar specification (ideally using BNF format). Let me know if you are interested.

I am still interested in replacing LOAD with reds-lexer, but I guess it is not doable before going beta (announce planned for tomorrow). I guess that reds-lexer integration could be achieved probably later this week or next week.

Thank you for your nice work!

@meijeru
Copy link
Author

meijeru commented Jul 3, 2011

I do have a Red/System syntax grammar ready, as a Word file (BNF productions - albeit with ambiguity - and semantic comments). I will send it to your nr@red-lang.org address shortly.

@meijeru
Copy link
Author

meijeru commented Jul 3, 2011

This grammar does not consider the shift operators that were added just today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment