; Gist by @meijeru, last active March 18, 2018
REBOL [
Title: "Red concordance program"
Name: "concordance"
File: %red-concordance.r
Author: "Rudolf W. Meijer"
Rights: "Copyright (C) 2015-2018 Rudolf W. MEIJER. All Rights Reserved"
Needs: [2.7.6]
Tabs: 4
Purpose: {To provide supporting information for understanding the Red
toolchain by creating a dictionary of words occurring in the
sources in #define directives, alias definitions, globals,
#enum directives, #import/#export directives, context definitions,
function definitions, and "synonym" function names, e.g. keys-of,
as well as in the options definitions in the config file.
All words are categorized and annotated with the source file,
the full context, if any, and where appropriate, their value.
Output is to a comma-separated file which can be further exploited
using Excel or another program capable of interpreting such data.}
Comment: {This version has to cope with sources in Rebol and Red(/System).
That is why minor changes are needed to %lexer.r and some sources.
These are:
In %lexer.r, adapt the header-rule to accept any valid REBOL,
Red or Red/System header, and adapt the escape-rule to accept
any #[...] construct. Also suppress the quit/halt in throw-error
in order to have more information about further adaptations that
may be necessary. Store the adapted lexer as %red-lexer-adapted.r.
List any file that would still give a lexical error in the block avoid,
together with the offending element. Furthermore, list the files
that cannot or need not be analyzed, in the block ignore-sources}
History: [ ;version/date/comments
[0.0 29-May-2015 {Start of project}]
; the program has gone through several iterations
; which will not be documented here
[1.0 8-Feb-2018 {First release}]
[1.1 23-Feb-2018 {Download and unzip latest sources}]
]
Language: 'English
]
;---|----1----|----2----|----3----|----4----|----5----|----6----|----7----|-
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; preliminaries
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
red-latest-url: https://static.red-lang.org/dl/auto/red-latest-source.zip
; adapt the following constants for your situation
red-latest-local: %/C/Users/Eigenaar/AppData/Local/Temp/red-latest.zip
red-sources-dir: %/C/Users/Eigenaar/Projects/Red/sources/
fetch-sources-cmd: rejoin [
mold (to-local-file %"/C/Program Files (x86)/WinZip/wzunzip.exe") " -d -o "
(to-local-file red-latest-local) " "
(to-local-file red-sources-dir)
]
if "Y" = ask "Fetch latest sources? (Y/N) " [
print "fetching latest sources"
attempt [delete red-latest-local]
write/binary red-latest-local read/binary red-latest-url
print "unzipping sources"
call/wait fetch-sources-cmd
]
print "adapting lexer"
do %red-lexer-adapted.r
version: read rejoin [red-sources-dir %version.r] ; Red release version
vdate: modified? rejoin [red-sources-dir %version.r] ; date of installation
; program version is deduced from History field in header
conc-version-line: last system/script/header/history
conc-version: conc-version-line/1
conc-date: conc-version-line/2
; the version data are written to a separate output file
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; constants
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
source-extensions: [%.r %.red %.reds]
source-languages: ["Rebol" "Red" "Red/System"]
output-dict: %dictionary.csv
output-dict-version: %dictionary-version.txt
nl: "^/"
; The following files and folders in the Red sources (as stored in the
; red-sources-dir folder) are ignored when making the concordance
ignore-sources: [
%run-all.r
%docs/
%quick-test/
%system/config.r ; treated separately
%system/library/lib-iMagick.reds
%system/library/curses/
%system/library/dtoa/
%system/library/zlib/
%system/tests/
%system/utils/encap-fs.r
%system/utils/profiler.r
%system/utils/libRedRT-exports.r
%system/utils/r2-forward.r
%tests/
%utils/preprocessor.r
%version.r
]
; The following files each contain a lexical item that cannot be handled
; The line containing the item is commented out before the analysis
; and restored afterwards - see functions adapt-sources and restore-sources
avoid: [
; file lexical item
; currently no entries
]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; global variables
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; sources-list is a block of entries which themselves are blocks of four:
; extension + file-name + parent-dir (without red-sources-dir prefix) + parsed-text
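; an entry might look like this (illustrative values, not from an actual run):
;     [%.reds %allocator.reds %system/runtime/ [...]]
; where the last element is the parsed text with its header skipped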
sources-list: make block! 500
; sources-index is a block of three blocks (one for each of %.r, %.red, %.reds)
; each of which contains pairs of file-name + parent-dir (without red-sources-dir prefix)
sources-index: make block! 3
loop 3 [
insert/only tail sources-index make block! 300
]
; dictionary is a block of entries which are themselves blocks of six:
; - lemma (word found, as string)
; - category (#define, alias, global, #enum, #enumval, #import, #export, context,
;	function, routine, action, native, operator, synonym (e.g. for keys-of),
;	global word (including in contexts), or option from the config file)
; - programming language (Rebol, Red, Red/System)
; - source-file
; - context, or target for options
; - value
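; an entry might look like this (illustrative values, not from an actual run):
;     ["make-at" "function" "Red/System" "runtime/block.reds" "block" {"..."}]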
dictionary: make block! 3000
; keep track of nested contexts
ctx-stack: make block! 5
; global data for recursive analysis procedure
source-file: none
source-dir: none
in-function: false
language: none
in-define: false
; counter for dictionary elements
nr-lemmas: 0
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; functions
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
adapt-sources: does [
foreach [file item] avoid [
file-text: read join red-sources-dir file
either all [
place: find file-text item
place: next find/reverse place "^/"
][
unless place/1 = #";" [
insert place #";"
write join red-sources-dir file file-text
]
][
print ["cannot adapt source" file "at" mold item]
]
]
]
restore-sources: does [
foreach [file item] avoid [
file-text: read join red-sources-dir file
either all [
place: find file-text item
place: find/reverse place ";"
][
remove place
write join red-sources-dir file file-text
][
print ["cannot restore source" file "at" mold item]
]
]
]
flatten: func [
{***********************************************************************
Remove new-line markers from block;
if /deep, recursively remove them from sub-blocks
***********************************************************************}
blk [block!]
/deep
][
new-line/all blk false
if deep [
forall blk [
if block? blk/1 [flatten/deep blk/1]
]
]
]
add-lemma: func [
{***********************************************************************
Make an entry into the dictionary block, using the three arguments
and adding the source-file programming language and context
from global information
************************************************************************}
lemma [string!] ; word found
content [string!] ; value or docstring
category [string!]
][
; adjust some lemmas in order to make MS-Excel accept them
if find "+-" first lemma [insert lemma #" "]
insert/only tail dictionary reduce [
lemma category language
form source-file
either empty? ctx-stack [""][mold to-path ctx-stack]
content
]
]
read-sources: func [
{***********************************************************************
Recursively check source folders and subfolders for files
and keep the .r, .red and .reds ones; parse their text and store it
for later analysis
***********************************************************************}
pdir [file!] files [block!] consider [logic!]
/local sdir dir ext parsed-text
][
sdir: find/tail pdir red-sources-dir ; short dir
if any [not consider find ignore-sources sdir] [
consider: false
]
sort files
foreach f files [
if #"/" <> last f [
if all [
consider
not find ignore-sources rejoin [sdir f]
ext: find/last f #"."
find source-extensions ext
][
source-file: f ; global!!
parsed-text: lexer/process read/binary rejoin [pdir f]
insert/only tail sources-list reduce [
ext
f
sdir
skip parsed-text 1 ; ignore header
]
]
]
]
foreach f files [
if #"/" = last f [
dir: rejoin [pdir f]
read-sources dir read dir consider
]
]
]
analyse: func [
{***********************************************************************
Recursively analyse a parsed program text and check for the following:
#define directives - store the defined name and the content
alias definitions - store the name and the type (struct! or function!)
#enum directives - store the enum name and the name/value pairs,
and make entries for the names (#enumval)
#import directives - store the function name and the OS library + entry name
context definitions - store the name
function definitions - store the name and the docstring if present
duplicate functions (e.g. keys-of: :words-of) - store the names (synonym)
global variables (also inside contexts) - store the name (global).
For all names, store the directly surrounding context if present.
The analysis assumes well-formed Rebol, Red and Red/System sources.
***********************************************************************}
prog [block!]
/local p wp d c len
name content docstring
enumname enumvalues enumvalname
enumval enumval-s
imports exports
save-ctx
save-lang
category
][
p: prog
while [not tail? p][
case [
block? p/1 [
analyse p/1
p: skip p 1
]
; pattern with <word> <block> in Red/System: add context
all [
language = "Red/System"
'with = p/1
][
if all [
word? p/2
block? p/3
][
insert tail ctx-stack p/2
analyse p/3
clear back tail ctx-stack
p: skip p 2
]
p: skip p 1
]
; patterns #system <block> and #system-global <block>
all [
language = "Red"
any [
#system == p/1
#system-global == p/1
]
][
if block? p/2 [
language: "Red/System"
save-ctx: copy ctx-stack
clear ctx-stack
analyse p/2
ctx-stack: copy save-ctx
language: "Red"
p: skip p 1
]
p: skip p 1
]
; pattern #define <word> <value>
; pattern #define <word> (<args>) <value>
all [
language = "Red/System"
#define = p/1
][
if word? p/2 [
name: to-string p/2
if paren? p/3 [
name: rejoin [name mold p/3]
p: skip p 1
]
either all [
issue? p/3
#"'" = first to-string p/3
][
content: to-char to-integer copy/part at to-string p/3 8 2
][
content: p/3
]
case [
integer? content [
content: rejoin [
content " (" to-string to-hex content "h)"
]
]
block? content [
flatten/deep content
content: mold/all content
]
true [
content: mold content
]
]
add-lemma name content "#define"
p: skip p 2
]
p: skip p 1
]
; pattern #macro <set-word> func [<args>][<code>]
; pattern #macro <word|lit-word|block> func [<args>][<code>]
; note that currently, the toolchain source does not contain macros
all [
language = "Red"
#macro = p/1
][
either all [
any [set-word? p/2 word? p/2 lit-word? p/2 block? p/2]
; @@@@@@ to be checked @@@@@@
any ['func == p/3] ; or also 'function == p/3]
][
switch type?/word p/2 [
set-word! [
name: to-string p/2
]
word! [
name: rejoin ["$match-" p/2]
]
lit-word! [
name: rejoin ["$match-'" p/2]
]
block! [
name: "$match-[...]"
]
]
; @@@@@@ TBD doc-string?? @@@@@@
add-lemma name "" "#macro"
p: skip p 4
][
p: skip p 1
]
]
; pattern #enum <word> <block>
; the <block> is analyzed to obtain individual enum values
all [
language = "Red/System"
#enum = p/1
][
if all [word? p/2 block? p/3] [
enumname: to-string p/2
enumvalues: p/3
flatten enumvalues
add-lemma enumname mold/all enumvalues "#enum"
wp: enumvalues
enumval: 0
while [not tail? wp][
enumvalname: to-string wp/1
if set-word? wp/1 [
enumval: wp/2
wp: skip wp 1
]
enumval-s: rejoin [
enumval
" (" to-string to-hex enumval "h)"
" type " enumname
]
add-lemma enumvalname enumval-s "#enumval"
enumval: enumval + 1
wp: skip wp 1
]
p: skip p 2
]
p: skip p 1
]
; pattern #import <block>
; the block is analyzed to obtain individual function names
all [
language = "Red/System"
#import = p/1
][
if block? imports: p/2 [
foreach [lib conv funcs] imports [
if block? funcs [
foreach [name OS-string spec] funcs [
add-lemma to-string name
mold to-file rejoin [lib "/" OS-string]
"#import"
]
]
]
p: skip p 1
]
p: skip p 1
]
; pattern #export <block>
; pattern #export <stdcall|cdecl> <block>
all [
language = "Red/System"
#export = p/1
][
if any ['stdcall == p/2 'cdecl == p/2][p: skip p 1]
if all [
block? exports: p/2
foreach sym exports [either word? sym [true][break/return false]]
][
foreach sym exports [
add-lemma to-string sym "" "#export"
]
p: skip p 1
]
p: skip p 1
]
; pattern set <lit-word> <func|function|has|routine> [<args>][<code>]
; pattern set <lit-word> does [<code>]
; the function is defined in the global context
all [
language <> "Red/System"
'set == p/1
][
if all [
lit-word? p/2
any [
'func == p/3
'function == p/3
'does == p/3
'has == p/3
'routine = p/3
]
][
docstring: all [
'does <> p/3
'has <> p/3
block? p/4
not empty? p/4
string? first p/4
first p/4
]
if docstring [replace/all docstring nl " "]
either docstring
[
docstring: rejoin [{"} docstring {"}]
][
docstring: ""
]
category: either 'routine == p/3 ["routine"]["function"]
save-ctx: copy ctx-stack
clear ctx-stack
add-lemma to-string p/2 docstring category
ctx-stack: save-ctx
p: skip p either 'does == p/3 [3][4]
if block? p/1 [
; necessary because of nested function definitions
either in-function
[
analyse p/1
][
in-function: true
analyse p/1
in-function: false
]
]
]
p: skip p 1
]
; pattern #load set-word! <string> make op! <get-word>
; temporary work-around for Rebol's limitations
all [
language = "Red/System"
#load = p/1
'set-word! == p/2
][
if all [
string? p/3
'make == p/4
'op! == p/5
get-word? p/6
][
add-lemma p/3 to-string p/6 "operator"
p: skip p 4
]
p: skip p 1
]
; patterns starting with a set-word
all [
set-word? name: p/1
not tail? next p
][
if 'make-profilable = p/2 [
p: skip p 1
]
case [
; pattern <set-word> alias <function!|struct!> <block>
all [
language = "Red/System"
'alias == p/2
][
add-lemma to-string name to-string p/3 "alias"
p: skip p 3
]
; pattern <set-word> context <block>
all [
'context == p/2
block? p/3
][
add-lemma to-string name "" "context"
insert tail ctx-stack to-word name
flatten ctx-stack
analyse p/3
clear back tail ctx-stack
p: skip p 3
]
; pattern <set-word> <func|function|has|routine> [<args>][<code>]
; pattern <set-word> does [<code>]
; <args> is analyzed for doc-string
any [
'func == p/2
'function == p/2
'does == p/2
'has == p/2
'routine == p/2
][
docstring: all [
'does <> p/2
'has <> p/2
block? p/3
not empty? p/3
string? first p/3
first p/3
]
if docstring [replace/all docstring nl " "]
either docstring
[
docstring: rejoin [{"} docstring {"}]
][
docstring: ""
]
category: either 'routine == p/2 ["routine"]["function"]
add-lemma to-string name docstring category
p: skip p either 'does == p/2 [2][3]
if block? p/1 [
; necessary because of nested function definitions
either in-function
[
analyse p/1
][
in-function: true
analyse p/1
in-function: false
]
]
p: skip p 1
]
; pattern <set-word> make <native!|action!|op!> [[<args>]<symbolic-number>]
; pattern <set-word> make op! <get-word>
; <args> is analyzed for doc-string
; pattern <set-word> make <native!|action!|op!> <get-word>
; doc-string is get-word
'make == p/2 [
if any [
'native! == p/3
'action! == p/3
'op! == p/3
][
category: head remove back tail to-string p/3
if category = "op" [category: "operator"]
case [
block? p/4 [
docstring: all [
not empty? p/4
block? p/4/1
string? first p/4/1
first p/4/1
]
if docstring [replace/all docstring nl " "]
either docstring
[
docstring: rejoin [{"} docstring {"}]
][
docstring: ""
]
]
all [
'op! == p/3
get-word? p/4
][
docstring: to-string p/4
]
true [
docstring: ""
]
]
unless docstring [docstring: ""]
add-lemma to-string name docstring category
p: skip p 3
]
p: skip p 1
]
; pattern <set-word> <get-word>
; is considered a synonym, but at the end, only those are kept
; where the <get-word> refers to a defined function etc.
; this cannot be done here
get-word? p/2 [
add-lemma to-string name to-string p/2 "synonym"
p: skip p 2
]
; pattern <set-word> <value>, not in function
not in-function [
category: either empty? ctx-stack ["global"]["ctx-field"]
add-lemma to-string name "" category
; skip only 1, because of a: b: idiom
p: skip p 1
]
true [
p: skip p 1
]
]
]
; pattern <set-path> context <block>
all [
set-path? p/1
not tail? next p
'context == p/2
][
len: length? p/1
if block? p/3 [
insert tail ctx-stack copy/part to-block p/1 len - 1
add-lemma to-string last p/1 "" "context"
insert tail ctx-stack to-word last p/1
flatten ctx-stack
analyse p/3
loop len [
clear back tail ctx-stack
]
p: skip p 2
]
p: skip p 1
]
true [p: skip p 1]
]
]
]
normalize-file: func [
{***********************************************************************
normalize a file value by removing . and .. symbols
***********************************************************************}
pdir [file!] file [file!]
/local save-dir res
][
save-dir: what-dir
change-dir rejoin [red-sources-dir pdir]
res: find/tail clean-path file red-sources-dir
change-dir save-dir
res
]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; main
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
t-start: now/precise
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; comment out lines containing lexical items that the adapted lexer cannot handle
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
print "adapting REBOL sources ..."
adapt-sources
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; recursively read and store all source file names
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
print "reading sources"
read-sources red-sources-dir read red-sources-dir true
sources-by-extension: copy sources-list
print "making index"
foreach f sources-by-extension [
ext: index? find source-extensions f/1
insert/only tail pick sources-index ext next f
]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; treat files one by one and find defines, enums and contexts etc.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
print "analysing source files"
repeat i 3 [
language: source-languages/:i
foreach f sources-index/:i [
source-file: rejoin [f/2 f/1] ; global!!!
source-dir: f/2 ; global!!!
parsed-text: f/3
clear ctx-stack ; global!!!
analyse parsed-text
]
]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; fetch option names and values
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
print "analysing config file"
language: "Rebol"
source-file: %system/config.r
parsed-text: lexer/process read/binary rejoin [red-sources-dir source-file]
foreach [target options] next parsed-text [
foreach [option-name option-value] options [
either integer? option-value
[
option-value: rejoin [option-value
" (" to-string to-hex option-value "h)"
]
][
option-value: mold option-value
]
insert/only tail dictionary reduce [
to-string option-name
"option" language form source-file form target
option-value
]
]
]
print [length? sources-list " + 1 files read"]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; show dictionary
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
attempt [delete output-dict]
attempt [delete output-dict-version]
write/lines/append output-dict-version "Searching in the dictionary of selected identifiers in the Red toolchain"
write/lines/append output-dict-version rejoin [
"Produced on " now
" by concordance version " conc-version
" of " conc-date
]
write/lines/append output-dict-version rejoin [
"Red toolchain version " version
" of " vdate/date "/" vdate/time
]
print "making dictionary"
dictionary: unique dictionary ; to remove duplicate globals
sort dictionary
write/lines/append output-dict "lemma,category,language,source file,context/target,value/docstring"
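; a produced line might then read (illustrative values, not from an actual run):
; make-at,function,Red/System,runtime/block.reds,block,"..."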
nr-lemmas: 0 ; need to count explicitly since some pseudo synonyms will be removed
foreach lemma dictionary [
either "synonym" = lemma/2
[
go-ahead: false
; check if this concerns a function
dict: dictionary
while [all [
not go-ahead
dict
lem: dict/1
]
][
either all [
lemma/6 = lem/1
find ["function" "routine" "native" "action" "operator"] lem/2
][
go-ahead: true
][
dict: next dict
]
]
][
go-ahead: true
]
if go-ahead [
write/lines/append output-dict rejoin
reduce [lemma/1 "," lemma/2 "," lemma/3 "," lemma/4 "," lemma/5 "," lemma/6]
nr-lemmas: nr-lemmas + 1
]
]
t-end: now/precise
restore-sources
write/lines/append output-dict-version rejoin ["Total number of lemmas: " nr-lemmas]
ask rejoin ["done in " to-integer t-end/time - t-start/time * 1000 " msec"]