----------------------------------------------------
[subbu@earth tests] cat /tmp/refs
X <ref>This is a long ref</ref>
A <ref name=a />
B <ref name=b />
<references>
<ref name=a>foo</ref>
<ref name=b>bar</ref>
</references>
----------------------------------------------------
Updated handling for templated attributes (links, tables, HTML tags)
typeof: mw:ExpandedAttrs
data-mw = {
"expandedKeys": {
"href": { "html": "..." },
},
"expandedVals": {
"href": { "html": "..." },
Test #1:
{{echo|b
}}{|{{Infobox ship begin}}
|-
|a
|}
Test #2:
--- without ES6 collections ---
starting parsing of fr:Coupe_du_pays_de_Galles_de_football
completed parsing of fr:Coupe_du_pays_de_Galles_de_football in 12058 ms
starting parsing of fr:Coupe_du_pays_de_Galles_de_football
completed parsing of fr:Coupe_du_pays_de_Galles_de_football in 4165 ms
starting parsing of fr:Coupe_du_pays_de_Galles_de_football
completed parsing of fr:Coupe_du_pays_de_Galles_de_football in 9414 ms
starting parsing of fr:Coupe_du_pays_de_Galles_de_football
completed parsing of fr:Coupe_du_pays_de_Galles_de_football in 4537 ms
ssastry@parsoid-spof:/data/project/parsoid/js/api$ tail -5000 nohup.out | grep parsing| sort | less
...
completed parsing of ms:Cacaven in 101054 ms
completed parsing of ms:Cacaven in 102503 ms
completed parsing of ms:Cacaven in 104788 ms
completed parsing of ms:Cacaven in 111757 ms
completed parsing of ms:Cacaven in 113038 ms
completed parsing of ms:Cacaven in 114799 ms
completed parsing of ms:Cacaven in 116453 ms
completed parsing of ms:Cacaven in 128306 ms
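The "completed parsing of ... in ... ms" lines above vary a lot between runs of the same page, so a per-page summary can be easier to read than the sorted raw log. The following is a minimal sketch, not part of Parsoid: the function name `summarize` and the regular expression are assumptions based only on the log format shown here.

```python
import re
from collections import defaultdict
from statistics import mean

# Assumed log format, taken from the lines above:
#   completed parsing of <prefix:Title> in <N> ms
LINE_RE = re.compile(r"completed parsing of (\S+) in (\d+) ms")


def summarize(lines):
    """Group parse times (in ms) by page and report basic statistics."""
    times = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match:  # "starting parsing" lines and other noise are skipped
            times[match.group(1)].append(int(match.group(2)))
    return {
        page: {
            "runs": len(ms),
            "min": min(ms),
            "max": max(ms),
            "mean": mean(ms),
        }
        for page, ms in times.items()
    }
```

Passing an open file such as `nohup.out` to `summarize` works because it only iterates over lines; the result is a dict keyed by page title.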
[subbu@earth lib] echo "foo<noinclude>bar baz" | node parse --fetchConfig false
<body data-parsoid='{"tmp":{},"dsr":[0,22,0,0]}'><p data-parsoid='{"dsr":[0,21,0,0]}'>foo<meta typeof="mw:Includes/NoInclude" data-parsoid='{"src":"<noinclude>","dsr":[3,14,null,null]}'>bar baz</p>
</body>
[subbu@earth lib] echo "foo</noinclude>bar baz" | node parse --fetchConfig false
<body data-parsoid='{"tmp":{},"dsr":[0,23,0,0]}'><p data-parsoid='{"dsr":[0,22,0,0]}'>foo&lt;/noinclude&gt;bar baz</p>
</body>
[subbu@earth lib] echo "foo</noinclude>bar baz" | node parse --fetchConfig false --trace peg-tokens
TOKS: ["foo",{"type":"EndTagTk","name":"noinclude","attribs":[],"dataAttribs":{"tsr":[3,15],"stx":"html"}},"bar baz"]
TOKS: [{"type":"NlTk","dataAttribs":{"tsr":[22,23]}},""]
TOKS: [{"type":"EOFTk"}]
subbu@earth:~/work/wmf/Parsoid/js/lib$ ls -lt /tmp/*wt* | grep -v tsp
-rw-rw-r-- 1 subbu subbu 381378 Oct 10 13:35 /tmp/wt0
-rw-rw-r-- 1 subbu subbu 381378 Oct 10 13:26 /tmp/wt3
-rw-rw-r-- 1 subbu subbu 381354 Oct 10 13:26 /tmp/wt2
-rw-rw-r-- 1 subbu subbu 381338 Oct 10 13:25 /tmp/wt1
-rw-rw-r-- 1 subbu subbu 1780653 Oct 10 13:13 /tmp/wt.debug3.html
-rw-rw-r-- 1 subbu subbu 2276378 Oct 10 12:16 /tmp/wt.debug2.html
-rw-rw-r-- 1 subbu subbu 2380992 Oct 10 12:09 /tmp/wt.debug1.html
-rw-rw-r-- 1 subbu subbu 2116581 Oct 10 11:59 /tmp/wt.debug.html
[subbu@earth lib] node parse --prefix mw --dump dom:pre-dsr < /tmp/x
------ DOM: pre-DSR -------
<head data-parsoid="{&quot;tmp&quot;:{}}"></head><body data-parsoid="{&quot;tmp&quot;:{}}"><meta typeof="mw:Transclusion" about="#mwt1" data-mw-arginfo="{&quot;dict&quot;:{&quot;target&quot;:{&quot;wt&quot;:&quot;Test for noincludes&quot;,&quot;href&quot;:&quot;./Template:Test_for_noincludes&quot;},&quot;params&quot;:{}},&quot;paramInfos&quot;:[]}" data-parsoid="{&quot;tsr&quot;:[0,23],&quot;src&quot;:&quot;{{Test for noincludes}}&quot;,&quot;a&quot;:{&quot;id&quot;:null},&quot;sa&quot;:{&quot;id&quot;:&quot;mwt1&quot;},&quot;tagId&quot;:1,&quot;tmp&quot;:{}}"><div data-parsoid="{&quot;stx&quot;:&quot;html&quot;,&quot;tagId&quot;:2,&quot;tmp&quot;:{}}">TEST</div>
<p data-parsoid="{&quot;tagId&quot;:3,&quot;tmp&quot;:{}}"><meta typeof="mw:Transclusion/End" about="#mwt1" data-parsoid="{&quot;tsr&quot;:[null,23],&quot;tagId&quot;:4,&quot;tmp&quot;:{}}">Blah number 5</p>
<p data-parsoid="{&quot;tagId&quot;:5,&quot;tm
[subbu@earth tests] ./sync-parserTests.js ../../../core/ pt-sync
Parsoid git HEAD is e7b19984370b734f058abdc3ecb47582e734f71d
>>> cd ../../../core/
>>> git fetch origin
remote: Counting objects: 15665, done
remote: Finding sources: 100% (2594/2594)
remote: Getting sizes: 100% (677/677)
remote: Compressing objects: 99% (12696/12697)
remote: Total 2594 (delta 1671), reused 2239 (delta 1639)
Receiving objects: 100% (2594/2594), 14.75 MiB | 1.02 MiB/s, done.
[subbu@earth tests] node parse --html2wt < /tmp/xyz.html
{|
<th>k6cetj5kudkuik9</th>!!''b''
|''a''||''b''
|}
[subbu@earth tests] cat /tmp/xyz.html
<body data-parsoid="{&quot;dsr&quot;:[0,25,0,0]}"><table data-parsoid="{&quot;dsr&quot;:[0,25,2,2]}">
<tbody data-parsoid="{&quot;dsr&quot;:[3,23,0,0]}"><tr data-parsoid="{&quot;autoInsertedEnd&quot;:true,&quot;autoInsertedStart&quot;:true,&quot;stx&quot;:&quot;html&quot;,&quot;dsr&quot;:[3,22,0,0]}"><th>k6cetj5kudkuik9</th><th data-parsoid="{&quot;stx_v&quot;:&quot;row&quot;,&quot;autoInsertedEnd&quot;:true,&quot;dsr&quot;:[7,12,2,0]}"><i data-parsoid="{&quot;autoInsertedEnd&quot;:1,&quot;dsr&quot;:[9,12,2,0]}">b</i></th>
<td data-parsoid="{&quot;autoInsertedEnd&quot;:true,&quot;dsr&quot;:[13,17,1,0]}"><i data-parsoid="{&quot;autoInsertedEnd&quot;:1,&quot;dsr&quot;:[14,17,2,0]}" data-foobar="dg2oiz9jujev1jor">a</i></td><td data-parsoid="{&quot;stx_v&quot;:&quot;row&quot;,&quot;autoInsertedEnd&quot;:true,&quot;dsr&quot;:[17,22,2,0]}"><i data-parsoid="{&quot;autoIns