Here is a random snippet from the RST discourse treebank.
(Satellite 7-11 background
(Satellite 7-7 attribution Shevardnadze said)
(Nucleus 8-11 span
(Nucleus 8-8 span
it took Gorbachev's government four years)
(Satellite 9-11 purpose
(Nucleus 9-10 span
(Satellite 9-9 attribution to determine)
(Nucleus 10-10 span
that the station's location in Siberia violated the accord,))
(Satellite 11-11 comment
as Western arms-control officials have long
contended.))))
I'll assign the following labels to the EDUs:
e1:[Shevardnadze said]
e2:[it took Gorbachev's government four years]
e3:[to determine]
e4:[that the station's location in Siberia violated the accord,]
e5:[as Western arms-control officials have long contended.]
Dropping the root 'Satellite' label and the EDU-span information, and shortening the EDUs to their names, gives us this tree, which is actually a bit weird:
Root:background(
S:attribution:e1,
N:span(
N:span:e2,
S:purpose(
N:span(S:attribution:e3, N:span:e4),
S:comment:e5)))
One thing that I stumble upon… every… single… time I look at the RST corpus is that their notation is (necessarily) confusing: it has to deal with non-binary trees, and so has to encode multiple relation instances per node.
Something like R(N,S1,S2,..,Sn) wouldn't make sense in their corpus, so instead they have to write something more like:
root(span:N, R1:S1, R2:S2,..,Rn:Sn)
Which is not what you're expecting. I might be misremembering this, but assuming I have the right idea, we need to shuffle the notation around (this tree is already binary, so you can't see why the weirdness is needed):
attribution(
S:e1,
N:purpose(
N:e2,
S:comment(
N:attribution(S:e3, N:e4),
S:e5)))
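The shuffle above can be sketched in code. This is only a rough illustration under my own assumptions: the `Node` class and the `shuffle` function are hypothetical names (not part of any RST tooling), and the rule used is "a node takes the first non-`span` label found among its children", which is only checked here against this binary, mononuclear example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical node in the corpus-style notation (label sits on the child)."""
    nuc: str                      # "N" or "S"
    label: str                    # relation name, or "span"
    edu: Optional[str] = None     # set for leaves only
    children: List["Node"] = field(default_factory=list)


def shuffle(node: Node, top: bool = True) -> str:
    """Render the tree with each relation moved up onto the parent node."""
    prefix = "" if top else node.nuc + ":"
    if node.edu is not None:
        return prefix + node.edu
    # Assumption: the relation is the first child label that is not "span".
    rel = next(c.label for c in node.children if c.label != "span")
    kids = ", ".join(shuffle(c, top=False) for c in node.children)
    return f"{prefix}{rel}({kids})"


# The corpus-style tree from above (root label 'background' is dropped).
tree = Node("S", "background", children=[
    Node("S", "attribution", edu="e1"),
    Node("N", "span", children=[
        Node("N", "span", edu="e2"),
        Node("S", "purpose", children=[
            Node("N", "span", children=[
                Node("S", "attribution", edu="e3"),
                Node("N", "span", edu="e4"),
            ]),
            Node("S", "comment", edu="e5"),
        ]),
    ]),
])

shuffled = shuffle(tree)
print(shuffled)
```

A node with several satellites carrying different relations would need one relation instance per satellite, which this sketch does not attempt.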
I'm just going to add some line numbers here and follow the derivation through as I currently understand it. Note that everything here is mononuclear, but I think the example is still instructive because it allows us to isolate the recursive tree translation aspect from the DT-reversibility question.
7. attribution(
6. S:e1,
5. N:purpose(
4. N:e2,
3. S:comment(
2. N:attribution(S:e3, N:e4),
1. S:e5)))
Now working line-by-line (but still bottom up, because you can commute the leaf node steps to the front of the queue), I show two notations for the translation, first the new links I would add to the DT, and second the DT being constructed:
-
S:e5 => tree: e5
-
N:attribution(S:e3, N:e4) =>
link: attribution(e4, e3) tree: e4(attribution:e3)
-
S:comment(2,1) =>
link: comment(e4, e5) # e4 being the head
tree: e4(attribution:e3, comment:e5)
-
N:e2 => tree: e2
-
N:purpose(4,3) =>
link: purpose(e2, e4) tree: e2(purpose:e4(attribution:e3, comment:e5))
-
S:e1 => tree: e1
-
attribution(6,5) =>
link: attribution(e2, e1) tree: e2(attribution:e1, purpose:e4(attribution:e3, comment:e5))
Final link inventory:
attribution(e4, e3)
comment(e4, e5)
purpose(e2, e4)
attribution(e2, e1)
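The derivation can be written as a small recursive function, as a sanity check on the link inventory. This is a sketch under assumptions: the nested-tuple encoding and the name `to_links` are mine, and it assumes each internal node has exactly one nucleus (true of this example; multinuclear relations are not handled). The head of a subtree is the head of its nucleus, and each satellite's head is attached to it.

```python
# Leaf: (nuclearity, edu); internal node: (nuclearity, relation, [children]).
# This is the shuffled notation, with the relation on the parent.
rst = ("Root", "attribution", [
    ("S", "e1"),
    ("N", "purpose", [
        ("N", "e2"),
        ("S", "comment", [
            ("N", "attribution", [("S", "e3"), ("N", "e4")]),
            ("S", "e5"),
        ]),
    ]),
])


def to_links(tree, links):
    """Return the head EDU of `tree`, appending (relation, head, dependent)."""
    if len(tree) == 2:                  # leaf: the EDU is its own head
        return tree[1]
    _, rel, children = tree
    # Translate the children first (bottom-up), remembering their nuclearity.
    heads = [(child[0], to_links(child, links)) for child in children]
    # Assumption: exactly one nucleus per node; its head percolates upward.
    nucleus = next(h for nuc, h in heads if nuc == "N")
    for nuc, h in heads:
        if nuc == "S":
            links.append((rel, nucleus, h))
    return nucleus


links = []
root_head = to_links(rst, links)
for rel, head, dep in links:
    print(f"{rel}({head}, {dep})")
```

On this tree the links come out in the same bottom-up order as the steps above, with e2 as the overall head.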
Or less compactly,
     e1:[Shevardnadze said]
     ^
     | attribution
     |
     +--e2:[it took Gorbachev's government four years]
        |
purpose |
        |  e3:[to determine]
        v  ^
        |  | attribution
        |  |
        +--e4:[that the station's location in Siberia violated the accord,]
           |
           | comment
           v
           e5:[as Western arms-control officials have long contended.]
So do I have the right idea here?