kowey/gist:9386716

## gistfile1.md

      
[i] Original RST snippet and EDUs

Here is a random snippet from the RST Discourse Treebank:

      (Satellite 7-11 background
        (Satellite 7-7 attribution Shevardnadze said)
        (Nucleus 8-11 span
          (Nucleus 8-8 span
            it took Gorbachev's government four years)
          (Satellite 9-11 purpose
            (Nucleus 9-10 span
              (Satellite 9-9 attribution to determine)
              (Nucleus 10-10 span
                that the station's location in Siberia violated the accord,))
            (Satellite 11-11 comment
              as Western arms-control officials have long
              contended.))))

I'll assign the following labels to the EDUs:
e1:[Shevardnadze said]
e2:[it took Gorbachev's government four years]
e3:[to determine]
e4:[that the station's location in Siberia violated the accord,]
e5:[as Western arms-control officials have long contended.]

[ii] Compact notation

Dropping the root's 'Satellite' role, the EDU-span information, and
shortening the EDUs to their names gives us this tree, which is
actually a bit weird...
Root:background(
    S:attribution:e1,
    N:span(
      N:span:e2,
      S:purpose(
        N:span(S:attribution:e3, N:span:e4),
        S:comment:e5)))
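To make [i] and [ii] concrete, here is a minimal Python sketch that parses the bracketed snippet above, numbers the leaves e1, e2, ... from left to right, and prints the compact notation. It assumes the simplified bracketed form shown in [i], which is not necessarily the treebank's raw file format; `Node`, `parse` and `compact` are hypothetical helpers written for this note, and leaves are printed uniformly as `Role:relation:eN`.

```python
import re
from dataclasses import dataclass, field

# The snippet from [i], in the simplified bracketed form shown above.
SNIPPET = """
(Satellite 7-11 background
  (Satellite 7-7 attribution Shevardnadze said)
  (Nucleus 8-11 span
    (Nucleus 8-8 span
      it took Gorbachev's government four years)
    (Satellite 9-11 purpose
      (Nucleus 9-10 span
        (Satellite 9-9 attribution to determine)
        (Nucleus 10-10 span
          that the station's location in Siberia violated the accord,))
      (Satellite 11-11 comment
        as Western arms-control officials have long
        contended.))))
"""


@dataclass
class Node:
    role: str                  # "Nucleus" or "Satellite"
    relation: str              # e.g. "span", "attribution"
    children: list = field(default_factory=list)
    text: str = ""             # EDU text (leaves only)


def parse(s):
    """Parse one bracketed node; the EDU span (e.g. 7-11) is read but discarded."""
    tokens = [t for t in re.findall(r"\(|\)|[^()]+", s) if t.strip()]
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", tokens[pos]
        role, _span, relation, *rest = tokens[pos + 1].split(None, 3)
        pos += 2
        # Anything after the relation label is EDU text; normalise whitespace.
        n = Node(role, relation, text=" ".join(rest[0].split()) if rest else "")
        while tokens[pos] == "(":
            n.children.append(node())
        assert tokens[pos] == ")", tokens[pos]
        pos += 1
        return n

    return node()


def compact(node, edus, top=True):
    """Render a node in the compact notation of [ii]. Leaf texts are appended
    to `edus`, so the i-th leaf (left to right) is named e<i>."""
    role = "Root" if top else node.role[0]        # "N" or "S" below the root
    if not node.children:
        edus.append(node.text)
        return f"{role}:{node.relation}:e{len(edus)}"
    inner = ", ".join(compact(c, edus, top=False) for c in node.children)
    return f"{role}:{node.relation}({inner})"


edus = []
print(compact(parse(SNIPPET), edus))
for i, text in enumerate(edus, start=1):
    print(f"e{i}:[{text}]")
```

The compact string comes out on one line; the EDU listing matches the labels given in [i].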

[iii] Binarised compact notation

One thing that I stumble over every… single… time I look at the RST
corpus is that its notation is (necessarily) confusing: it has to deal
with non-binary trees, and so has to encode multiple relation instances
per node.
Something like R(N,S1,S2,..,Sn) wouldn't make sense in their corpus, so
instead they have to write something more like:
root(span:N, R1:S1, R2:S2,..,Rn:Sn)

This is not what you'd expect. I might be misremembering this, but
assuming I have the right idea, we need to shuffle the notation around
(this tree is already binary, so you can't see why the weirdness is needed):
attribution(
  S:e1,
  N:purpose(
    N:e2,
    S:comment(
       N:attribution(S:e3, N:e4),
       S:e5)))
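One way to sketch the shuffle in Python, assuming mononuclear nodes only: each satellite's relation label moves up onto a new binary parent, and an n-ary node like root(span:N, R1:S1, ..., Rn:Sn) is folded in one satellite at a time. The tuple encoding, `binarise`, `render`, and the nearest-satellite-first folding order are all assumptions made for this illustration; the order is arbitrary but keeps the EDUs in their original linear order.

```python
# Hypothetical encoding for this sketch:
#   compact node:   (role, relation, body), body = EDU name or list of nodes
#   binarised tree: EDU name, or (relation, (role, left), (role, right))

def binarise(body):
    """Move each satellite's relation label up onto a binary parent node.
    Only mononuclear nodes (exactly one "N" child) are handled."""
    if isinstance(body, str):                      # leaf EDU, e.g. "e1"
        return body
    nuclei = [i for i, (role, _, _) in enumerate(body) if role == "N"]
    assert len(nuclei) == 1, "multinuclear nodes are not handled here"
    k = nuclei[0]
    tree = binarise(body[k][2])                    # start from the nucleus
    # Absorb satellites nearest-first, right side then left side: an
    # arbitrary choice that keeps the EDUs in their original linear order.
    for i in list(range(k + 1, len(body))) + list(range(k - 1, -1, -1)):
        _, relation, satellite = body[i]
        if i > k:
            tree = (relation, ("N", tree), ("S", binarise(satellite)))
        else:
            tree = (relation, ("S", binarise(satellite)), ("N", tree))
    return tree


def render(tree):
    """Print a binarised tree in the notation used above."""
    if isinstance(tree, str):
        return tree
    relation, (role1, left), (role2, right) = tree
    return f"{relation}({role1}:{render(left)}, {role2}:{render(right)})"


# The children of the Root:background node from [ii].
example = [
    ("S", "attribution", "e1"),
    ("N", "span", [
        ("N", "span", "e2"),
        ("S", "purpose", [
            ("N", "span", [("S", "attribution", "e3"), ("N", "span", "e4")]),
            ("S", "comment", "e5"),
        ]),
    ]),
]
print(render(binarise(example)))

# An n-ary node root(span:n, R1:s1, R2:s2), folded one satellite at a time.
print(render(binarise([("N", "span", "n"), ("S", "R1", "s1"), ("S", "R2", "s2")])))
```

The first line printed is the binarised tree above written on one line; the second shows the n-ary case becoming `R2(N:R1(N:n, S:s1), S:s2)`.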

[iv] RST to DT derivation

I'm just going to add some line numbers here and follow the derivation
through as I currently understand it. Note that everything here is
mononuclear, but I think the example is still instructive because it
allows us to isolate the recursive tree-translation aspect from the
DT-reversibility question.
7. attribution(
6.   S:e1,
5.   N:purpose(
4.     N:e2,
3.     S:comment(
2.        N:attribution(S:e3, N:e4),
1.        S:e5)))

Now working line by line (but still bottom-up, because you can commute
the leaf-node steps to the front of the queue), I show two notations
for the translation: first the new links I would add to the DT,
and second the DT being constructed:


S:e5 => tree: e5


N:attribution(S:e3, N:e4) =>
 link: attribution(e4, e3)
 tree: e4(attribution:e3)


S:comment(2,1) =>
 link: comment(e4, e5)    # e4 being the head
 tree: e4(attribution:e3,
          comment:e5)


N:e2 => tree: e2


N:purpose(4,3) =>
 link: purpose(e2, e4)
 tree: e2(purpose:e4(attribution:e3,
                     comment:e5))


S:e1 => tree: e1


attribution(6,5) =>
 link: attribution(e2, e1)
 tree: e2(attribution:e1,
          purpose:e4(attribution:e3,
                     comment:e5))


Final link inventory:
attribution(e4, e3)
comment(e4, e5)
purpose(e2, e4)
attribution(e2, e1)
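The derivation in [iv] boils down to a simple recursion: a leaf is its own head, a binary node's head is the head of its nucleus, and each node contributes one link relation(head of nucleus, head of satellite). Here is a minimal Python sketch of that over the binarised tree from [iii], assuming mononuclear relations and a nested-tuple encoding made up for this note (`to_dependencies` is a hypothetical name).

```python
def to_dependencies(tree, links):
    """Return the head EDU of a binarised subtree, appending
    (relation, head, dependent) triples to `links` bottom-up."""
    if isinstance(tree, str):                  # a leaf EDU heads itself
        return tree
    relation, (role1, left), (role2, right) = tree
    assert {role1, role2} == {"N", "S"}, "mononuclear relations only"
    head1 = to_dependencies(left, links)
    head2 = to_dependencies(right, links)
    head, dependent = (head1, head2) if role1 == "N" else (head2, head1)
    links.append((relation, head, dependent))
    return head


# The binarised tree from [iii] as (relation, (role, child), (role, child)).
tree = ("attribution",
        ("S", "e1"),
        ("N", ("purpose",
               ("N", "e2"),
               ("S", ("comment",
                      ("N", ("attribution", ("S", "e3"), ("N", "e4"))),
                      ("S", "e5"))))))

links = []
root = to_dependencies(tree, links)
for relation, head, dependent in links:
    print(f"{relation}({head}, {dependent})")
```

Because the traversal is post-order, the links come out in the same bottom-up order as the inventory above, and the returned root is e2.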

Or, less compactly:
           e1:[Shevardnadze said]
           ^
           | attribution
           |
        +--e2:[it took Gorbachev's government four years]
        |
purpose |
        |  e3:[to determine]
        V  ^
        |  | attribution
        |  |
        +--e4:[that the station's location in Siberia violated the accord,]
           |
           | comment
           v
           e5:[as Western arms-control officials have long contended.]
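Going the other way, the link inventory can be folded back into the nested tree notation used in [iv]. This sketch assumes EDU names of the form e<number>, orders each head's dependents by EDU position, and only checks that every EDU has at most one head and that there is exactly one root; `build_tree` and `render` are hypothetical helpers.

```python
from collections import defaultdict

links = [
    ("attribution", "e4", "e3"),
    ("comment", "e4", "e5"),
    ("purpose", "e2", "e4"),
    ("attribution", "e2", "e1"),
]


def build_tree(links):
    """Return the root EDU and a map head -> [(relation, dependent), ...]."""
    deps = defaultdict(list)
    dependents = set()
    for relation, head, dep in links:
        assert dep not in dependents, f"{dep} has more than one head"
        dependents.add(dep)
        deps[head].append((relation, dep))
    roots = {head for _, head, _ in links} - dependents
    assert len(roots) == 1, f"expected a single root, got {roots}"
    return roots.pop(), deps


def edu_index(name):
    return int(name[1:])        # assumes names like "e4"


def render(node, deps):
    """Nested notation, e.g. e4(attribution:e3, comment:e5)."""
    children = sorted(deps.get(node, []), key=lambda rd: edu_index(rd[1]))
    if not children:
        return node
    inner = ", ".join(f"{rel}:{render(dep, deps)}" for rel, dep in children)
    return f"{node}({inner})"


root, deps = build_tree(links)
print(render(root, deps))
```

Sorting dependents by EDU position is what puts attribution:e1 before purpose:e4 under e2, as in the final tree of [iv].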

So do I have the right idea here?