@kowey
Last active August 29, 2015 13:57
[i] Original RST snippet and EDUs

Here is a random snippet from the RST discourse treebank.

      (Satellite 7-11 background
        (Satellite 7-7 attribution Shevardnadze said)
        (Nucleus 8-11 span
          (Nucleus 8-8 span
            it took Gorbachev's government four years)
          (Satellite 9-11 purpose
            (Nucleus 9-10 span
              (Satellite 9-9 attribution to determine)
              (Nucleus 10-10 span
                that the station's location in Siberia violated the accord,))
            (Satellite 11-11 comment
              as Western arms-control officials have long
              contended.))))

I'll assign the following labels to the EDUs:

e1:[Shevardnadze said]
e2:[it took Gorbachev's government four years]
e3:[to determine]
e4:[that the station's location in Siberia violated the accord,]
e5:[as Western arms-control officials have long contended.]

[ii] Compact notation

Dropping the root 'Satellite' label and the EDU-span information, and shortening the EDUs to their names, gives us this tree, which is actually a bit weird:

Root:background(
    S:attribution:e1,
    N:span(
      N:span:e2,
      S:purpose(
        N:span(S:attribution:e3, N:span:e4),
        S:comment:e5)))
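
The compaction step can be sketched in Python. This is only an illustration under assumptions: it parses the simplified bracket format shown in section [i] (not the treebank's own file format), it assumes EDU text contains no parentheses, and it assumes every leaf covers exactly one EDU. The names `parse`, `compact` and `SNIPPET` are hypothetical helpers introduced for this sketch.

```python
import re

SNIPPET = """
(Satellite 7-11 background
  (Satellite 7-7 attribution Shevardnadze said)
  (Nucleus 8-11 span
    (Nucleus 8-8 span
      it took Gorbachev's government four years)
    (Satellite 9-11 purpose
      (Nucleus 9-10 span
        (Satellite 9-9 attribution to determine)
        (Nucleus 10-10 span
          that the station's location in Siberia violated the accord,))
      (Satellite 11-11 comment
        as Western arms-control officials have long
        contended.))))
"""


def parse(text):
    """Parse the bracket notation into (nuclearity, span, relation, edu_text, children)."""
    # Tokens are "(", ")" or runs of other text; whitespace-only runs are dropped.
    tokens = [t for t in re.findall(r"[()]|[^()]+", text) if t.strip()]
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        # First three fields are the header; anything after them is EDU text.
        nuc, span, rel, *words = tokens[pos].split()
        pos += 1
        kids = []
        while tokens[pos] == "(":
            kids.append(node())
        assert tokens[pos] == ")"
        pos += 1
        return (nuc, span, rel, " ".join(words), kids)

    return node()


def compact(node, first=None, top=True):
    """Render the compact notation, naming EDUs e1, e2, ... from their span start."""
    nuc, span, rel, _text, kids = node
    start = int(span.split("-")[0])
    first = start if first is None else first  # the root's first EDU becomes e1
    tag = "Root" if top else nuc[0]            # "N" or "S" for non-root nodes
    if not kids:
        return f"{tag}:{rel}:e{start - first + 1}"
    inner = ", ".join(compact(k, first, top=False) for k in kids)
    return f"{tag}:{rel}({inner})"


print(compact(parse(SNIPPET)))
```

The sketch prints the tree on one line; the indentation used in this note is purely for readability.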

[iii] Binarised compact notation

One thing that I stumble upon… every… single… time I look at the RST corpus is that their notation is (necessarily) confusing: because the trees are non-binary, it has to encode multiple relation instances per node.

Something like R(N,S1,S2,..,Sn) wouldn't make sense in their corpus, so instead they have to write something more like:

root(span:N, R1:S1, R2:S2,..,Rn:Sn)

Which is not what you're expecting. I might be misremembering this, but assuming I have the right idea, we need to shuffle the notation around so that each relation label sits on the parent node rather than on the satellite (this tree is already binary, so you can't see why the weirdness is needed):

attribution(
  S:e1,
  N:purpose(
    N:e2,
    S:comment(
       N:attribution(S:e3, N:e4),
       S:e5)))
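
The shuffle can be sketched in Python as follows. The nested-tuple encoding and the names `lift` and `show` are assumptions made for this illustration, and the sketch only handles nodes with exactly one satellite, which is all this example needs. It moves each relation label from the satellite child up to the parent and discards the root's own label (`background`), since that relation points outside the snippet.

```python
# Compact tree from section [ii]: (nuclearity, relation, body),
# where body is an EDU name (leaf) or a list of child nodes.
COMPACT = ("Root", "background", [
    ("S", "attribution", "e1"),
    ("N", "span", [
        ("N", "span", "e2"),
        ("S", "purpose", [
            ("N", "span", [("S", "attribution", "e3"), ("N", "span", "e4")]),
            ("S", "comment", "e5"),
        ]),
    ]),
])


def lift(node):
    """Return (nuclearity, payload); payload is an EDU name or (relation, children)."""
    nuc, _own_rel, body = node
    if isinstance(body, str):
        return (nuc, body)
    # The relation holding between the children is written on the satellite child.
    rels = [child[1] for child in body if child[0] == "S"]
    if len(rels) != 1:
        raise ValueError("sketch expects exactly one satellite per node")
    return (nuc, (rels[0], [lift(child) for child in body]))


def show(lifted, top=True):
    """Render a lifted tree in the binarised notation, on one line."""
    nuc, payload = lifted
    if isinstance(payload, str):
        text = payload
    else:
        rel, kids = payload
        text = f"{rel}({', '.join(show(k, top=False) for k in kids)})"
    return text if top else f"{nuc}:{text}"


print(show(lift(COMPACT)))
```

For genuinely non-binary nodes (several satellites, each with its own relation) this sketch raises an error rather than guessing how to binarise them.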

[iv] RST to DT derivation

I'm just going to add some line numbers here and follow the derivation through as I currently understand it. Note that everything here is mononuclear, but I think the example is still instructive because it allows us to isolate the recursive tree translation aspect from the DT-reversibility question.

7. attribution(
6.   S:e1,
5.   N:purpose(
4.     N:e2,
3.     S:comment(
2.        N:attribution(S:e3, N:e4),
1.        S:e5)))

Now, working line by line (but still bottom-up, because you can commute the leaf node steps to the front of the queue), I show two notations for the translation: first the new links I would add to the DT, and second the DT being constructed:

  1. S:e5 => tree: e5

  2. N:attribution(S:e3, N:e4) =>

     link: attribution(e4, e3)
     tree: e4(attribution:e3)
    
  3. S:comment(2,1) =>

     link: comment(e4, e5)    # e4 being the head
     tree: e4(attribution:e3,
              comment:e5)
    
  4. N:e2 => tree: e2

  5. N:purpose(4,3) =>

     link: purpose(e2, e4)
     tree: e2(purpose:e4(attribution:e3,
                         comment:e5))
    
  6. S:e1 => tree: e1

  7. attribution(6,5) =>

     link: attribution(e2, e1)
     tree: e2(attribution:e1,
              purpose:e4(attribution:e3,
                         comment:e5))
    

Final link inventory:

attribution(e4, e3)
comment(e4, e5)
purpose(e2, e4)
attribution(e2, e1)

Or less compactly,

           e1:[Shevardnadze said]
           ^
           | attribution
           |
        +--e2:[it took Gorbachev's government four years]
        |
purpose |
        |  e3:[to determine]
        v  ^
        |  | attribution
        |  |
        +--e4:[that the station's location in Siberia violated the accord,]
           |
           | comment
           v
           e5:[as Western arms-control officials have long contended.]
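
The derivation above can also be sketched as a short recursive function in Python. It is a sketch of the procedure as described in this note, not a reference implementation: the tuple encoding and the name `derive` are assumptions, and it deliberately rejects anything that is not mononuclear. Each subtree returns its head EDU (the head of its nucleus), and every satellite's head is attached to it with the parent's relation.

```python
# Binarised tree from section [iii]: a leaf is an EDU name,
# an internal node is (relation, [(nuclearity, subtree), ...]).
BINARY = ("attribution", [
    ("S", "e1"),
    ("N", ("purpose", [
        ("N", "e2"),
        ("S", ("comment", [
            ("N", ("attribution", [("S", "e3"), ("N", "e4")])),
            ("S", "e5"),
        ])),
    ])),
])


def derive(tree):
    """Return (head_edu, links); each link is (relation, head, dependent)."""
    if isinstance(tree, str):
        return tree, []          # a leaf is its own head and adds no links
    rel, kids = tree
    links, heads = [], []
    for nuc, sub in kids:        # translate the children first (bottom-up)
        head, sub_links = derive(sub)
        links.extend(sub_links)
        heads.append((nuc, head))
    nuclei = [h for nuc, h in heads if nuc == "N"]
    if len(nuclei) != 1:
        raise ValueError("sketch handles mononuclear nodes only")
    head = nuclei[0]             # the nucleus's head becomes this node's head
    links.extend((rel, head, h) for nuc, h in heads if nuc == "S")
    return head, links


head, links = derive(BINARY)
for rel, gov, dep in links:
    print(f"{rel}({gov}, {dep})")
```

On this example the links come out in the same order as the final link inventory, with e2 as the root of the dependency tree.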

So do I have the right idea here?
