<?xml version="1.0" standalone="yes"?>
<Paper uid="P97-1070">
  <Title>Representing Paraphrases Using Synchronous TAGs</Title>
  <Section position="4" start_page="0" end_page="517" type="metho">
    <SectionTitle>
2 Paraphrasing with STAGs
</SectionTitle>
    <Paragraph position="0"> Abeillé notes that the STAG formalism allows an explicit semantic representation to be avoided, mapping from syntax to syntax directly. This fits well with the syntactic paraphrases described in this paper; but it does not, as Abeillé also notes, preclude semantic-based mappings: Shieber and Schabes constructed syntax-to-semantics mappings as the first demonstration of STAGs. Similarly, more semantically-based paraphrases are possible through an indirect application of STAGs to a semantic representation, and then back to the syntax.</Paragraph>
    <Paragraph position="1"> One major difference between use in MT and paraphrase is in lexicalisation. The sorts of mappings that Abeillé deals with are lexically idiosyncratic: the English sentences Kim likes Dale and Kim misses Dale, while syntactically parallel and semantically fairly close, are translated to different syntactic structures in French; see Figure 1. The actual mappings depend on the properties of words, so any TAGs used in this synchronous manner will necessarily be lexicalised. Here, however, the sorts of paraphrases which are used are lexically general: splitting off a relative clause, as in (2), is not dependent on any lexical attribute of the sentence.</Paragraph>
    <Paragraph position="2"> Relatedly, at least between English and French, extensive syntactic mismatch is unusual; much of the difficulty in translation comes from lexical idiosyncrasies. A consequence for machine translation is that much of the synchronising of TAGs is between elementary trees. So, even with a more complex syntactic structure than the translation examples above, the changes can be described by composing mappings between elementary trees, or just in the transfer lexicon. Abeillé notes that there are occasions where it is necessary to replace an elementary tree by a derived tree; for example, when Hopefully, John will work becomes On espère que Jean travaillera, hopefully (an elementary tree) matches on espère que (derived).</Paragraph>
    <Paragraph position="4"> The situation is more complex in paraphrasing: by definition, the mappings are between units of text with differing syntactic properties. For example, the mapping of examples (2a) and (2b) involves the pairing of two derived trees, as in Figure 2. A problem with the STAG formalism in this situation is that it does not capture the generality of the mapping between (2a) and (2b): separate tree pairings will have to be made for verbs in the matrix clause which have complementation patterns different from that of the above examples, and the same is true for verbs in the subordinate clause. For more complex matchings, the number of derived trees to be made and paired becomes combinatorially large.</Paragraph>
    <Paragraph position="5"> A more compact definition is to have links, of a kind different from the standard STAG links, between nodes higher in the tree. In STAG, a link between two nodes specifies that any substitution or adjunction occurring at one node must be replicated at the other. This new proposed link would be a summary link indicating the synchronisation of an entire subtree: more precisely, each subnode of the node with the summary link is mapped to the corresponding node in the paired tree in a synchronous depth-first traversal of the subtree. Naturally, this can only be defined for pairs of nodes which have the same structure¹; that is, in the context of paraphrasing, it is effectively a statement that the paired subtrees are identical. So, for example, a mapping between the nodes labelled VP1 in each of the trees of the example described above would be an appropriate place to have such a summary link: by establishing a mapping between each subnode of VP1, this covers different types of matrix clauses.</Paragraph>
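    <Paragraph> The expansion of a summary link into node-by-node links can be sketched as follows. This is a minimal illustration rather than part of the formalism: the Node class and the function expand_summary_link are hypothetical names, and "same structure" is assumed here to mean equal labels and equal numbers of children at every node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node with a label and ordered children (hypothetical structure)."""
    label: str
    children: list["Node"] = field(default_factory=list)


def expand_summary_link(left: Node, right: Node) -> list[tuple[Node, Node]]:
    """Pair every node under `left` with its counterpart under `right`.

    Both subtrees are walked in lockstep, depth-first and left to right.
    A ValueError is raised as soon as the two subtrees differ in shape,
    since a summary link is only defined for identically structured nodes.
    """
    # Assumption: "same structure" means same label and same number of children.
    if left.label != right.label or len(left.children) != len(right.children):
        raise ValueError(f"subtrees differ at {left.label!r} / {right.label!r}")
    pairs = [(left, right)]
    for l_child, r_child in zip(left.children, right.children):
        pairs.extend(expand_summary_link(l_child, r_child))
    return pairs


# Two copies of an illustrative VP subtree, one per tree of the pairing.
vp_left = Node("VP", [Node("V"), Node("NP", [Node("D"), Node("N")])])
vp_right = Node("VP", [Node("V"), Node("NP", [Node("D"), Node("N")])])

pairs = expand_summary_link(vp_left, vp_right)
print([(a.label, b.label) for a, b in pairs])
```

Because the traversal is defined once for any subtree, a single summary link at VP1 stands in for the separate pairings that would otherwise be needed for each complementation pattern.</Paragraph>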
    <Paragraph position="6"> Another feature of using STAGs for paraphrasing is that the links are not necessarily one-to-one. In the right-hand tree of the Figure 2 pairing, the subject NPs of both sentences are linked to NP1 of the left-hand tree; this is a statement that both resulting sentences have the same subject. This does not, however, change the properties in any significant way.² It is also useful to add another, non-standard type of link: one that is not just a link between nodes at which adjunction and substitution occur, but that represents shared attributes. It connects nodes such as the main verb of each tree, and indicates that particular attributes are held in common. For example, mapping between active and passive voice versions of a sentence is represented by the tree in Figure 3. The verb in the active version of (3) (broke) shares the attribute of tense with the auxiliary verb be, and the lexical component is shared with the main verb of the passive tree (broken), which takes the past participle form. (Footnote 2, beginning missing: ... nodes of the node at the multiple end of an m:1 link, each child node being exactly the same as the parent with fully re-entrant feature structures, with one link being systematically allocated to each child.)</Paragraph>
    <Paragraph position="7"> This sort of link is unnecessary when STAGs are used in MT, as the trees are lexicalised, and the information is shared in the transfer lexicon. Since, with paraphrasing, the transfer lexicon does not play such a role, the shared information is represented by this new type of link between the trees, where the links are labelled according to the information shared.</Paragraph>
    <Paragraph position="8"> Hence, node V1 in the active tree has a TENSE link with node V0 in the passive tree, where tense is the attribute in common; and a LEX link with node V1 in the passive tree, where the lexeme is shared.³</Paragraph>
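    <Paragraph> One way to picture labelled links is as a small record holding the two linked nodes and the name of the shared attribute. The sketch below is an assumption-laden illustration: the Link class, the function shared_attributes, and the use of plain strings such as "V1" and "V0" as node names are all hypothetical, and the set of labels is limited to the two mentioned in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A link from a node of one tree to a node of the paired tree."""
    source: str      # node name in the first (here: active) tree
    target: str      # node name in the paired (here: passive) tree
    label: str = ""  # empty for a standard STAG link, else the shared attribute


def shared_attributes(links: list[Link], source: str) -> dict[str, str]:
    """Map each attribute label to the target node sharing it with `source`.

    Standard (unlabelled) links are skipped, since they carry no attribute.
    """
    return {ln.label: ln.target for ln in links if ln.source == source and ln.label}


# Labelled links for the active/passive pairing described in the text.
active_passive = [
    Link("V1", "V0", "TENSE"),  # tense is shared with the passive auxiliary
    Link("V1", "V1", "LEX"),    # the lexeme is shared with the passive main verb
]

print(shared_attributes(active_passive, "V1"))
```

Keeping the label on the link, rather than on either node, mirrors the point that this information is no longer supplied by a transfer lexicon.</Paragraph>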
  </Section>
  <Section position="5" start_page="517" end_page="517" type="metho">
    <SectionTitle>
3 Notation
</SectionTitle>
    <Paragraph position="0"> In paraphrasing, the tree notation thus becomes fairly clumsy: as well as consuming a large amount of space (given the large derived trees), it fails to reflect the generality provided by the summary links. That is, it is not possible to define a mapping between two structures reflecting their common features if the structures are not, as is standard in STAG, entire elementary or derived trees. Therefore, a new and more compact notation is proposed to overcome these two disadvantages.</Paragraph>
    <Paragraph position="1"> The new notation has three parts: the first part uniquely defines each tree of a synchronous tree pair; the second part describes, also uniquely, the nodes that will be part of the links; the third part links the trees via these nodes. So, let variables X and Y stand for any string of argument types acceptable in tree names; for example, X could be nx1nx2 and Y n1. Then, for example, the tree for (2a) can be defined as the adjunction of a βN0nx0VX tree (generic relative clause tree, standing for, e.g., βN0nx0Vnx1nx2) into an αn0VY tree; the tree for (2b) can be defined as a conjoined S tree, having a parent Sm node and two child nodes αn0VX and αn0VY.</Paragraph>
    <Paragraph position="2"> Figure 3: Paraphrase with partial links. (Footnote 3: The determination of a precise set of link labels is future work.)</Paragraph>
    <Paragraph position="3"> The second part of the notation requires picking out important nodes. The identification scheme proposed here has a string comprising node labels with relations between them, each relation taken from the set {parent, child, left-sibling, right-sibling}, abbreviated {p, c, ls, rs}. The node NP1 of the left-hand tree of Figure 2 can then be described by the string NPpNPpSrpNIL; an associated mnemonic nickname might be T1subjNP.</Paragraph>
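    <Paragraph> The construction of such an identification string can be sketched for the simplest case, in which only the parent relation is used. The sketch assumes a reading of the example string in which the labels NP, NP and Sr are joined by p and the chain is closed by NIL once the root has no parent; the Node class and the function node_id are hypothetical names, and the c, ls and rs relations are not covered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A tree node that knows its label and its parent (hypothetical structure)."""
    label: str
    parent: Optional["Node"] = None


def node_id(node: Node) -> str:
    """Spell out the chain of parent ("p") relations from `node` up to NIL."""
    parts = []
    current = node
    while current is not None:
        parts.append(current.label)
        current = current.parent
    # Assumption: NIL marks the missing parent of the root node.
    parts.append("NIL")
    return "p".join(parts)


# An illustrative fragment: an NP under an NP under the root Sr.
s_r = Node("Sr")
np_outer = Node("NP", parent=s_r)
np1 = Node("NP", parent=np_outer)

print(node_id(np1))
```

A string built this way is unique only if no two nodes share the same chain of ancestor labels, which is presumably why the full scheme also admits the child and sibling relations.</Paragraph>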
    <Paragraph position="4"> The third part of the representation then links the nodes. Standard links are represented by an equal sign; other links are represented with the link type subscripted to the equal sign. Thus, for Figure 2, T1subjNP=T2leftsubjNP, where T2leftsubjNP is NPpSrpSmpNIL for the right-hand tree.</Paragraph>
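    <Paragraph> The linking step can be illustrated with a short sketch that writes out a link between two nicknamed nodes and expands the nicknames back into node strings. The names NODE_IDS, format_link and expand_link are hypothetical, and the "=_TYPE_" spelling used for a subscripted equal sign is an assumed plain-text rendering, not a notation taken from the paper.

```python
# Nicknames from the text, mapped to their node identification strings.
NODE_IDS = {
    "T1subjNP": "NPpNPpSrpNIL",      # NP1 in the left-hand tree of Figure 2
    "T2leftsubjNP": "NPpSrpSmpNIL",  # left subject NP in the right-hand tree
}


def format_link(left: str, right: str, link_type: str = "") -> str:
    """Render one link between two nicknamed nodes.

    A standard link is a bare equal sign. A typed link is written here as
    "=_TYPE_", an assumed plain-text stand-in for a subscripted equal sign.
    """
    connector = f"=_{link_type}_" if link_type else "="
    return f"{left}{connector}{right}"


def expand_link(left: str, right: str) -> tuple[str, str]:
    """Replace both nicknames by the node strings they abbreviate."""
    return NODE_IDS[left], NODE_IDS[right]


standard = format_link("T1subjNP", "T2leftsubjNP")
print(standard)
print(expand_link("T1subjNP", "T2leftsubjNP"))
```

Separating nicknames from node strings keeps the link statements short while leaving each node uniquely recoverable.</Paragraph>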
    <Paragraph position="5"> For a tabular representation using this notation, see Dras (1997a).</Paragraph>
  </Section>
</Paper>