<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-1019">
  <Title>Connectivity in Bag Generation</Title>
  <Section position="2" start_page="0" end_page="103" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Bag generation is a form of natural language gel&gt; er;ttion in which the input is ;~ bag (Mso known as a inultiset: a set in which rcpe~ted elements are significant) of lexicM elements and the output is a grammatical sentence or a statistically most probable permutation with respect to some. bmguage model.</Paragraph>
    <Paragraph position="1"> Bag generation has been considered within the st~tistieal and rule-based paradigms of computational linguistics, and catch has handled this problem differently (Chen and Lee, 1994; Whitelock, 1994; Popowich, 1995; Tn0illo , 1995). This paper only considers ruh' based approaches to this problem.</Paragraph>
    <Paragraph position="2"> Bag generation has received particulm: attention in lexicalist approaches to MT, as exemplitied by Shake-and-Bake generation (Beaven, 1992; Whitelock, 1994). One can also envisage applications of bag generation to generation fi'om mini*Now at S\[1ARP L~tboral, ories of I&amp;quot;mrope, Oxh)rd Science \[)ark, Oxford OX4 4CA. E-ma~il: simon~sh~Lrp.co, nk tmdly recursiw', semantic ropresentactions (Cope.stake ct al., 1995) and other semantic fi'ameworks which separate scoping fi'om content information (l{eyle, 1995). ht these frameworks, the unordered natllFe ()f predicate or relation sets makes the aI&gt; plict~tion o\[' bag generation techniques attra.ctiw:. A notational convention used in the I)al)er is that items such as 'dogt' stand for simplitied lexical signs of the. form (Shieber, 198(0:</Paragraph>
    <Paragraph position="4"> In such signs, the semantic argument will be referred to as an qndex' and will be shown as n subscril)t to a lexeme; in the above exmnple, the index has been giwm the unique type 1.</Paragraph>
    <Paragraph position="5"> The term index is borrowed rl'Olll IIPSG (Pollard and Sag, 1994) where indices ~u'e used as arguments to relations; however these indices tnay also be equated with dis(-onrse referents in l)lt:I' (Kamp and I{eyle, 1993). As with most lexicalist generators, semantic variables ttttlSl; \[)c distinguished in order to disallow tr;mslationally incorrect permutations of the target bag. We distinguish variables by uniquely typing them.</Paragraph>
    <Paragraph position="6"> Two assumptions are made regarding \[cxiealsemantic indexing.</Paragraph>
    <Paragraph position="7"> Assmnption 1 All lea'teal signs must be indexed, including fltnetional and nonprcdicative elements (Calder cl al., 1989).</Paragraph>
    <Paragraph position="8"> Assumption 2 All le.~ical signs must be connecled to each other. 7'wo lea:ical signs arc connected if they are directly connected; furthermore, the connectivity rclation is h'an.silivc.</Paragraph>
    <Paragraph position="9"> Definition 1 7'wo signs, A, 11, are directly connccled if there cxisl at least two paths, PathA, Palht3, such that A:PathA is token identical with B:PathB.</Paragraph>
    <Paragraph position="10"> The indices involved in determining connectivity arc; specified as pa.rameters for a pro._ ticul;tr formalism, l'k)r exanq)le, in tlPSG,  play a major role in preventing the generation of incorrect translations.</Paragraph>
    <Paragraph position="12"> 1: Simple unification grammar.</Paragraph>
    <Paragraph position="13"> It will be shown that it is possible to exploit the connectivity Assumption 2 above in order to achieve a reduction in the number of redundant wfss constructed by both types of generator described in section 2.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Using Connectivity for Pruning
</SectionTitle>
      <Paragraph position="0"> Take the following bag: Ex. 2 {dogl,thcl,brown:,big:} (corresponding to 'the big brown dog'). Assume that the next wfss to be constructed by the generator is the NP 'the dog'. Given the grammar in Figure 1, it is possible to deduce that 'brown' can never be part of a complete NP constructed from such a substring. This can be determined as follows. If this adjective were part of such a sentence, 'brown' would have to appear as a leaf in some constituent that combines with 'the dog' or with a constituent containing 'the dog'. From the grammar, the only constituents that can combine with 'dog' are VP, Vtra and P. However, none of these constituents can have 'brownl' as a leaf: ill the case of P and Vtra this is trivial, since they are both categories of a ditferent lexical type. In the case of the VP, 'brownl' cannot appear as a leaf either because expansions of the VP are restricted to NP complements with 2 as their semantic index, which in turn would also require adjectives within them to }lave this index.</Paragraph>
      <Paragraph position="1"> l,'urthermore, 'brown1' cannot OCCUr as a loaf in a deel)er constituent in the VP t)ecause such an occurrence would be associated with a different index. In such cases 'brown' would modify a different noun with a different index: Ex. a { the\], dog\] , withl ,2 , the~ , lnvwn2 , collar2} A naive implementation of this deduction would attempt to expand the VP depth-ill'st, left to right, ill order to accommodate 'brown' in a complete derivation. Since this would not be possible, the NP 'the dog' would be discarded. This approach is grossly inefficient however. What is required is a more tractable algorithm which, given a wfss and its associated sign, will be able to determine whether all remaining lexical elements can ever form part of a complete sentence which includes that wfss.</Paragraph>
      <Paragraph position="2"> Note that deciding whether a lexical sign can appear outside a phrase is determined purely by the grammar, and not by whether the lexical elements share the same index or not. Thus, a more complex grammar would allow 'the man' from the bag Ex. 4 {thel,manl,shaves&lt;l,\],himselfl} even though 'himself' has the same index as 'the IIlan'.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="103" type="sub_section">
      <SectionTitle>
3.2 Outer Domains
</SectionTitle>
      <Paragraph position="0"> The approach introduced here compiles the relevant information of\[line fi'om the grammar and uses it to check for connectivity during bag generation. The compilation process results in a set of (Sign,Lex,Bindings) triples called outer domains.</Paragraph>
      <Paragraph position="1"> 'l'his set is based on a unification-based phrase structure grammar defined as follows: Definition 2 d grammar is a tuple (N, 7;P,S), where P is a sct of productions ce ~ /3, a is a sign, /3 is a list of signs, N is the set of all ee, T is the set of all signs appearing as elements of \[3 which unify with lexical entries, and S is the start sign.</Paragraph>
      <Paragraph position="2"> Outer domains are defined as follow: Definition 3 {(Sign, Lcx, Binds) I Sign C N tO T, Lcx ~ T and there exists a derivation Oe ~ /31Signt /32 LeJ /33 or a ~ f11Lez\] /32,S'iqnl /33, and Sign' a unifier for Sign, Lez j a unifier for Lcx, and Binds the set of all path pairs &lt;SignPath, LexPalh&gt; such thai Sign':SignPath is Ioken identical with LezS :LexPath} Intuitively, the outer domains indicate that preterminal category Lex ('an appear in a complete sentence with subconstituent Sign, such that l,cx is not a leaf of Sign. Using ideas from data flow analysis (Kennedy, 1981), predictive parser constructions (Aho et al., 1986) and feature grammar compilation (Trujillo, 1994) it is possible to construct such a set of triples. Outer domains thus represent elements whi(:h may lie outside a subtree of category Sign in a complete sentential  they would be indicated through paths such as</Paragraph>
    </Section>
  </Section>
</Paper>