<?xml version="1.0" standalone="yes"?> <Paper uid="C96-1019"> <Title>Connectivity in Bag Generation</Title> <Section position="3" start_page="103" end_page="103" type="metho"> <SectionTitle> SYNSEM:LOCAL:CONTENT:INDEX. </SectionTitle> <Paragraph position="0"> To ensure that only connected lexical signs are generated and analysed, the following assumption must also be made: Assumption 3 A grammar will only generate or analyse connected lexical signs.</Paragraph> </Section> <Section position="4" start_page="103" end_page="103" type="metho"> <SectionTitle> 2 Bag Generation Algorithms </SectionTitle> <Paragraph position="0"> Two main types of rule-based bag generators have been proposed. The first type consists of a parser suitably relaxed to take into account the unordered character of the input (Whitelock, 1994; Popowich, 1995; Trujillo, 1995). For example, in generators based on a chart parser, the fundamental rule is applied only when the edges to be combined share no lexical leaves, in contrast to requiring that the two edges have source and target nodes in common. The other type of generator applies a greedy algorithm to an initial solution in order to find a grammatical sentence (Poznański et al., 1995).</Paragraph> <Section position="1" start_page="103" end_page="103" type="sub_section"> <SectionTitle> 2.1 Redundancy in Bag Generation </SectionTitle> <Paragraph position="0"> One disadvantage with the above generators is that they construct a number of structures which need not have been computed at all. In building these structures, the generator is effectively searching branches of the search space which never lead to a complete sentence. 
Consider the following input bag: {dog, barked, the, brown, big}. Previous researchers (Brew, 1992; Phillips, 1993) have noted that from such a bag, the following strings are generated but none can form part of a complete sentence (note that indices are omitted when there is no possibility of confusion; # indicates that the substring will never be part of a complete sentence): Ex. 1 # the dog # the dog barked # the brown dog For simple cases in chart-based generators such unnecessary strings do not create many problems, but for longer sentences, each additional substring implies a further branch in the search tree to be considered.</Paragraph> <Paragraph position="1"> Since the computational complexity of the greedy bag generator (Poznański et al., 1995) is polynomial (i.e. O(n^4)), the effect of redundant substructures is not as detrimental as for parser-based generators. Nevertheless, a certain amount of unnecessary work is performed. To show this, consider the test-rewrite sequence for Example 1: Test: dog barked the brown big Rewrite: __ barked the dog brown big Test: barked (the dog) brown big Rewrite: __ (the dog) barked brown big Test: ((the dog) barked) brown big Rewrite: the brown dog barked __ big Test: ((the (brown dog)) barked) big Rewrite: the big (brown dog) barked __</Paragraph> <Paragraph position="2"> Test: ((the (big (brown dog))) barked) (terminate) In this sequence double underscore (__) indicates the starting position of a moved constituent; the moved constituent itself is given in bold face; the bracketing indicates analysed constituents (for expository purposes the algorithm has been oversimplified, but the general idea remains the same). Now consider the step where 'brown' is inserted between 'the' and 'dog'. 
This action causes the complete structure for 'the dog barked' to be discarded and replaced with that for 'the brown dog barked', which in turn is discarded and replaced by 'the big brown dog barked'.</Paragraph> </Section> <Section position="2" start_page="103" end_page="103" type="sub_section"> <SectionTitle> 2.2 Previous Work </SectionTitle> <Paragraph position="0"> A number of pruning techniques have been suggested to reduce the amount of redundancy in bag generators. Brew (1992) proposed a constraint propagation technique which eliminates branches during bag generation by considering the necessary functor-argument relationships that exist between the component basic signs of categorial signs. These relationships form a graph indicating the necessary conditions for a lexical item to form part of a complete sentence. Such graphs can be used to eliminate the substrings in Example 1. Unfortunately the technique exploits specific aspects of categorial grammars and it is not clear how they may be used with other formalisms.</Paragraph> <Paragraph position="1"> Trujillo (1995) adapts some of Brew's ideas to phrase structure grammars by compiling Follow functions and constructing adjacency graphs.</Paragraph> <Paragraph position="2"> While this approach reduces the size of the search space, it does not prune it sufficiently for certain classes of modifiers.</Paragraph> <Paragraph position="3"> Phillips (1993) proposes handling inefficiency at the expense of completeness. His idea is to maintain a queue of modifiable constituents (e.g. N1s) in order to delay their combination with other constituents until modifiers (e.g. PPs) have been analysed. 
While practical, this approach can lead to alternative valid sentences not being generated.</Paragraph> </Section> </Section> <Section position="5" start_page="103" end_page="103" type="metho"> <SectionTitle> 3 Connectivity Restrictions </SectionTitle> <Paragraph position="0"> In searching for a mechanism that eliminates unnecessary wfss, it will be possible to use indices in lexical signs. As mentioned earlier, these indices [...] derivation. The following definition specifies how outer domains are used: Definition 4 A lexical sign Lex' is in the outer domain of Sign' iff there is a triple (Sign, Lex, Binds) in outer domains such that Sign and Lex unify with Sign' and Lex' respectively, and there is at least one pair ⟨PathS, PathL⟩ ∈ Binds such that Sign':PathS unifies with Lex':PathL.</Paragraph> <Paragraph position="1"> In compiling outer domains, inner domains are used to facilitate computation. Inner domains are defined as follows: Definition 5 {(Sign, Lex, Binds) | Sign ∈ N ∪ T, Lex ∈ T and there exists a derivation α ⇒ β1 Lex' β2, with Sign' a unifier for Sign, Lex' a unifier for Lex, and Binds the set of all path pairs ⟨SignPath, LexPath⟩ such that Sign':SignPath is token identical with Lex':LexPath} The inner domains thus express all the possible terminal categories which may be derived from each nonterminal in the grammar.</Paragraph> <Paragraph position="2"> To be able to exploit connectivity during generation, inner and outer domains contain only triples in which Binds has at least one element.</Paragraph> <Paragraph position="3"> In this way, only those lexical categories which are directly connected to the sign are taken into account; the implication of this will become clearer later.</Paragraph> <Paragraph position="4"> As an example, the outer domain of NP as derived from the above grammar is:</Paragraph> <Paragraph position="6"> This set indicates that for any NP, the only terminal categories not contained in the subtree with 
root NP, and with which the NP shares a semantic index, are Vtra and P. For instance, the first triple arises from the following tree: The pruning technique developed here operates on grammars whose analyses result in connected leaves.</Paragraph> <Paragraph position="7"> Consider some wfss W constructed from a bag B and with category C; this category, in the form of a sign, will include syntactic and lexical-semantic information. Such a wfss will have been constructed during the bag generation process. Now, either W includes all the input elements as leaves, in which case W constitutes a complete sentence, or there are elements in the input bag which are not part of W. In the latter case, for bags obeying Assumption 2, the following condition holds for any W that can form part of a complete sentence: Condition 1 Let L be the set of leaves appearing in W, and let G be the graph (V, E), where V = {C} ∪ (B − L), and E = {{x, y} | x, y ∈ V and y is in the outer domain of x}. Then G is connected.</Paragraph> <Paragraph position="8"> To show that this condition indeed holds, consider a grammatical ordering of some input bag B, represented as the string W: α ... γδ ... ω By Assumption 2, the lexical elements in the bag, and therefore in any grammatical ordering of it, are connected. Now consider reducing this string using the production rule: D → γδ to give the string W': α ... D ... ω In this case, the signs in W' will also be connected. This can be shown by contradiction: Proof 1 Assume that there is some sign in W' to which D is not connected. Then the grammar would allow disconnected strings to be generated, contrary to Assumption 3. This is because D would not be able to rewrite to γδ in such a way that both daughters were connected to that sign, leading to a disconnected string.</Paragraph> <Paragraph position="9"> The situation in string W' is analogous to that in Condition 1. 
By identifying signs which are directly connected in E, it is possible to determine whether G is connected and consequently whether C can form part of a complete derivation. Instead of simply comparing the values of index paths, it is more restrictive to use outer domains, since they give us precisely those elements which are directly connected to a sign and are in its outer domain.</Paragraph> <Section position="1" start_page="103" end_page="103" type="sub_section"> <SectionTitle> 3.4 Example </SectionTitle> <Paragraph position="0"> Consider Example 2. To eliminate the wfss 'the dog' from further consideration, a connected graph of lexical signs is constructed before generation is started (Figure 2). This graph is built by using the outer domain of each lexical element to decide which of the remaining elements could possibly share an index with it in a complete sentence. When a new wfss is constructed during generation, say by application of the modified fundamental rule or during the rewrite phase in a greedy algorithm, this initial graph is updated and tested for connectivity. If the updated graph is not connected then the proposed wfss cannot form part of a complete sentence. Updating the graph involves three steps. Firstly, every node in the graph which is a leaf of the new wfss is deleted, together with its associated arcs. Secondly, a new node corresponding to the new wfss is added to the graph. Finally, a new arc is added to the graph between the new node and every other node lying in its outer domain. 
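The three update steps and the connectivity test can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph representation (a dictionary from node names to neighbour sets), the function names is_connected and update_graph, the arcs of the initial graph, and the outer-domain memberships passed in are all hypothetical, since Figure 2 and the compiled outer domains are not reproduced here.

```python
from collections import deque


def is_connected(graph):
    """Return True if the undirected graph (node to neighbour set) is connected."""
    if not graph:
        return True
    start = next(iter(graph))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(graph)


def update_graph(graph, leaves, new_node, outer_domain_nodes):
    """Apply the three update steps; returns a new graph, the input is unchanged."""
    leaves = set(leaves)
    # Step 1: delete every leaf of the new wfss together with its arcs.
    updated = {n: nbrs - leaves for n, nbrs in graph.items() if n not in leaves}
    # Step 2: add a node for the new wfss.
    updated[new_node] = set()
    # Step 3: add an arc between the new node and every remaining node
    # that lies in its outer domain (membership is supplied by the caller).
    for other in outer_domain_nodes:
        if other in updated and other != new_node:
            updated[new_node].add(other)
            updated[other].add(new_node)
    return updated


# Hypothetical initial graph for the bag {the, big, brown, dog, barked}.
INITIAL = {
    "the": {"dog"},
    "big": {"dog", "brown"},
    "brown": {"dog", "big"},
    "dog": {"the", "big", "brown", "barked"},
    "barked": {"dog"},
}

# Assumption: the NP 'the dog' has only the verb in its outer domain.
after_np = update_graph(INITIAL, {"the", "dog"}, "NP(the dog)", {"barked"})
print(is_connected(INITIAL), is_connected(after_np))
```

Under these assumed arcs, replacing 'the' and 'dog' by an NP node leaves 'big' and 'brown' attached only to each other, so the connectivity test fails for the updated graph while it succeeds for the initial one. 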
The updated (disconnected) graph that ensues after constructing 'the dog' is shown in Figure 3; this NP is therefore rejected.</Paragraph> <Paragraph position="1"> wfss 'the dog' is constructed.</Paragraph> </Section> </Section> <Section position="6" start_page="103" end_page="103" type="metho"> <SectionTitle> 4 Compiling Connectivity </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="103" end_page="103" type="sub_section"> <SectionTitle> Domains </SectionTitle> <Paragraph position="0"> For reasons of space, the computation of outer domains cannot be described fully here. The broad outline, however, is as follows. First, the inner domains of the grammar are calculated. This involves the calculation of the fixed point of set equations, analogous to those used in the construction of First sets for predictive parsers (Aho et al., 1986; Trujillo, 1994). Given the inner domains of each category in the grammar, the construction of the outer domains involves the computation of the fixed point of set equations relating the outer domain of a category to the inner domain of its sisters and to the outer domain of its mother, in a manner analogous to the computation of Follow sets.</Paragraph> <Paragraph position="1"> During computation, the set of Binds is monotonically increased as different ways of directly connecting sign and lexeme are found.</Paragraph> </Section> </Section> </Paper>