<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1022">
<Title>Bootstrapping Lexical Choice via Multiple-Sequence Alignment</Title>
<Section position="3" start_page="0" end_page="0" type="metho">
<SectionTitle> 2 Multiple-sequence alignment </SectionTitle>
<Paragraph position="0"> This section describes general multiple-sequence alignment; we discuss its application to learning mapping dictionaries in the next section.</Paragraph>
<Paragraph position="1"> A multiple-sequence alignment algorithm takes as input n strings and outputs an n-row correspondence table, or multiple-sequence alignment (MSA). (We explain how the correspondences are actually computed below.) The MSA's rows correspond to sequences, and each column indicates which elements of which strings are considered to correspond at that point; non-correspondences, or "gaps", are represented by underscores (_). See Figure 3(i).</Paragraph>
<Paragraph position="2"> From an MSA, we can compute a lattice. Each lattice node, except for "start" and "end", corresponds to an MSA column. The edges are induced by traversing each of the MSA's rows from left to right. See Figure 3(ii).</Paragraph>
<Paragraph position="3"> Alignment computation. The sum-of-pairs dynamic-programming algorithm and the pairwise iterative alignment algorithm sketched here are described in full in Gusfield (1997) and Durbin et al. (1998).</Paragraph>
<Paragraph position="4"> Let Σ be the set of elements making up the sequences to be aligned, and let sim(x, y), for x, y ∈ Σ ∪ {_}, be a domain-specific similarity function that assigns a score to every possible pair of alignment elements, including gaps. Intuitively, we prefer MSAs in which many high-similarity elements are aligned.</Paragraph>
<Paragraph position="5"> In principle, we can use dynamic programming over alignments of sequence prefixes to compute the highest-scoring MSA, where the sum-of-pairs score for an MSA is computed by summing sim(x, y) over each pair of entries in each column. Unfortunately, these computations are exponential in n, the number of sequences. (In fact, finding the optimal MSA when n is a variable is NP-complete (Wang and Jiang, 1994).) Therefore, we instead use iterative pairwise alignment, a commonly-used polynomial-time approximation procedure. This algorithm greedily merges pairs of MSAs of (increasingly larger) subsets of the n sequences; which pair to merge is determined by the average score of all pairwise alignments of sequences from the two MSAs.</Paragraph>
<Paragraph position="6"> Aligning lattices. We can apply the above sequence alignment algorithm to lattices as well as sequences, as is indeed required by pairwise iterative alignment. We simply treat each lattice as a sequence whose ith symbol corresponds to the set of nodes at distance i from the start node. We modify the similarity function accordingly: any two new symbols are equivalent to subsets S1 and S2 of Σ, so we define the similarity of these two symbols as max_{(x,y) ∈ S1 × S2} sim(x, y).</Paragraph>
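To make the two constructions above concrete, here is a minimal Python sketch, ours rather than the authors' implementation (the names msa_to_lattice and lifted_sim are hypothetical), of (a) deriving lattice edges from an MSA table and (b) lifting the element similarity sim to lattice "symbols", i.e., sets of nodes:

GAP = "_"

def msa_to_lattice(msa):
    """msa: list of equal-length rows (token lists, with GAP for gaps).
    Returns a set of edges over column indices, where -1 plays the role
    of the start node and len(row) the role of the end node."""
    edges = set()
    for row in msa:
        prev = -1  # the start node
        for i, tok in enumerate(row):
            if tok == GAP:
                continue  # a gap contributes no node for this row
            edges.add((prev, i))
            prev = i
        edges.add((prev, len(row)))  # connect the row's last node to end
    return edges

def lifted_sim(sim, s1, s2):
    """Similarity of two lattice symbols (node sets S1, S2):
    the maximum of sim(x, y) over all cross pairs, as defined above."""
    return max(sim(x, y) for x in s1 for y in s2)

For instance, msa_to_lattice([["a", "b", GAP], ["a", GAP, "c"]]) links the shared column-0 node to both the column-1 and column-2 nodes, the branching that gives rise to the "sausage" shape discussed in Section 3.1.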
</Section>
<Section position="4" start_page="0" end_page="0" type="metho">
<SectionTitle> 3 Dictionary Induction </SectionTitle>
<Paragraph position="0"> Our goal is to produce a semantics-to-words mapping dictionary by comparing semantic sequences to MSAs of multiple verbalizations. We assume only that the semantic representation uses predicate-argument structure, so the elementary semantic units are either terms (e.g., 0) or predicates taking arguments (e.g., show-from(prem1, prem2, goal), whose arguments are two premises and a goal). Note that both types of units can be verbalized by multi-word sequences.</Paragraph>
<Paragraph position="1"> Now, semantic units can occur several times in the corpus. In the case of predicates, we would like to combine information about a given predicate from all its appearances, because doing so would yield more data from which to learn how to express it. On the other hand, correlating verbalizations across instances instantiated with different argument values (e.g., show-from(a=0,b=0,a*b=0) vs. show-from(c>0,d>0,c/d>0)) makes alignment harder, since there are fewer obvious matches (e.g., "a*b=0" does not greatly resemble "c/d>0"); this seems to discourage aligning cross-instance verbalizations. We resolve this apparent paradox with a novel three-phase approach:</Paragraph>
<Paragraph position="2"> + In the per-instance alignment phase (Section 3.1), we handle each separate instance of a semantic predicate individually. First, we compute a separate MSA for each instance's verbalizations. Then, we abstract away from the particular argument values of each instance by replacing lattice portions corresponding to argument values with argument slots, thereby creating a slotted lattice.</Paragraph>
<Paragraph position="3"> + In the cross-instance alignment phase (Section 3.2), for each predicate we align together all the slotted lattices from all of its instances.</Paragraph>
<Paragraph position="4"> + In the template induction phase (Section 3.3), we convert the aligned slotted lattices into templates (sequences of words and argument positions) by tracing slotted-lattice paths.</Paragraph>
<Paragraph position="5"> Finally, we enter the templates into the mapping dictionary.</Paragraph>
<Section position="1" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 3.1 Per-instance alignment </SectionTitle>
<Paragraph position="0"> As mentioned above, the first job of the per-instance alignment phase is to separately compute, for each instance of a semantic unit, an MSA of all its verbalizations. To do so, we need to supply a scoring function capturing the similarity in meaning between words.</Paragraph>
<Paragraph position="1"> Since such similarity can be domain-dependent, we use the data to induce, again via sequence alignment, a paraphrase thesaurus T that lists linguistic items with similar meanings. (This process is described later, in Section 3.1.1.) We then set sim(x, y) to hand-tuned values that depend on whether x = y, whether T lists x and y as paraphrases, and whether one of x, y is a gap; here Σ is the vocabulary. (Footnote 2: These values were hand-tuned on a held-out development corpus, described later. Because we use progressive alignment, the case x = y = _ does not occur.) Figure 2 shows the lattice computed for the verbalizations of the instance show-from(a=0,b=0,a*b=0) listed in Figure 1.</Paragraph>
<Paragraph position="2"> The structure of the lattice reveals why we informally refer to lattices as "sausage graphs".</Paragraph>
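The extract does not preserve the equation giving the hand-tuned similarity values of footnote 2, so the sketch below uses placeholder constants; make_sim and the set-of-frozensets encoding of T are our own assumptions, not the paper's:

GAP = "_"

def make_sim(thesaurus, match=1.0, para=0.5, gap=-0.5, mismatch=-1.0):
    """thesaurus: the paraphrase thesaurus T, encoded as a set of
    frozenset({x, y}) pairs. Returns a sim(x, y) function over the
    vocabulary plus the gap symbol. All scores are placeholders."""
    def sim(x, y):
        if x == GAP or y == GAP:
            # under progressive alignment, x = y = GAP never arises (fn. 2)
            return gap
        if x == y:
            return match
        if frozenset((x, y)) in thesaurus:
            return para  # T lists x and y as paraphrases
        return mismatch
    return sim

# example usage with two of the paraphrase pairs extracted in this section
T = {frozenset(("conclusion", "result")), frozenset(("0", "zero"))}
sim = make_sim(T)
assert sim("0", "zero") == 0.5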
<Paragraph position="3"> Next, we transform the lattices into slotted lattices. We use a simple matching process that finds, for each argument value in the semantic expression, a sequence of lattice nodes such that each node contains a word identical to, or a paraphrase of (according to the paraphrase thesaurus), a symbol in the argument value (these nodes are shaded in Figure 2). The sequences so identified are replaced with a "slot" marked with the argument variable (see Figure 4, which shows the slotted lattice derived from the lattice in Figure 2 for show-from(prem1, prem2, goal)). (Footnote 3: This may further change the topology by forcing other nodes to be removed as well. For example, the slotted lattice in Figure 4 doesn't contain the node sequence "their product".) Notice that by replacing the argument values with variable labels, we make the commonalities between slotted lattices for different instances more apparent.</Paragraph>
<Paragraph position="4"> Recall that the paraphrase thesaurus plays a role both in aligning verbalizations and in matching lattice nodes to semantic argument values. The main idea behind our paraphrase-thesaurus induction method, motivated by Barzilay and McKeown (2001), is that paths through lattice "sausages" often correspond to alternate verbalizations of the same concept, since the sausage endpoints are contexts common to all the sausage-interior paths. Hence, to extract paraphrases, we first compute all pairwise alignments of parallel verbalizations, discarding those with score less than four in order to eliminate spurious matches. (Footnote 4: Pairwise alignments yield fewer candidate alignments from which to select paraphrases, allowing simple scoring functions to produce decent results.) Parallel sausage-interior paths that appear in several alignments are recorded as paraphrases. Then we iterate, realigning each pair of sentences, but with previously-recognized paraphrases treated as identical, until no new paraphrases are discovered. While the majority of the derived paraphrases are single words, the algorithm also produces several multi-word paraphrases, such as "are equal to" for "=". To simplify subsequent comparisons, these phrases (e.g., "are equal to") are treated as single tokens. Here are four paraphrase pairs we extracted from the mathematical-proof domain: (conclusion, result); (0, zero); (applying, by); (expanding, unfolding). (See Section 4.2 for a formal evaluation of the paraphrases.) We treat thesaurus entries as degenerate slotted lattices containing no slots; hence, terms and predicates are represented in the same way.</Paragraph>
</Section>
<Section position="2" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 3.2 Cross-instance alignment </SectionTitle>
<Paragraph position="0"> Often, the slotted lattices computed from a single instance yield good information as to how to realize a predicate. (For example, "Assume [prem1] and [prem2], prove [goal]", where the brackets enclose arguments marked with their type.) Sometimes, though, the situation is more complicated. Figure 5 shows two slotted lattices for different instances of rewrite(lemma, goal) (meaning, rewrite goal by applying lemma); the first slotted lattice is problematic because it contains context-dependent information (see caption). Hence, we engage in cross-instance alignment to merge information about the predicate. That is, we align the slotted lattices for all instances of the predicate (see Figure 6); the resultant unified slotted lattice reveals linguistic expressions common to verbalizations of different instances. Notice that the argument-matching process in the per-instance alignment phase helps make these commonalities more evident by abstracting over different values of the same argument (e.g., lemma100 and lemma104 are both relabeled "lemma").</Paragraph>
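The cross-instance phase can be pictured as grouping each predicate's slotted lattices and folding them through the lattice aligner of Section 2. The sketch below is a simplified stand-in: align_lattices is an assumed pairwise lattice aligner (not a real API from the paper), and the greedy left-to-right fold replaces the best-pair-first merge order of iterative pairwise alignment:

from collections import defaultdict

def cross_instance_align(instances, align_lattices):
    """instances: iterable of (predicate, slotted_lattice) pairs.
    Returns one unified slotted lattice per predicate."""
    by_pred = defaultdict(list)
    for pred, lattice in instances:
        by_pred[pred].append(lattice)
    unified = {}
    for pred, lattices in by_pred.items():
        merged = lattices[0]
        for lattice in lattices[1:]:
            # simplification: the paper merges the highest-scoring
            # pair of lattices first, not left to right
            merged = align_lattices(merged, lattice)
        unified[pred] = merged
    return unified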
</Section>
<Section position="3" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 3.3 Template induction </SectionTitle>
<Paragraph position="0"> Finally, it remains to create the mapping dictionary from unified slotted lattices. While several strategies are possible, we chose a simple consensus-sequence method. Define the node weight of a given slotted-lattice node as the number of verbalization paths passing through it (downweighted if it contains punctuation or the words "the", "a", "to", "and", or "of"). The path weight of a slotted-lattice path is a length-normalized sum of the weights of its nodes. (Footnote 5: Shorter paths are preferred, but we discard sequences shorter than six words as potentially spurious.) We produce as a template the words from the consensus sequence, defined as the maximum-weight path, which is easily computed via dynamic programming. For example, the template we derive from Figure 6's slotted lattice is: We use lemma [lemma] to get [goal].</Paragraph>
<Paragraph position="1"> Figure 5 (caption): Two slotted lattices for different instances of rewrite(lemma, goal); each instance had two verbalizations. In instance (I), both verbalizations contain the context-dependent information "an = !a!n" (the statement of lemma100); also, argument-matching failed on the context-dependent phrase "the fact about division".</Paragraph>
<Paragraph position="2"> Figure 6 (caption): The unified slotted lattice for rewrite(lemma, goal); its consensus sequence is shown in bold (recall that node weight roughly corresponds to in-degree).</Paragraph>
<Paragraph position="3"> While this method is quite efficient, it does not fully exploit the expressive power of the lattice, which may encapsulate several valid realizations. We leave experimenting with alternative template-induction techniques to future work; see Section 5.</Paragraph>
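As a concluding illustration, here is a minimal sketch of the consensus-sequence extraction just described. It is our own reconstruction, not the authors' code: the 0.1 downweighting factor is a placeholder (the paper does not state one), and the length normalization and six-word cutoff of footnote 5 are omitted.

STOP = {"the", "a", "to", "and", "of", ",", "."}

def consensus_path(succ, label, paths_through, topo_order, start, end):
    """succ: node -> list of successor nodes; label: node -> word or slot
    (start and end carry no label); paths_through: node -> number of
    verbalization paths through the node; topo_order: all nodes in
    topological order, beginning with start."""
    def weight(n):
        w = paths_through.get(n, 0)
        # downweight punctuation and the listed function words
        return 0.1 * w if label.get(n) in STOP else w

    best = {start: (0.0, None)}  # node -> (best score, predecessor)
    for n in topo_order:
        if n not in best:
            continue
        for m in succ.get(n, []):
            cand = best[n][0] + weight(m)
            if m not in best or cand > best[m][0]:
                best[m] = (cand, n)

    path, n = [], end  # backtrace from end to start
    while n is not None:
        path.append(n)
        n = best[n][1]
    return [label[n] for n in reversed(path) if n in label]

On Figure 6's lattice, this backtrace would recover the bolded consensus path, yielding the template "We use lemma [lemma] to get [goal]".
</Section>
</Section>
</Paper>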