<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1022">
  <Title>Bootstrapping Lexical Choice via Multiple-Sequence Alignment</Title>
  <Section position="6" start_page="0" end_page="0" type="concl">
    <SectionTitle>
5 Related Work
</SectionTitle>
    <Paragraph position="0"> Nuprl creates proofs at a higher level of abstraction than other provers do, so we were able to learn verbalizations directly from the Nuprl proofs themselves. In other natural-language proof generation systems (Huang and Fiedler, 1997; Siekmann et al., 1999) and other generation applications, the semantic expressions to be realized are the product of the system's content planning component, not the proof or data. But our techniques can still be incorporated into such systems, because we can map verbalizations to the content planner's output. Hence, we believe our approach generalizes to other settings.</Paragraph>
    <Paragraph position="1"> Previous research on statistical generation has addressed different problems. Some systems learn from verbalizations annotated with semantic concepts (Ratnaparkhi, 2000; Oh and Rudnicky, 2000); in contrast, we use un-annotated corpora. Other work focuses on surface realization, choosing among different lexical and syntactic options supplied by the lexical chooser and sentence planner, rather than on creating the mapping dictionary; although such work also uses lattices as input to the stochastic realizer, the lattices themselves are constructed by traditional knowledge-based means (Langkilde and Knight, 1998; Bangalore and Rambow, 2000). An exciting direction for future research is to apply these statistical surface realization methods to the lattices our method produces.</Paragraph>
    <Paragraph position="2"> Word lattices are commonly used in speech recognition to represent different transcription hypotheses. Mangu et al. (2000) compress these lattices into confusion networks with structure reminiscent of our &quot;sausage graphs&quot;, utilizing alignment criteria based on word identity and external information such as phonetic similarity.</Paragraph>
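As a minimal illustration of the sausage structure, the toy sketch below collapses two transcription hypotheses into a sequence of slots of competing words. It aligns by word identity only, using Python's standard-library difflib; real confusion networks (Mangu et al., 2000) additionally use external information such as phonetic similarity, which is omitted here.

```python
from difflib import SequenceMatcher

def sausage(hyp1, hyp2):
    """Collapse two word sequences into a 'sausage': a list of slots,
    each a set of the alternative words competing at that position.
    Alignment is by word identity only (a toy criterion); '-' marks
    an epsilon (empty) alternative in a slot."""
    slots = []
    for op, i1, i2, j1, j2 in SequenceMatcher(None, hyp1, hyp2).get_opcodes():
        if op == "equal":
            # agreed region: one single-word slot per position
            slots.extend({w} for w in hyp1[i1:i2])
        else:
            # disputed region: pool the alternatives, padding with epsilon
            a, b = hyp1[i1:i2], hyp2[j1:j2]
            for k in range(max(len(a), len(b))):
                slots.append({a[k] if k < len(a) else "-",
                              b[k] if k < len(b) else "-"})
    return slots

h1 = "recognize speech with this new array".split()
h2 = "wreck a nice beach with this nude array".split()
print(sausage(h1, h2))
```

Each slot is a set, so reading one word per slot (skipping epsilons) regenerates a hypothesis; this is the "sausage" shape the paragraph above refers to.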
    <Paragraph position="3"> Using alignment for grammar and lexicon induction has been an active area of research, both in monolingual settings (van Zaanen, 2000) and in machine translation (MT) (Brown et al., 1993; Melamed, 2000; Och and Ney, 2000); interestingly, statistical MT techniques have been used to derive lexico-semantic mappings in the &quot;reverse&quot; direction of language understanding rather than generation (Papineni et al., 1997; Macherey et al., 2001). In a preliminary study, applying IBM-style alignment models in a black-box manner (i.e., without modification) to our setting did not yield promising results (Chong, 2002). On the other hand, MT systems can often model crossing alignment situations; these are rare in our data, but we hope to account for them in future work.</Paragraph>
    <Paragraph position="4"> While recent proposals for evaluation of MT systems have involved multi-parallel corpora (Thompson and Brew, 1996; Papineni et al., 2002), statistical MT algorithms typically use only one-parallel data. Simard's (1999) trilingual (rather than multi-parallel) corpus method, which also computes MSAs, is a notable exception, but he reports mixed experimental results. In contrast, we have shown that through application of a novel composition of alignment steps, we can leverage multi-parallel corpora to create high-quality mapping dictionaries supporting effective text generation.</Paragraph>
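To make the multi-parallel setting concrete, here is a hypothetical sketch of progressive multiple-sequence alignment: each new parallel verbalization is pairwise-aligned against the columns built so far. This is a simplification for illustration only; it uses word-identity matching via difflib, not the paper's actual composition of alignment steps or scoring.

```python
from difflib import SequenceMatcher

GAP = "-"

def merge(columns, seq):
    """Align one more parallel sequence against existing columns (each
    column holds one token, or GAP, per already-aligned sequence).
    Matching is on a representative token per column -- a toy stand-in
    for real MSA scoring."""
    n = len(columns[0]) if columns else 0
    # representative token per column: first non-gap entry
    spine = [next((t for t in col if t != GAP), GAP) for col in columns]
    out = []
    for op, i1, i2, j1, j2 in SequenceMatcher(None, spine, seq).get_opcodes():
        if op == "equal":
            for i, j in zip(range(i1, i2), range(j1, j2)):
                out.append(columns[i] + [seq[j]])
        else:
            # unmatched material becomes gap-padded columns on either side
            for i in range(i1, i2):
                out.append(columns[i] + [GAP])
            for j in range(j1, j2):
                out.append([GAP] * n + [seq[j]])
    return out

def msa(seqs):
    """Progressively fold each sequence into the growing alignment."""
    cols = [[t] for t in seqs[0]]
    for s in seqs[1:]:
        cols = merge(cols, s)
    return cols

print(msa(["a b c".split(), "a x c".split(), "a b".split()]))
```

Reading the result column by column shows, for each slot, which parallel verbalizations share a word and which diverge, which is the raw material a mapping dictionary can be induced from.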
  </Section>
</Paper>