<?xml version="1.0" standalone="yes"?> <Paper uid="P06-4014"> <Title>Re-Usable Tools for Precision Machine Translation*</Title> <Section position="4" start_page="0" end_page="54" type="intro"> <SectionTitle> 2 System Design </SectionTitle>
<Paragraph position="0"> The backbone of the LOGON prototype implements a relatively conventional architecture, organized around in-depth grammatical analysis in the source language (SL), semantic transfer of logical-form meaning representations from the source into the target language (TL), and full, grammar-based TL tactical generation.</Paragraph>
<Paragraph position="1"> *This demonstration reflects the work of a large group of people whose contributions we gratefully acknowledge. Please see 'http://www.emmtee.net' for background.</Paragraph>
<Paragraph position="2"> [Figure caption (system architecture), partially recovered: ...processing components are managed by a central controller that passes intermediate results (MRSs) through the translation pipeline. The Parallel Virtual Machine (PVM) layer facilitates distribution, parallelization, failure detection, and roll-over.]</Paragraph>
<Paragraph position="3"> Minimal Recursion Semantics The three core phases communicate in a uniform semantic interface language, Minimal Recursion Semantics (MRS; Copestake, Flickinger, Sag, & Pollard, 1999). Broadly speaking, MRS is a flat, event-based (neo-Davidsonian) framework for computational semantics. The abstraction from SL and TL surface properties enforced in our semantic transfer approach facilitates a novel combination of diverse grammatical frameworks, viz. LFG for Norwegian analysis and HPSG for English generation.</Paragraph>
<Paragraph position="4"> While an in-depth introduction to MRS (for MT) is beyond the scope of this project note, Figure 1 presents a simplified example semantics.</Paragraph>
<Paragraph position="5"> [Figure 1 caption: Simplified MRS for 'Bodø is densely populated.' The core of the structure is a bag of elementary predications (EPs), using distinguished handles ('hi' variables) and '=q' (equal modulo quantifier insertion) constraints to underspecify scopal relations. Event- and instance-type variables ('ej' and 'xk', respectively) capture semantic linking among EPs, where we assume a small inventory of thematically bleached role labels (ARG0 ... ARGn). These are abbreviated through order-coding in the example above (see § 2 below for details).]</Paragraph>
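<Paragraph position="6"> As a minimal sketch of what such a flat structure looks like in practice, the following Python rendering holds the Figure 1 example as a bag of EPs plus handle constraints. It is purely illustrative: the predicate names, handle numbering, and the helper function are assumptions for exposition, not actual NorGram or ERG output.</Paragraph>
<Paragraph position="7">
# Illustrative, simplified MRS for 'Bodø is densely populated.'
# Predicates, handles, and variables are assumed for exposition only.
simplified_mrs = {
    "top": "h1",                 # distinguished top handle
    "relations": [               # the bag of elementary predications (EPs)
        {"handle": "h3", "pred": "proper_q",    "ARG0": "x4", "RSTR": "h5", "BODY": "h6"},
        {"handle": "h7", "pred": "named",       "ARG0": "x4", "CARG": "Bodø"},
        {"handle": "h8", "pred": "_densely_a",  "ARG0": "e9", "ARG1": "e2"},
        {"handle": "h8", "pred": "_populate_v", "ARG0": "e2", "ARG2": "x4"},
    ],
    "hcons": [                   # '=q' constraints underspecifying scopal relations
        {"hi": "h1", "rel": "qeq", "lo": "h8"},
        {"hi": "h5", "rel": "qeq", "lo": "h7"},
    ],
}

def eps_linked_to(mrs, variable):
    """Return the EPs sharing a given event- or instance-type variable."""
    roles = ("ARG0", "ARG1", "ARG2", "ARG3")
    return [ep for ep in mrs["relations"]
            if any(ep.get(role) == variable for role in roles)]

# eps_linked_to(simplified_mrs, "x4") picks out the quantifier, the name,
# and the verbal EP, i.e. the semantic linking described above.
</Paragraph>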
<Paragraph position="8"> Norwegian Analysis Syntactic analysis of Norwegian is based on an existing LFG resource grammar, NorGram (Dyvik, 1999), under development on the Xerox Linguistic Environment (XLE) since around 1999. For use in LOGON, the grammar has been modified and extended, and it has been augmented with a module of Minimal Recursion Semantics representations which are computed from LFG f-structures by co-description.</Paragraph>
<Paragraph position="9"> In Norwegian, compounding is a productive morphological process, thus presenting the analysis engine with a steady supply of 'new' words, e.g. klokkeslettuttrykk, meaning approximately 'time-of-day expression'. The project uses its own morphological analyzer, compiled off a comprehensive computational lexicon of Norwegian, prior to syntactic analysis. One important feature of this processor is that it decomposes compounds in such a way that they can be compositionally translated downstream.</Paragraph>
<Paragraph position="10"> Current analysis coverage (including well-formed MRSs) on the LOGON corpus (see below) is approaching 80 per cent (of which 25 per cent are 'fragmented', i.e. approximative analyses).</Paragraph>
<Paragraph position="11"> Semantic Transfer Unlike in parsing and generation, there is less established common wisdom in terms of (semantic) transfer formalisms and algorithms. LOGON follows the main Verbmobil approach, treating transfer as a resource-sensitive rewrite process in which rules replace MRS fragments (SL to TL) in a step-wise manner (Wahlster, 2000), but adds two innovative elements to the transfer component, viz. (i) the use of typing for the hierarchical organization of transfer rules and (ii) a chart-like treatment of transfer-level ambiguity.</Paragraph>
<Paragraph position="12"> The general form of MRS transfer rules (MTRs) is a quadruple:</Paragraph>
<Paragraph position="13"> [ CONTEXT : ] INPUT [ ! FILTER ] -> OUTPUT </Paragraph>
<Paragraph position="14"> where each of the four components, in turn, is a partial MRS, i.e. a triplet of a top handle, a bag of EPs, and handle constraints. Left-hand side components are unified against an input MRS M and, when successful, trigger the rule application; elements of M matched by INPUT are replaced with the OUTPUT component, respecting all variable bindings established during unification. The optional CONTEXT and FILTER components serve to condition rule application (on the presence or absence of specific aspects of M) and to establish bindings for OUTPUT processing, but they do not consume elements of M. Although our current focus is on translation into English, MTRs in principle state translational correspondence relations and, modulo context conditioning, can be reversed.</Paragraph>
<Paragraph position="15"> Transfer rules are organized in a multiple-inheritance hierarchy with strong typing and appropriate feature constraints, both for the elements of MRSs and for MTRs themselves. In close analogy to constraint-based grammar, typing facilitates generalizations over transfer regularities (hierarchies of predicates or common MTR configurations, for example) and aids development and debugging.</Paragraph>
<Paragraph position="16"> The semantic interfaces (called SEM-Is, see below) of the respective grammars are an important tool in the construction of the transfer rules.</Paragraph>
<Paragraph position="17"> [Displaced fragment from the SEM-I discussion: ...sitions give rise to distinct, but potentially related, semantic predicates. Likewise, the SEM-I incorporates some ontological information, e.g. a classification of temporal entities, though crucially only to the extent that it is actually grammaticized in the language proper.]</Paragraph>
<Paragraph position="18"> While we believe that hand-crafted lexical transfer is a necessary component in precision-oriented MT, it is also a bottleneck for the development of the LOGON system, with its pre-existing source and target language grammars. We have therefore experimented with the acquisition of transfer rules by analogy from a bilingual dictionary, building on hand-built transfer rules as a seed set of templates (Nordgård, Nygaard, Lønning, & Oepen, 2006).</Paragraph>
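<Paragraph position="19"> A minimal sketch of this rewrite regime, in Python and greatly simplified, is given below: INPUT is consumed, CONTEXT and FILTER only gate applicability, and OUTPUT is spliced into the result. The matching on bare predicate names, the rule set, and the predicate names themselves (including the compositional treatment of a decomposed compound such as klokkeslettuttrykk) are illustrative assumptions; actual MTRs unify partial MRSs, track variable bindings, and feed a chart-like treatment of transfer-level ambiguity.</Paragraph>
<Paragraph position="20">
# Schematic, resource-sensitive rewriting over a bag of predicate names.
# This is an expository approximation, not the LOGON transfer machinery.

def applicable(rule, predicates):
    """INPUT and CONTEXT must be present in M; FILTER must be absent."""
    has_input = all(p in predicates for p in rule["INPUT"])
    has_context = all(p in predicates for p in rule.get("CONTEXT", []))
    blocked = any(p in predicates for p in rule.get("FILTER", []))
    return has_input and has_context and not blocked

def apply_rule(rule, predicates):
    """Consume the INPUT fragment and splice in OUTPUT; CONTEXT is not consumed."""
    remaining = [p for p in predicates if p not in rule["INPUT"]]
    return remaining + list(rule["OUTPUT"])

def transfer(predicates, rules):
    """Step-wise rewriting: repeatedly apply the first applicable rule."""
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if applicable(rule, predicates):
                predicates = apply_rule(rule, predicates)
                changed = True
                break
    return predicates

# Hypothetical lexical MTRs for the parts of a decomposed compound.
rules = [
    {"INPUT": ["_klokkeslett_n"], "OUTPUT": ["_time+of+day_n"]},
    {"INPUT": ["_uttrykk_n"], "OUTPUT": ["_expression_n"]},
]
print(transfer(["udef_q", "_klokkeslett_n", "_uttrykk_n"], rules))
# -> ['udef_q', '_time+of+day_n', '_expression_n']
</Paragraph>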
<Section position="1" start_page="53" end_page="54" type="sub_section"> <SectionTitle> English Generation </SectionTitle>
<Paragraph position="0"> Realization of post-transfer MRSs in LOGON builds on the pre-existing LinGO English Resource Grammar (ERG; Flickinger, 2000) and the LKB generator (Carroll, Copestake, Flickinger, & Poznanski, 1999). The ERG already produced MRS outputs with good coverage in several domains. In LOGON, it has been refined and adapted to the new domain, and its semantic representations have been revised in light of cross-linguistic experiences from MT. Furthermore, chart generation efficiency and the integration with stochastic realization have been substantially improved (Carroll & Oepen, 2005). Table 1 summarizes (exhaustive) generator performance on a segment of the LOGON development corpus: realizations average a little less than twelve words in length. After the addition of domain-specific vocabulary and a small amount of fine-tuning, the ERG provides adequate analyses for close to ninety per cent of the LOGON reference translations. For about half the test cases, all outputs can be generated in less than one cpu second.</Paragraph>
<Paragraph position="1"> [Table 1 caption, partially recovered: generator performance in relation to input 'complexity'. The columns are, from left to right, the corpus sub-division by input length, total number of items, and average string length, ambiguity rate, grammatical coverage, and generation time, respectively.]</Paragraph>
<Paragraph position="2"> End-to-End Coverage The current LOGON system will only produce output(s) when all three processing phases succeed. For the LOGON target corpus (see below), this is presently the case for 35 per cent of the inputs. Averaging over actual outputs only, the system achieves a (respectable) BLEU score of 0.61; averaging over the entire corpus, i.e. counting inputs with processing errors as a zero contribution, the BLEU score drops to 0.21.</Paragraph>
</Section> </Section> </Paper>