<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-1019">
  <Title>Two Types of Adaptive MT Environments</Title>
  <Section position="2" start_page="125" end_page="126" type="abstr">
    <SectionTitle>
2. The Best Output Segment Approach to Adaptivity
</SectionTitle>
    <Paragraph position="0"> Our BOS approach experiment was carried out for a Spanish - English translation set-up, in the framework of the Pangloss MT project (Pangloss, 1994) and used three MT engines -- KBMT, EBMT, and TBMT.</Paragraph>
    <Paragraph position="1"> The KBMT engine we used was the mainline engine of the Pangloss system, a traditional KBMT environment described in some detail in (Pangloss, 1994). It was important for the BOS experiment that this engine generated an internal quality rating for each output segment it produced. The basic idea of EBMT is simple (cf. Nagao, 1984): an input passage S is compared with the source-language &amp;quot;side&amp;quot; of a bilingual text archive, where text passages are stored with their translations into a target language (or a set of such). The &amp;quot;closest&amp;quot; match, passage S', is selected and the translation of this closest match, the passage T', is accepted as the translation of S. Our EBMT engine used a 100MB bilingual Spanish - English archive of UN official documents. In preparation for processing, the archive was aligned at the sentence level. The matching of input passages with the Spanish side of the archive was allowed to be inexact. Penalties were assessed for omitted and extra words, word occurrences in different morphological forms and differences in word order. The English string translating the best Spanish archive candidate was then found in the English sentence aligned with the Spanish sentence in which the best match candidate appeared. A Spanish - English MRD was used in determining translations of individual words inside the candidate segments. A special routine then calculated the expected quality of the resulting translation, which helped at the result integration stage of multi-engine MT system operation. Our EBMT approach is described in (Nirenburg et al., 1993 and Nirenburg et al., submitted).</Paragraph>
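    <Paragraph> The inexact matching described above can be illustrated with a minimal sketch. It is not the Pangloss implementation: the penalty weights, the four-letter stem function standing in for morphological analysis, and the names match_penalty and closest_match are all assumptions made for illustration.

```python
# Hypothetical penalty weights; the paper does not report the actual values.
PENALTY_OMITTED = 1.0   # input word with no counterpart in the archive passage
PENALTY_EXTRA = 0.5     # archive word with no counterpart in the input
PENALTY_MORPH = 0.25    # same stem, different morphological form
PENALTY_ORDER = 0.25    # adjacent matched words appearing in reversed order


def stem(word):
    # Crude stand-in for morphological analysis: keep the first four letters.
    return word.lower()[:4]


def match_penalty(input_words, archive_words):
    """Return the total penalty for matching an input passage to an archive passage."""
    used = set()      # archive positions already matched
    positions = []    # archive position matched by each input word, in input order
    penalty = 0.0
    for word in input_words:
        free = [i for i in range(len(archive_words)) if i not in used]
        exact = [i for i in free if archive_words[i] == word]
        morph = [i for i in free if stem(archive_words[i]) == stem(word)]
        if exact:
            hit = exact[0]
        elif morph:
            hit = morph[0]
            penalty += PENALTY_MORPH
        else:
            penalty += PENALTY_OMITTED
            continue
        used.add(hit)
        positions.append(hit)
    # Archive words that were never matched count as extra words.
    penalty += PENALTY_EXTRA * (len(archive_words) - len(used))
    # Count adjacent matched pairs whose order in the archive is reversed.
    inversions = sum(1 for a, b in zip(positions, positions[1:]) if a > b)
    return penalty + PENALTY_ORDER * inversions


def closest_match(input_words, archive):
    """Pick the (source_words, target_sentence) pair with the lowest penalty."""
    return min(archive, key=lambda pair: match_penalty(input_words, pair[0]))
```

With a toy archive of sentence-aligned pairs, closest_match returns the pair whose Spanish side is closest to the input, and the second element of that pair plays the role of T'. A real system would also have to locate the translated substring inside the aligned English sentence, which this sketch does not attempt.</Paragraph>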
    <Paragraph position="2"> Our transfer system was very simple. It was based on direct lexical substitution of English words and phrases for Spanish words and phrases, fortified with morphological analysis and synthesis modules. The process relied on a number of databases - a Spanish - English MRD, the lexicons used by the KBMT engine, a large set of user-generated bilingual glossaries as well as a gazetteer and a list of proper and organization names. The user-generated glossaries for our experiment contained about 174,000 entries. Glossary entries contained variables to allow feature matching and indices to link the parts of phrasal entries that translated one another. For instance, the following glossary entry</Paragraph>
    <Paragraph position="4"> can help to generate such English sentences as I release you from your promise; He released me from my promise; You will be releasing her from</Paragraph>
    <Paragraph position="6"> In the rule above dop stands for &amp;quot;direct object pronoun&amp;quot; and poss for &amp;quot;possessive.&amp;quot; Tables of feature correspondences were prepared to make the translation possible.</Paragraph>
    <Paragraph position="7"> Note that in many cases Spanish features and English features were quite different (notably, for verbs). The numbers in angular brackets are indices which show the morphological synthesizer which word to put in a particular form at generation time. In this experiment we used variables for the following word classes: proper names, such as individual, company and place names; numbers and the various classes of pronouns -- personal, possessive, reflexive, direct object, indirect object and possessive absolute.</Paragraph>
    <Section position="1" start_page="125" end_page="126" type="sub_section">
      <SectionTitle>
2.1. Combining Results
</SectionTitle>
      <Paragraph position="0"> The crux of the BOS method is combining results from individual engines. A chart data structure was used to combine results from the individual engines. Before the translation process, the edges of the chart were made to correspond to individual words in the input. New edges are added to the chart through the operation of the three MT engines, labeled with the translation of a segment of the input string and indexed by this segment's beginning and end positions. The KBMT and EBMT engines also carried a quality score for each output element.</Paragraph>
      <Paragraph position="1"> After all three engines finished their work it is necessary to find the sequence of translation candidates which a) cover the input string as densely as possible (so that there is a translation for as many source text elements as possible); b) use the &amp;quot;best&amp;quot; of the available candidates. To find the best candidates three heuristics were used: a) internal quality ratings produced by the KBMT and EBMT engines; b) static relative quality assessment of the particular engines we used and c) the length of the translation segment (the longer, the better). Enhancing the quality of these heuristics and generally finding more sophisticated ways of combining findings of individual engines is the most important direction of improvement of our BOS system.</Paragraph>
      <Paragraph position="2"> The chart walk algorithm producing the final result of the BOS system used the above heuristics. The algorithm uses dynamic programming to find the optimal cover (a cover with the best cumulative score), assuming correct component quality scores. It is described in some detail and illustrated in Nirenburg and Frederking, 1994 and Frederking and Nirenburg, submitted.</Paragraph>
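      <Paragraph> A dynamic-programming chart walk of this kind might look roughly as follows. This is only a sketch under stated assumptions, not the algorithm from the cited papers: the Edge record, the per-engine weights, the penalty for an untranslated word, and the way the three heuristics are multiplied into one score are all hypothetical. TBMT edges, which carried no quality score, are assumed here to be given a placeholder quality by the caller.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    start: int       # index of the first input word covered
    end: int         # index one past the last input word covered
    text: str        # translation proposed for the covered segment
    engine: str      # engine that produced the edge
    quality: float   # engine-internal quality rating, assumed to lie in [0, 1]


# Hypothetical static engine weights and penalty for leaving a word untranslated.
ENGINE_WEIGHT = {"KBMT": 1.0, "EBMT": 0.8, "TBMT": 0.5}
SKIP_PENALTY = -1.0


def edge_score(edge):
    # Scaling by segment length implements "the longer, the better".
    return ENGINE_WEIGHT[edge.engine] * edge.quality * (edge.end - edge.start)


def chart_walk(n_words, edges):
    """Return (score, path) for the best-scoring cover of an n_words-long input."""
    # best[pos] holds the best (score, path) covering the first pos words.
    best = [(0.0, [])] + [None] * n_words
    for pos in range(1, n_words + 1):
        # Option 1: leave the word just before pos untranslated.
        prev_score, prev_path = best[pos - 1]
        candidates = [(prev_score + SKIP_PENALTY, prev_path)]
        # Option 2: finish with any edge that ends exactly at pos.
        for edge in edges:
            if edge.end == pos:
                score, path = best[edge.start]
                candidates.append((score + edge_score(edge), path + [edge]))
        best[pos] = max(candidates, key=lambda c: c[0])
    return best[n_words]
```

Because every uncovered word is penalized, the walk prefers dense covers, and because each edge score grows with segment length and engine weight, a long EBMT match can outrank two short word-for-word transfer edges over the same span.</Paragraph>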
      <Paragraph position="3"> 3. The Dispatcher-Based Approach to</Paragraph>
    </Section>
    <Section position="2" start_page="126" end_page="126" type="sub_section">
      <SectionTitle>
Adaptivity
</SectionTitle>
      <Paragraph position="0"> In this approach, a dispatcher module is used to break up the input text into segments and assign each segment to one or another of the available MT engines. Among the possible diagnostics for the dispatcher are: * Type of translation -- whether the result of translation is intended for dissemination or for assimilation; whether a complete translation is needed or an abstract or even a simple categorization of a text (e.g., as a text that is important enough to be translated in its entirety).</Paragraph>
      <Paragraph position="1"> * Availability of parallel text in a particular domain and on a particular topic. This is the crucial enabling condition for EBMT and SBMT.</Paragraph>
      <Paragraph position="2"> * Amount of ambiguity in the source passage, both in the source language itself and vis-a-vis a target language. The smaller the degree of ambiguity, the more attractive the KBMT approach.</Paragraph>
      <Paragraph position="3"> * Size and quality of available KBMT resources (ontology, lexicons, etc.).</Paragraph>
      <Paragraph position="4"> The work on the dispatcher, thus, includes a) evaluating the translation context with respect to the four criteria above and b) putting together a decision mechanism which will establish the relative appropriateness of each of the available engines for treating an input passage in a given context. An additional important parameter in the operation of the dispatcher is determining the most appropriate size of input passage to be dispatched to an MT engine. Since an entire input text can be processed by a combination of MT engines, it is necessary to maximize the expected quality of output over a variety of possible ways of &amp;quot;chunking&amp;quot; the input text for processing. This has some similarity with the chart walk in the BOS approach.</Paragraph>
      <Paragraph position="5"> The dispatcher will use an additional set of diagnostics determined by the structure of the specific MT engine. The development of these dispatcher heuristics - in other words, how the dispatcher is to be trained (see below) -</Paragraph>
      <Paragraph position="6"> is a key point of the proposed research. A preliminary analysis of these specific diagnostic heuristics, ordered by the particular engine, follows.</Paragraph>
      <Paragraph position="7"> An additional diagnostic heuristic for SBMT inspects the frequency of occurrence of each individual input string item in the corpus. The greater the frequency of the items contained in the text, the greater the likelihood that the SBMT engine will produce good quality output.</Paragraph>
      <Paragraph position="8"> The above heuristic will also serve the EBMT engine.</Paragraph>
      <Paragraph position="9"> A heuristic useful specifically for EBMT is the amount of overlap of an input text with a document already in the source language side of the bilingual archive.</Paragraph>
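      <Paragraph> The frequency and overlap diagnostics described above could be approximated as in the sketch below. The function names, the use of mean word frequency, and the choice of word bigrams as the unit of overlap are assumptions for illustration only; the text does not specify how either quantity would be measured.

```python
from collections import Counter


def frequency_diagnostic(input_words, corpus_words):
    """Mean corpus frequency of the input words (higher favours SBMT and EBMT)."""
    if not input_words:
        return 0.0
    counts = Counter(corpus_words)
    # Counter returns 0 for words that never occur in the corpus.
    return sum(counts[w] for w in input_words) / len(input_words)


def overlap_diagnostic(input_words, archive_documents, n=2):
    """Largest share of input n-grams found in any single archive document."""
    def ngrams(words):
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    grams = ngrams(input_words)
    if not grams:
        return 0.0
    return max(
        (len(grams.intersection(ngrams(doc))) / len(grams) for doc in archive_documents),
        default=0.0,
    )
```

Both functions only count surface forms, which places them among the relatively inexpensive diagnostics. A dispatcher could compare their values against thresholds, themselves to be tuned, when deciding whether a segment is a good candidate for a corpus-based engine.</Paragraph>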
      <Paragraph position="10"> The diagnostics for the TBMT and KBMT approaches mostly check the coverages of appropriate static knowledge sources -- grammars and lexicons.</Paragraph>
      <Paragraph position="11"> The diagnostics proposed above vary in cost, both in terms of developing the procedures and in terms of their computational complexity. Relatively inexpensive are diagnostics based on recognizing individual terms or patterns in the input (e.g., checking the availability of items in a lexicon or a corpus, checking the length of segments, checking for local sequencing patterns of forms). Somewhat more expensive are diagnostics based on assignment of categories to forms. It is serendipitous, however, that the more costly diagnostics are generally related to initial stages of processing necessary in most engines. This opens a potential for interleaving the processing by individual engines with the operation of the dispatcher.</Paragraph>
    </Section>
  </Section>
</Paper>