<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-1011">
  <Title>Portable Knowledge Sources for Machine Translation</Title>
  <Section position="2" start_page="0" end_page="86" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> In the last decade, more and more commercial machine translation (MT) systems have become available for a wide variety of language pairs. An MT system is a very handy tool, but one quickly finds out that it makes the same errors over and over again, even if a user dictionary is carefully maintained. There are several reasons for such repeated errors.</Paragraph>
    <Paragraph position="1">  1. Commercial MT systems are not built in accordance with a powerful lexical semantic formalism. The user dictionary alone cannot satisfactorily disambiguate word senses and phrasal attachments. 2. MT systems cannot handle the domain and context dependency of word sense, phrasal attachment, and word selection.</Paragraph>
    <Paragraph position="2"> 3. In a shared environment, each user has a different user dictionary, and must therefore redundantly correct the same errors as all the other users. A powerful lexical semantic approach [?] could give more accurate translations, but it might be too much to ask users to develop their dictionaries within that formalism. The simple structure of a user dictionary also restricts the learning ability of MT systems during the post-editing process. The second of the above reasons has motivated recent example-based and case-based machine translation research [9, ?, 10]. However, a method for finding the best-matching cases in a case base, where cases (or examples) are collected from different domains or contexts, has not been studied well. Nor is it known whether considering the frequency of cases gives a better result. The third reason is rarely discussed, but it is not desirable simply to share a single user dictionary, since the dictionary may become inconsistent by reflecting multiple users' updates. McRoy [?] discussed word sense disambiguation using multiple knowledge sources, but her method is still dictionary-based.</Paragraph>
    <Paragraph position="3"> Some of the commercial systems for human-aided translation, such as the Translation Manager/2 [?], can provide the user with more flexible access to multiple dictionaries and the translation memory (a repository of pairs of source and target sentences). This organization of knowledge could be quite useful for selecting correct translations of words, but the types of knowledge available from the dictionaries and translation memories</Paragraph>
    <Paragraph position="4"> are rather limited, and are certainly not enough for resolving structural ambiguities in sentences.</Paragraph>
    <Paragraph position="5"> In this paper, we propose Portable Knowledge Sources (PKSs) for machine translation. A PKS consists of preference information on word sense, phrasal attachment, and word selection for translation. It is acquired through user interaction in the post-editing process, and is stored with the document being translated. When translating a document with an MT system, a user can specify a list of already-translated documents, and the system will make use of the PKSs included in the specified documents. We show how such a collection of PKSs is organized, used for translation, and integrated into a user dictionary, and how the problems stated above can be solved by using PKSs.</Paragraph>
    <Paragraph position="6"> 2. Portable Knowledge Sources
A Portable Knowledge Source (PKS) consists of preference information on three kinds of ambiguity:
1. Word sense
2. Phrasal attachment
3. Word selection
The preference information is acquired from the user through post-editing or interactive translation [?, ?] and is paired with the document that the user is working on. That is, a PKS is stored and managed together with the document for which it is created.</Paragraph>
    <Paragraph position="7"> Let PK1, PK2, and PK3 be PKSs for the respective types of ambiguity mentioned above. The following is an example of word sense ambiguity: Delete the line.</Paragraph>
    <Paragraph position="8"> The word "line" could be (1) a single row of letters, (2) a geometric mark, (3) a hardware wire, and so on, each of which usually requires a different translation in the target language. When the user specifies that a particular occurrence of the word "line" in a document D means a single row, the PKS (PK1 ("line" (cat n)) (sense 1)) is created and stored with D.</Paragraph>
    <Paragraph position="9"> An example of phrasal attachment ambiguity is as follows: Order the publication through the IBM branch serving your locality.</Paragraph>
    <Paragraph position="10"> The present participle phrase can be attached either to the main verb "order" or to the noun "branch." If the user specifies that it modifies the noun as a post-nominal adjective phrase (ADJP), the PKS (PK2 ("serve" (cat v) (form prsprt)) ADJP ("branch" (cat n))) is created.¹ The preference for prepositional phrase attachment is also represented by PK2.</Paragraph>
    <Paragraph position="11"> Finally, an example of word selection ambiguity is the choice between the translations (1) "[?]" and (2) "[?]" for the compound noun "memory chip", where the first translation can often be found in PC documents, while the second one, which has the same meaning, is typically used in textbooks. When the user specifies that the second one should be used, the PKS (PK3 ("memory chip" (cat n)) ("[?]" (cat n))) is created. If word sense is to be included in the definition of word selection, such that the word W1 is used in sense S and should be translated by the word W2, this is represented separately by (PK1 W1 S) and (PK3 W1 W2).</Paragraph>
    <Paragraph position="12"> Each PKS collected through user interaction has an age, based on the time and date of its creation. The younger the PKS, the stronger its preferability within a document, since it could have been created to overrule the preceding PKSs. Note that the age of a PKS is meaningful only relative to other PKSs in the same set: two sets of PKSs are not comparable if they are paired with different documents.</Paragraph>
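The three PKS types and the age rule can be sketched in Python as follows. This is a minimal sketch under assumptions of our own: the names `PKS`, `Document`, `record`, and `preferred` are hypothetical, each preference is reduced to a key and a preferred value, and features such as `(cat n)` are dropped. It only illustrates that, within one document, the youngest PKS for a given key overrules the older ones.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class PKS:
    """One preference collected from the user (hypothetical encoding).

    kind:    "PK1" (word sense), "PK2" (phrasal attachment), or "PK3" (word selection)
    key:     the ambiguous item, e.g. the word "line" or the modifier "serve"
    value:   the preferred reading, e.g. sense 1 or ("ADJP", "branch")
    created: creation time; the later it is, the younger the PKS
    """
    kind: str
    key: str
    value: object
    created: datetime


@dataclass
class Document:
    """A document together with the set of PKSs paired with it."""
    name: str
    pks: List[PKS] = field(default_factory=list)

    def record(self, kind: str, key: str, value: object, created: datetime) -> None:
        self.pks.append(PKS(kind, key, value, created))

    def preferred(self, kind: str, key: str) -> Optional[object]:
        """Value of the youngest PKS of this kind for `key`, or None.

        Ages are compared only inside this document, never across documents.
        """
        matches = [p for p in self.pks if p.kind == kind and p.key == key]
        if not matches:
            return None
        return max(matches, key=lambda p: p.created).value


doc = Document("D")
doc.record("PK1", "line", 2, datetime(1994, 3, 1, 9, 0))    # first choice: geometric mark
doc.record("PK1", "line", 1, datetime(1994, 3, 1, 10, 0))   # later correction: row of letters
doc.record("PK2", "serve", ("ADJP", "branch"), datetime(1994, 3, 1, 10, 5))
print(doc.preferred("PK1", "line"))          # 1
print(doc.preferred("PK2", "serve"))         # ('ADJP', 'branch')
print(doc.preferred("PK3", "memory chip"))   # None
```

Keeping the PKSs inside the document object mirrors the rule that ages are only compared within one set.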
    <Paragraph position="13"> 3. Organizing Portable Knowledge</Paragraph>
    <Section position="1" start_page="85" end_page="85" type="sub_section">
      <SectionTitle>
Sources
</SectionTitle>
      <Paragraph position="0"> Once the user has translated several documents, the sets of PKSs paired with them begin helping the MT system to resolve the three kinds of ambiguity described in the previous section. ¹ The representation of a PKS can vary depending on the MT system that uses it. For example, the modifier and modifiee phrases can be represented by syntactic structures, word senses and semantic case relations can be associated, and so on.</Paragraph>
    </Section>
    <Section position="2" start_page="85" end_page="86" type="sub_section">
      <SectionTitle>
Dictionary-based MT System
</SectionTitle>
      <Paragraph position="0"> When a new document is to be translated, the user either specifies a list of previously translated documents as a source of available PKSs, or lets the MT system choose them automatically. Such a list of documents is called a document list.</Paragraph>
      <Paragraph position="1"> Figure 1 compares a PKS-based MT system with a conventional dictionary-based MT system.</Paragraph>
      <Paragraph position="2"> Even though a logical document may not be identified with a physical file, treating each file as a document is the easiest and most practical way to organize the hierarchy of documents. In practice, when translating technical documents, it is usual to translate the glossary first, agree on the translations of technical terms, and then work on individual chapters.</Paragraph>
      <Paragraph position="3"> This gives us a natural ordering of documents: glossary → chapter 1 → chapter 2 → ...</Paragraph>
      <Paragraph position="4"> which is also used as an ordering of PKSs to be incorporated for machine translation. One way of automatically choosing the document X for translating a new document Y is to calculate the overlap of words contained in both X and Y, and to find the X with the largest overlap. This idea is similar to the context identification method [?], which is used effectively for word sense disambiguation.</Paragraph>
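The overlap heuristic can be sketched as below. Assumptions: words are lower-cased alphabetic tokens, overlap is the raw number of shared word types (the text does not say whether tokens are counted or whether the score is normalized), and ties go to the first document listed. The function names are hypothetical.

```python
import re
from typing import Dict, Optional, Set


def word_types(text: str) -> Set[str]:
    """Lower-cased alphabetic word types of a text (a deliberately crude tokenizer)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def choose_document(new_text: str, translated: Dict[str, str]) -> Optional[str]:
    """Name of the translated document X that shares the most word types with Y."""
    target = word_types(new_text)
    # max() keeps the first of several equally good candidates;
    # default=None covers an empty collection of translated documents.
    return max(
        translated,
        key=lambda name: len(target.intersection(word_types(translated[name]))),
        default=None,
    )


translated_docs = {
    "editor-manual": "Delete the line, then move the cursor to the next line.",
    "hardware-manual": "Connect the line to the modem port.",
}
print(choose_document("Delete the current line and the next line.", translated_docs))
# editor-manual
```

The editor manual shares four word types with the new text ("delete", "the", "line", "next"), the hardware manual only two, so the editor manual is chosen.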
      <Paragraph position="5"> One important characteristic of this PKS organization is that it can be dynamically rearranged. We can invalidate some PKSs simply by removing a document from the document list, or validate a new set of PKSs by adding its paired document to the list. This is extremely useful for domain-sensitive and context-sensitive translation, since a close look at documents in a seemingly similar domain will show that there are too many conflicting word senses and word selections to build a single consistent domain dictionary.² In the worst case, the user has to keep asking the system to prefer one of several word senses every time a new document arrives to be translated.</Paragraph>
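The effect of adding and removing documents can be illustrated with a small lookup over the document list. This sketch is hypothetical throughout: the store layout (a document name mapped to (word, sense, creation order) triples) and the function `resolve_sense` are ours, and the precedence rule, in which a document later in the list overrides earlier ones and the youngest PKS wins inside a document, is an assumption; the text only says that the document ordering is used as an ordering of PKSs.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical store: document name -> PK1 records (word, sense, creation order in that document)
Store = Dict[str, List[Tuple[str, int, int]]]


def resolve_sense(word: str, doc_list: List[str], store: Store) -> Optional[int]:
    """Preferred sense of `word` given only the documents currently in the list."""
    for name in reversed(doc_list):              # assumption: later documents take precedence
        records = [r for r in store.get(name, []) if r[0] == word]
        if records:
            return max(records, key=lambda r: r[2])[1]   # youngest PKS in that document
    return None                                   # no PKS in the list applies to this word


store: Store = {
    "editor-manual": [("line", 1, 0)],     # sense 1: a single row of letters
    "graphics-manual": [("line", 2, 0)],   # sense 2: a geometric mark
}
doc_list = ["editor-manual"]
print(resolve_sense("line", doc_list, store))   # 1
doc_list.append("graphics-manual")              # validate the graphics PKSs
print(resolve_sense("line", doc_list, store))   # 2
doc_list.remove("graphics-manual")              # invalidate them again
print(resolve_sense("line", doc_list, store))   # 1
```

No PKS is copied or deleted here: validating or invalidating a set of PKSs is just a change to the list that the lookup consults.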
      <Paragraph position="6"> Another important observation is that the system can calculate the quality of preference information as follows: 1. Given a document list, find all the PKSs in the document list, and create a PKS graph, which is a directed graph, for each type of PKS (see Figure 2): • If the PKS is of type (PK1 word_i sense_j), create a node N_wi for word_i, a node N_sj for sense_j, and a directed arc a_ij (labeled "sense") from N_wi to N_sj.</Paragraph>
      <Paragraph position="7"> • If the PKS is of type (PK2 word_i role word_j), create a node N_ri for word_i, a node N_lj for word_j, and a directed arc a_ij (labeled with a syntactic role) from N_ri to N_lj.</Paragraph>
      <Paragraph position="8"> • If the PKS is of type (PK3 word_i trans_j), create a node N_si for word_i, a node N_tj for trans_j, and a directed arc a_ij (labeled "trans") from N_si to N_tj.</Paragraph>
      <Paragraph position="9"> 2. Count the number C1 of conflicting arcs in the PK1 and PK3 graphs, that is, the number of arcs leaving the same node but going to different nodes.</Paragraph>
      <Paragraph position="10"> 3. Count the number C2 of conflicting paths³ for each pair of nodes n1 and n2 connected by an arc a1 in the PK2 graph such that, for some node n3, there are two arcs, a2 from n1 to n3 and a3 from n3 to n2, where a2 and a3 have the same label (see Figure 3).</Paragraph>
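Steps 1 to 3 can be sketched as follows. Assumptions: each graph is given as a list of labeled arcs (source, label, target) with one arc per PKS, so repeated PKSs yield parallel arcs; C1 counts every arc whose source node also has an arc to a different node; and C2 counts every PK2 arc a1 for which some node n3 closes a two-arc path n1 → n3 → n2 with a shared label. The text leaves these counting details open, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Arc = Tuple[str, str, str]  # (source node, label, target node)


def count_conflicting_arcs(arcs: List[Arc]) -> int:
    """C1 for a PK1 or PK3 graph: arcs leaving a node that also points elsewhere."""
    targets: Dict[str, Set[str]] = defaultdict(set)
    for src, _label, dst in arcs:
        targets[src].add(dst)
    return sum(1 for src, _label, _dst in arcs if len(targets[src]) > 1)


def count_conflicting_paths(arcs: List[Arc]) -> int:
    """C2 for a PK2 graph: arcs a1 = n1 -> n2 that also have a path
    n1 -> n3 -> n2 whose two arcs (a2, a3) carry the same label."""
    out: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)   # node -> {(label, target)}
    for src, label, dst in arcs:
        out[src].add((label, dst))
    count = 0
    for n1, _label1, n2 in arcs:
        if any(n3 != n2 and (label2, n2) in out.get(n3, set())
               for label2, n3 in out[n1]):
            count += 1
    return count


# PK1 graph: "line" has two preferred senses in the list, "cursor" only one.
pk1_arcs = [("line", "sense", "line#1"), ("line", "sense", "line#2"),
            ("cursor", "sense", "cursor#1")]
print(count_conflicting_arcs(pk1_arcs))   # 2

# PK2 graph shaped like the example in footnote 4: pp1 attaches to the verb
# phrase directly, and also via pp2, where both arcs of that path share a label.
pk2_arcs = [("pp1", "PPMOD", "vp"), ("pp1", "PPMOD", "pp2"), ("pp2", "PPMOD", "vp")]
print(count_conflicting_paths(pk2_arcs))  # 1
```

Only C1 is computed on PK1 and PK3 graphs: as footnote 4 notes, several outgoing arcs from one node of the PK2 graph are not by themselves an ambiguity.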
      <Paragraph position="11"> Intuitively, C1 shows the number of ambiguous word senses and word selections, and C2 shows possible attachment ambiguities in the given document list.⁴ ² Recall the word senses of the word "line" in Section 2. All of them appear in the computer domain, notably in the areas of text editors, graphics, and hardware manuals, respectively. Our approach allows the user to adjust the sense dynamically for each type of manual.</Paragraph>
      <Paragraph position="12"> ³ A path is a sequence of directed arcs. Conflicting paths are two or more distinct paths between a given pair of nodes.</Paragraph>
      <Paragraph position="13"> ⁴ Suppose that n1 is a prepositional phrase, n2 is a verb phrase, and n3 is another prepositional phrase. Then we have an ambiguity in the attachment of n1. Multiple outgoing arcs from a node in the PK2 graph do not necessarily imply ambiguities.</Paragraph>
      <Paragraph position="14"> Therefore, a document list with a large C1 and/or C2 should generally be divided into smaller lists for consistent translation. In an ideal situation, both C1 and C2 should be 0. Cole et al. [?] and Nasukawa [7] suggest that there is a strong tendency for C1 to be very small within a reasonable span of text.</Paragraph>
      <Paragraph position="15"> It is easily shown that, for any two document lists $L_1$ and $L_2$ with $L_1 \cap L_2 = \emptyset$, the numbers of conflicting arcs and paths, $C_{11}$ and $C_{12}$ for $L_1$ and $C_{21}$ and $C_{22}$ for $L_2$, are monotonic. That is, the numbers $C_1$ and $C_2$ of conflicting arcs and paths of the combined document list $L = L_1 \cup L_2$ satisfy $C_1 \geq C_{11} + C_{21}$ and $C_2 \geq C_{12} + C_{22}$.</Paragraph>
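The inequality for C1 can be checked in a few lines under an assumption the text does not state explicitly: each PKS contributes its own arc, so the arcs of the combined list form the multiset sum A(L) = A(L_1) ⊎ A(L_2). If identical arcs from L_1 and L_2 were merged into a single arc, the inequality could fail. The notation Conf(A), the sub-multiset of conflicting arcs of A, is introduced here for the sketch.

```latex
\[
\mathrm{Conf}(A) = \{\, (u, v) \in A \mid \exists\, (u, v') \in A,\ v' \neq v \,\},
\qquad C_1 = |\mathrm{Conf}(A(L))|
\]
\[
a \in \mathrm{Conf}(A(L_k)) \implies a \in \mathrm{Conf}(A(L)) \quad (k = 1, 2),
\quad \text{since the witness arc } (u, v') \in A(L_k) \subseteq A(L)
\]
\[
\mathrm{Conf}(A(L_1)) \uplus \mathrm{Conf}(A(L_2)) \subseteq \mathrm{Conf}(A(L))
\;\Longrightarrow\;
C_1 \geq C_{11} + C_{21}
\]
```

The same argument gives $C_2 \geq C_{12} + C_{22}$ when C2 is counted per arc a1: the arcs a2 and a3 that make an arc a1 of $L_k$ conflicting are still present in $L$.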
    </Section>
  </Section>
</Paper>