<?xml version="1.0" standalone="yes"?> <Paper uid="W02-1603"> <Title>Plaesarn: Machine-Aided Translation Tool for English-to-Thai Prachya Boonkwan and Asanee Kawtrakul Specialty Research Unit of Natural Language Processing</Title> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> 1 Translation Approaches </SectionTitle> <Paragraph position="0"> Current translation approaches can be classified into three major models: structural transfer, semantic transfer, and lexical transfer (Trujillo, 1999).</Paragraph> <Paragraph position="1"> + Structural transfer: this methodology depends heavily on syntactic analysis (that is, grammar). Translation transfers the structures of the source language into the target language. This method rests on the assumption that every language uses syntactic structure to represent the meaning of sentences.</Paragraph> <Paragraph position="2"> + Semantic transfer: this methodology depends heavily on semantic analysis (that is, meaning), although it applies syntactic analysis as well. In contrast to structural transfer, a source language sentence is not translated directly into the target language; it is first translated into a semantic representation (usually referred to as an interlingua) and afterwards into the target language. This method rests on the assumption that every language describes the same world; hence, a semantic representation common to every language exists.</Paragraph> <Paragraph position="3"> + Lexical transfer: this methodology depends heavily on lexicon ordering patterns. The translation occurs at the level of morphemes: the translation process transfers a set of morphemes in the source language into a set of morphemes in the target language.</Paragraph> <Paragraph position="4"> In this project, we decided to use the structural transfer approach, since it is more appropriate for rapid development. 
In addition, a semantic representation that covers every language is still under research.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Relevant Problems and Their Solutions </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Structural Ambiguity </SectionTitle> <Paragraph position="0"> Because natural languages are ambiguous, a sentence may be translated or interpreted in several senses. An example of structural ambiguity is &quot;I saw a girl in the park with a telescope.&quot; This sentence can be grammatically interpreted in four senses, as follows.</Paragraph> <Paragraph position="1"> + I saw a girl, to whom a telescope belonged, who was in the park.</Paragraph> <Paragraph position="2"> + I used a telescope to see a girl, who was in the park.</Paragraph> <Paragraph position="3"> + I was in the park and saw a girl, to whom a telescope belonged.</Paragraph> <Paragraph position="4"> + I was in the park and used a telescope to see a girl.</Paragraph> <Paragraph position="5"> Furthermore, an example of word-sense ambiguity is &quot;I live near the bank.&quot; The noun bank can be semantically interpreted in at least two senses, as follows.</Paragraph> <Paragraph position="6"> + n. a financial institution that accepts deposits and channels the money into lending activities + n. sloping land (especially the slope beside a body of water) In order to resolve structural ambiguity, we apply the concept of the statistical machine translation approach (Brown et al., 1990). We apply the Maximum-Entropy-Inspired Parser (Charniak, 1999) (the so-called Charniak Parser) to analyze and determine the appropriate grammatical structure of an English sentence. 
Charniak (1999) reports that the parser uses the Penn Treebank tag set (Marcus et al., 1994) (PTB for short) as its grammatical structure representation, and that it yields 90.1% average precision for sentences of length 40 or less, and 89.5% for sentences of length 100 or less. Moreover, to resolve word-sense ambiguity, we attach a numerical statistic to each translation rule (including lexical transfer rules) to help select the best translation parse tree among all candidates (Charniak, 1997). Section 3.4 describes the method and the tool for doing so.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Phrase Translation </SectionTitle> <Paragraph position="0"> A phrase is a word-ordering pattern whose words cannot be translated separately. An example is the translation of the verb to be, which depends on the context--for instance, to be followed by a noun phrase is translated to เป็น /penm/, followed by a prepositional phrase to อยู่ /yuul/, in progressive tenses to กำลัง /kammlangm/, in the passive voice to ถูก /thuukl/, and followed by an adjectival phrase it is omitted from the translation. Another example is the verbal phrase to look for something. It must be translated to มองหา /m!!ngmhaar/, not to มองสำหรับ /m!!ngm samrrabl/, even though the word look on its own is translated to มอง /m!!ngm/, and for to สำหรับ /samrrabl/.</Paragraph> <Paragraph position="1"> From empirical observation, we found that PTB-style parse trees are rather problematic to translate into Thai. We therefore implement a parse tree modification process to reduce the complexity of the transformation process (Trujillo, 1999). In this process, the heads of the tree are recursively modified so as to facilitate phrase translation. 
A portion of the parse tree modification rules is shown in Table 1 in parenthesized format.</Paragraph> <Paragraph position="2"> With the modifications in Table 1, it becomes easier to compose the rules in Table 2, which translate the verb to be and the phrasal verb look for something.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.3 Lexicon Rearrangement </SectionTitle> <Paragraph position="0"> In English, a core noun can normally be modified in two ways--</Paragraph> <Paragraph position="2"> by putting the modifiers in front of it or behind it. We focus on the first case in this paper. The problem arises when we translate a sequence of nouns and a sequence of adjectives: the former is translated in reverse order, while the latter keeps its order. For example, &quot;she is a beautiful diligent slim laboratory member&quot; is translated to เธอเป็นสมาชิกแล็บที่สวยขยันผอม /th++m penm salmaamchikh thiif suayr khalyanr ph!!mr/. The word she is translated to เธอ, is to เป็น, member to สมาชิก, laboratory to แล็บ, beautiful to สวย, diligent to ขยัน, and slim to ผอม. To solve this problem, we first collect nouns and adjectives into groups--NNS and ADJS--and then apply a number of structural transfer rules. Table 3 shows a portion of the transfer rules.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.4 Classifier Generation </SectionTitle> <Paragraph position="0"> A vital linguistic divergence between English and Thai is the use of classifiers that correspond to the head noun (Lamduan, 1983). In English, classifiers are never used to express the number or definiteness of a noun. 
In contrast, classifiers are generally used in Thai--for example, in English a number precedes the noun phrase, whereas in Thai the number together with a classifier follows it.</Paragraph> <Paragraph position="1"> In order to generate a classifier, we developed the classifier matching algorithm. From empirical observation, the head noun of the noun phrase always indicates the classifier. For example, suppose the rules in Table 4 are stored in the linguistic knowledge base.</Paragraph> <Paragraph position="2"> Then &quot;รถไฟเหาะตีลังกา /rothhfaim h!l tiimlangmkaam/ 3 &lt;cl&gt;&quot; and &quot;รถยนต์ /rothhyonm/ 4 &lt;cl&gt;&quot; can be revised to &quot;รถไฟเหาะตีลังกา 3 ขบวน&quot; (three roller coasters) and &quot;รถยนต์ 4 คัน&quot; (four automobiles), respectively. If no rule matches the noun phrase, its head noun is used as the classifier (Lamduan, 1983)--for example, ประเทศ has, in fact, no corresponding classifier. When we want to specify a quantity for this noun, we say &quot;ประเทศพัฒนาแล้ว 3 ประเทศ&quot; /praltheesf phathhthahnaam lfifiwh/ (three developed countries).</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 System Overview </SectionTitle> <Paragraph position="0"> As illustrated in Figure 1, the system comprises four principal components--syntactic analysis, structural transformation, sentence generation, and linguistic knowledge acquisition. 
</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Syntactic Analysis and Parse-Tree Modification </SectionTitle> <Paragraph position="0"> In this process, we analyze each sentence of the source documents with the Charniak Parser, which turns each sentence into a parse tree.</Paragraph> <Paragraph position="1"> The first step is sentence boundary identification. In this step, we require users to mark sentence boundaries manually by inserting a newline character between sentences.</Paragraph> <Paragraph position="2"> The next step is the sentence-parsing process. We analyze the surface structure of a sentence with the Charniak Parser. The original Charniak Parser, however, takes a long time to initialize because it loads a very large database. Consequently, we patched it into a client-server program so as to eliminate this startup time.</Paragraph> <Paragraph position="3"> As stated earlier, because parse trees generated by the Charniak Parser are quite complicated to translate into Thai, we implement the parse tree modification process (see Section 2.2).</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Structural Transformation </SectionTitle> <Paragraph position="0"> This process recursively transforms the source language parse trees into a set of corresponding Thai translation parse trees with their probabilities. 
As stated earlier, transferring a PTB-formatted parse tree into Thai is complex, so we apply the parse tree modification process (see Section 2.2) before performing the transformation.</Paragraph> <Paragraph position="1"> The transformation relies on the transformation rules from the linguistic knowledge base.</Paragraph> <Paragraph position="2"> A single transformation step matches the root node and its single-depth child nodes against the transformation rules and returns a set of transformation productions. As stated earlier, each rule carries a probability. The probability of a parse tree ... is given by the equation</Paragraph> <Paragraph position="4"> where ...k is the k-th subtree of the parse tree ...</Paragraph> <Paragraph position="5"> whose number of member subtrees is n, c... represents the constituent of the tree ..., and - is a probability relation that maps the constituents of the root and its single-depth children to the probability value.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Sentence Generation </SectionTitle> <Paragraph position="0"> This process generates a target language sentence from the parse tree. This stage also relies on the linguistic knowledge base. An additional step is noun classifier generation, for which we apply the classifier matching algorithm (see Section 2.4). Finally, the system shows the most probable translations and lets users change each one if they wish.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.4 Linguistic Knowledge Acquisition </SectionTitle> <Paragraph position="0"> We provide a tool for manually training new translation knowledge. 
Currently, it comprises the translation rule learner and the English-Thai unknown word aligner (Kampanya et al., 2002).</Paragraph> <Paragraph position="1"> In this module, the translation rule learner takes a document and analyzes it into a set of parse trees. Afterwards, the users manually teach it rules for translating a given tree grammatically from the source language into the target language; the rules are written in Backus-Naur Form (Lewis and Papadimitriou, 1998) (BNF for short). The module then determines whether the rule has been trained before.</Paragraph> <Paragraph position="2"> If so, the module raises the probability of that rule. If not, it adds the rule to the knowledge base.</Paragraph> <Paragraph position="3"> Moreover, the aligner is used to update the bilingual dictionary automatically. In future work, we intend to develop a system that automatically learns new translation rules from our corpora.</Paragraph> </Section> </Section> </Paper>