File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/97/a97-1047_metho.xml
Size: 12,681 bytes
Last Modified: 2025-10-06 14:14:33
<?xml version="1.0" standalone="yes"?> <Paper uid="A97-1047"> <Title>An English to Turkish Machine Translation System Using Structural Mapping</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Turkish Language </SectionTitle> <Paragraph position="0"> The morphology and syntax of Turkish differ greatly from those of English; the formalism used to represent English text therefore has to be altered significantly to represent Turkish text. Turkish is characterized as a head-final language in which the modifier/specifier always precedes the modified/specified. This characteristic also affects sentence word order, which can be described as SOV, with the verb positioned at the end.</Paragraph> <Paragraph position="1"> Compared to other languages, Turkish also relies more on overt case markings, which mark the role of each argument in a sentence. These case markings give Turkish a relatively free word order, in which every variation in the word order of a sentence results in a different meaning.</Paragraph> <Paragraph position="2"> In the MT system being developed, these and other distinctive characteristics of Turkish are handled in the transfer and generation components.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Translation Domain </SectionTitle> <Paragraph position="0"> As more and more computer companies enter the Turkish market, a growing demand for English to Turkish translation of computer manuals has emerged. Other machine translation systems have also chosen computer manuals as their domain because of the relatively unambiguous and narrow sublanguage used (Tsutsumi, 1986). 
In addition, Nasukawa (1993) concluded from a statistical analysis of the text in IBM computer manuals that 92.6 percent of the words in a computer manual are used in the same word sense, which would significantly reduce the problem of lexical ambiguity resolution. Another advantage is that the material in a computer manual is observed to be written as clearly as possible within a relatively narrow subject area, which should ease the difficult job of understanding and representing the input sentence.</Paragraph> <Paragraph position="1"> As a result of these observations, the TU-Language project team has chosen IBM computer manuals as its translation domain.</Paragraph> </Section> <Section position="6" start_page="0" end_page="322" type="metho"> <SectionTitle> 4 Machine Translation System </SectionTitle> <Paragraph position="0"> The English to Turkish MT system under development uses a structural transfer approach with the following components. First, the English sentence retrieved from the IBM manual is analyzed by the CLE parser (Alshawi and Moore, 1992) to generate an intermediate representation. This representation is mapped onto a recursively embedded case frame, which is then input to the transfer module. The transfer module maps the input case frame into the target case frame, which is then filtered and transformed into the input format required by the target language generator. Lastly, the generator maps the Turkish case frame into the Turkish sentence, which is then post-edited by a human translator to obtain an intelligible and accurate translation.</Paragraph> <Section position="1" start_page="320" end_page="320" type="sub_section"> <SectionTitle> 4.1 Analysis Phase </SectionTitle> <Paragraph position="0"> To analyze the English input, the Core Language Engine (CLE) developed by the SRI Cambridge Computer Science Research Centre was used (Alshawi and Moore, 1992). 
The CLE system has been trained to meet the lexical, syntactic and semantic demands of the IBM corpus. In CLE, explicit intermediate levels of linguistic representation are used in the different phases of the analysis. Following the syntactic and semantic analysis/synthesis, which uses a unification-based approach, the quasi logical form (QLF) is developed. QLF can be described as a contextually sensitive logical form. Since the CLE system produces several parses for an input sentence, the system filters them to select the parse that best conveys the intended meaning of the sentence. The chosen representation is then mapped into a case frame.</Paragraph> </Section> <Section position="2" start_page="320" end_page="322" type="sub_section"> <SectionTitle> 4.2 Transfer Phase </SectionTitle> <Paragraph position="0"> Experience with previous systems using the interlingua technique showed the significant complexity of extracting and representing the deep meaning of a natural language text (Goodman and Nirenburg, 1991).</Paragraph> <Paragraph position="1"> Another major difficulty encountered with this approach is that the language-specific attributes necessary to define translation equivalents at the lexical and structural levels are neutralized in the interlingual representation, thereby complicating the task of generation considerably.</Paragraph> <Paragraph position="2"> A similar problem occurred with systems using the transfer approach with deep semantic analysis, such as the EUROTRA project (Johnson et al., 1985).</Paragraph> <Paragraph position="3"> Such systems were observed to be difficult to develop and maintain. 
To avoid these problems, recently developed MT systems have generally chosen the straightforward transfer approach, which relies on various types of lexical and syntactic information and a limited use of semantic analysis (Tsutsumi, 1986).</Paragraph> <Paragraph position="4"> The system being developed as a part of the TU-Language project also uses the structural transfer approach with a minimal amount of semantic analysis. The transfer phase of our MT system performs structural transfer between the respective case frames of the analysed English sentence and the targeted Turkish output. In a top-down manner, the transfer module transforms the English case frame or adds new information to the Turkish case frame in order to generate the equivalent Turkish noun phrase, clause or sentence with the aid of a transfer dictionary and the transfer rules.</Paragraph> <Paragraph position="5"> The English and Turkish case frames for clauses/sentences are generally similar to each other, with differences seen in the sentence's mood and the verb's aspect and modality. Some information not extracted in the analysis phase, such as the sentence form, clause type, role, etc., has to be determined at the beginning of the transfer phase and added to the Turkish case frame. An example sentence and parts of the corresponding English and Turkish case frames can be seen below: other in a number of ways because the generator requires additional information to form an equivalent Turkish representation. For example, in the Even though the word program is used in the plural form in both of the English sentences, the transfer module needs to determine the specificity of the noun phrase in question and send it to the generator, which will accordingly output either the singular or plural form of the noun.</Paragraph> <Paragraph position="6"> Some of the complex transfer issues presented by Lindop and Tsujii (1991) also arise in our machine translation system. 
These issues are handled with special transfer rules and transfer lexicon entries. At the beginning of the transfer phase, the exception rules are tested and eventually a checklist containing the problematic components of the input is generated. Some examples of these components are verbs which change meaning when used with different attributes, passive, existential or conditional sentences, relative clauses, idiomatic use of prepositional phrases, etc. As the transfer process continues, the checklist is referenced in order to block the default translation and handle the exceptions. The rest of the mapping proceeds in a straightforward fashion until all of the information in the source case frame is mapped onto the target case frame.</Paragraph> <Paragraph position="7"> Some of the complex transfer issues handled in the transfer phase will be presented in this section. First, a significant amount of head-switching is performed to resolve the lexical and structural differences between English and Turkish. In the example, execution is the head noun of the English phrase whereas tesebbus (attempt) becomes the head noun in the target phrase.</Paragraph> <Paragraph position="8"> Another problem encountered in the transfer module is complex lexical transfer with category changes, as in the example given below: (5) John gave a weak cough.</Paragraph> <Paragraph position="9"> John oksur+pst hafifce 'John hafifce oksurdu.:John coughed weakly.' The adjective weak has to be mapped onto the adverb hafifce, and the default translation of the verb give as ver has to be blocked when give is used with the dependent noun cough. Consequently, the fitting target verb is found to be oksurmek.</Paragraph> <Paragraph position="10"> Also, depending on the verb, an object of an English sentence may be mapped to different case markings in Turkish.</Paragraph> <Paragraph position="11"> As seen above, the object of the sentence, the man, 
is mapped either to an accusative-marked object adami or a dative-marked indirect object adama in the target sentence.</Paragraph> <Paragraph position="12"> There are also some complex structural changes encountered during transfer. An English clause might be mapped into a Turkish gerund: (8) While he was working +ken calis+tns 'Calisirken' Another example of a structural transformation can be seen in the active/passive forms of sentences. In the English passive form, the surface subject can correspond to either the direct object or the indirect object of the active form. Yet in Turkish, the surface subject of a passive sentence can only be the direct object of the active form. The two sentences are distinguished by the order of the phrases in the target sentence, as seen in the example below: (9) This program was given to the user.</Paragraph> <Paragraph position="13"> In both of the Turkish translations, the surface subject is program whereas the surface subject changes in the English inputs.</Paragraph> <Paragraph position="14"> The order of the words in the output sentences is determined by the topic and focus features of the target case frame, which are mapped during the transfer phase. In the first sentence, the topic is found to be program, and the focus is kullanici, whereas in the second sentence the topic and the focus are kullanici and program, respectively.</Paragraph> <Paragraph position="15"> The transfer module also addresses problems related to sentential transformation such as the ones required in the example below: (11) There are programs in the disk.</Paragraph> <Paragraph position="16"> var program+pl disk+loc 'Diskte programlar var' Parts of the case frames for the sentences above are as follows: [case frames garbled in the source; legible fragments: adjs, place, [disk]] Other problems encountered in the transfer phase are lexical gaps, idiomatic uses of phrases, and lexical disambiguation by syntactic or semantic content. 
With all the complex transfer issues resolved in the transfer phase, the corresponding Turkish case frame is generated, which is then translated from its Prolog notation into the Lisp notation required by the generation module.</Paragraph> </Section> <Section position="3" start_page="322" end_page="322" type="sub_section"> <SectionTitle> 4.3 Generation Module </SectionTitle> <Paragraph position="0"> The generation component of the system is based on the GenKit environment developed at the Center for Machine Translation at Carnegie Mellon University, which provides facilities for a unification-based generation grammar environment (Hakkani et al., 1996).</Paragraph> <Paragraph position="1"> As input, the generator receives a recursively embedded target case frame representation in which all the lexical choices have been made, and produces the Turkish sentence conveying the same meaning.</Paragraph> <Paragraph position="2"> Since Turkish has complex agglutinative word forms, a separate morphological generator handles the proper morpheme selection, vowel harmony, etc.</Paragraph> <Paragraph position="3"> to produce the surface form of the generated words.</Paragraph> <Paragraph position="4"> The Turkish sentence output by the generator is post-edited by a human translator to ensure accuracy and intelligibility of the target sentence.</Paragraph> </Section> </Section> </Paper>