File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/84/p84-1034_metho.xml
Size: 15,163 bytes
Last Modified: 2025-10-06 14:11:37
<?xml version="1.0" standalone="yes"?> <Paper uid="P84-1034"> <Title>A PROPER TREATMENT OF SYNTAX AND SEMANTICS IN MACHINE TRANSLATION</Title> <Section position="4" start_page="159" end_page="162" type="metho"> <SectionTitle> III SYNTAX DIRECTED APPROACH: A PROTOTYPE ENGLISH-JAPANESE MACHINE TRANSLATION SYSTEM </SectionTitle> <Paragraph position="0"> So far we have developed two prototype machine translation systems; one is for English-Japanese translation \[6\] and the other is for Japanese-English translation. The prototype model system for English-Japanese translation (Figure 1) is constructed as a syntax directed processor using a phrase structure type internal representation called HPM (Heuristic Parsing Model), where the semantics is utilized to disambiguate dependency relationships. The somewhat new name HPM (Heuristic Parsing Model) reflects the parsing strategy by which the machine translation tries to simulate the heuristic way in which humans actually translate language. The essential features of heuristic translation are summarized in the following three steps: (1) To segment an input sentence into phrasal elements (PE) and clausal elements (CE).</Paragraph> <Paragraph position="1"> (2) To assign syntactic roles to PE's and CE's, and restructure the segmented elements into tree-forms by governing relation, and into link-forms by modifying relation. (3) To permute the segmented elements, and to assign appropriate Japanese equivalents with necessary case suffixes and postpositions.</Paragraph> <Paragraph position="2"> Noteworthy findings from operational experience and efforts to improve the prototype model are as follows: (1) The essential structure of English sentences should be grasped by phrase structure type representations.</Paragraph> <Paragraph position="3"> An example of phrase structure type representation, which we call HPM (Heuristic Parsing Model), is illustrated in Figure 2. In Figure 2, a parsed tree is composed of two substructures. 
One is &quot;tree ( ~/ ),&quot; representing a compulsory dependency relation, and the other is &quot;link (k~),&quot; representing an optional dependency relation. Each node corresponds to a certain constituent of the sentence.</Paragraph> <Paragraph position="4"> The most important constituent is a &quot;phrasal element (PE)&quot; which is composed of one or more word element(s) and carries a part of the sentential meaning in the smallest possible form. PE's are mutually exclusive. In Figure 2, PE's are shown by using the &quot;segmenting marker (T)&quot;, such as where the terms in parentheses are the syntactic roles which will be discussed later. A &quot;clausal element (CE)&quot; is composed of one or more PE('s) and carries a part of the sentential meaning in a nexus-like form. A CE roughly corresponds to a Japanese simple sentence such as: &quot;%{wa/ga/wo/no/ni} ~ {suru/dearu} \[koto\].&quot; CE's allow mutual intersection. Typical examples are the underlined parts in the following: &quot;It is important for you to do so.&quot; &quot;... intended to yield a fifth generation system.&quot; One interesting example in Figure 2 may be the part &quot;With some help from overseas&quot;, which is treated as only two consecutive phrasal elements. This is the typical result of a syntax directed parser. In the case of a semantics directed parser, the above-mentioned part will be treated as a clausal element. This is because the meaning of this part is &quot;(by) getting some help from overseas&quot; or the like, which is clausal rather than phrasal.</Paragraph> <Paragraph position="5"> (2) Syntax directed processors are effective and powerful for obtaining phrase structure type parsed trees.</Paragraph> <Paragraph position="6"> Our HPM parser operates both in a top-down way globally and in a bottom-up way locally. An example of top-down operation would be the segmentation of an input sentence (i.e. 
the sequence of word elements (WE's)) to get phrasal elements (PE), and an example of bottom-up operation would be the construction of tree-forms or link-forms to get clausal elements (CE) or a sentence (SE). These operations are supported by syntax directed grammatical data such as verb dependency type codes (cf. Table 1, which is a simplified version of Hornby's classification \[5\]), syntactic role codes (Table 2) and some production rule type grammars (Table 3 and Table 4). It may be permissible to say that all these syntactic data are fairly compact and the kernel parts are already well-elaborated (cf. \[1\], \[8\], \[11\], \[12\]).</Paragraph> <Paragraph position="7"> (3) The weak point of syntax directed processors is their insufficient ability to disambiguate; i.e. the ability to identify dependency types of verb phrases and the ability to determine heads of prepositional phrase modifiers.</Paragraph> <Paragraph position="8"> (4) In order to boost the aforementioned disambiguation power, it is useful to apply semantic filters that impose selective restrictions on linking a verb with nominals and on linking a modifier with its head.</Paragraph> <Paragraph position="9"> A typical example of the semantic filter is illustrated in Figure 3. The semantic filter may operate along with selective restriction rules such as: compatible with syntax directed processors; i.e. there is no need to reconstruct processors or to modify internal representations. 
It is only necessary to add filtering programs to the syntax directed processor.</Paragraph> <Paragraph position="10"> One noteworthy point is that the thesaurus for controlling the semantic fields or semantic features of words should be constructed in an appropriate form (such as word hierarchy) so as to avoid the so-called combinatorial explosion of the number of selective restriction rules.</Paragraph> <Paragraph position="11"> (6) For the Japanese sentence generating process, it may be necessary to devise a very complicated semantic processor if a system to produce natural idiomatic Japanese sentences is desired. But the majority of Japanese users may tolerate awkward word-by-word translation and understand its meaning. Thus we have concluded that our research efforts should give priority to the syntax directed analysis of English sentences. The semantics directed generation of Japanese sentences might not be an urgent issue; rather it should be treated as a kind of profound basic science to be studied without haste.</Paragraph> <Paragraph position="12"> (7) Even though the output Japanese translation may be an awkward word-by-word translation, it should be composed of pertinent function words and proper equivalents for content words.</Paragraph> <Paragraph position="13"> Otherwise it could not express the proper meaning of the input English sentences.</Paragraph> <Paragraph position="14"> (8) In order to select proper equivalents, semantic filters can be applied fairly effectively to test the agreement among the semantic codes assigned to words (or phrases). Again the semantic filter is not always complete. 
For example, in Figure 2, the verb &quot;yield&quot; has at least two different meanings (and consequently has at least two different Japanese equivalents): &quot;yield&quot; --> &quot;produce&quot; (= Umidasu) or &quot;concede&quot; (= Yuzuru).</Paragraph> <Paragraph position="15"> But it is neither easy nor clear how to devise a filter to distinguish the above two meanings mechanically. Thus we need some human aids such as post-editing and inter-editing.</Paragraph> <Paragraph position="16"> (9) As for the pertinent selection of function words such as postpositions, there are no formal computational rules to perform it. So we must find and store heuristic rules empirically and then make proper use of them.</Paragraph> <Paragraph position="17"> Some heuristic rules to select appropriate Japanese postpositions are shown in Table 5.</Paragraph> <Paragraph position="18"> and (2), the heuristic approach was also found to be effective in segmenting the input English sentence into a sequence of phrasal elements, and in structuring them into a tree-like dependency diagram (cf. Figure 2).</Paragraph> <Paragraph position="19"> (11) A practical machine translation should be considered from a kind of heuristic viewpoint rather than from a purely rigid analytical linguistic viewpoint. 
One persuasive reason for this is the fact that humans, even foreign language learners, can translate fairly difficult English sentences without going into the details of parsing problems.</Paragraph> </Section> <Section position="5" start_page="162" end_page="165" type="metho"> <SectionTitle> IV SEMANTICS DIRECTED APPROACH: A PROTOTYPE JAPANESE-ENGLISH MACHINE TRANSLATION SYSTEM </SectionTitle> <Paragraph position="0"> The prototype model system for Japanese-English translation is constructed as a semantics directed processor using a conceptual dependency diagram as the internal representation.</Paragraph> <Paragraph position="1"> Noteworthy findings through operational experience and efforts to improve on the prototype model are as follows: (1) Considering some of the characteristics of the Japanese language, such as flexible word ordering and ambiguous usage of function words, it is not advantageous to adopt a syntax directed representation for the internal base of language transformation.</Paragraph> <Paragraph position="2"> For example, the following five Japanese sentences have almost the same meaning except for word ordering and subtle nuances. 
Lowercase letters represent function words.</Paragraph> <Paragraph position="3"> Boku wa Fude de Tegami wo Kaku.</Paragraph> <Paragraph position="5"> Boku wa Tegami wo Fude de Kaku.</Paragraph> <Paragraph position="6"> Fude de Boku wa Tegami wo Kaku.</Paragraph> <Paragraph position="7"> Tegami wa Boku wa Fude de Kaku.</Paragraph> <Paragraph position="8"> Boku wa Tegami wa Fude de Kaku.</Paragraph> <Paragraph position="9"> (2) Therefore we have decided to adopt the conceptual dependency diagram (CDD) as a compact and powerful semantics directed internal representation.</Paragraph> <Paragraph position="10"> Our idea of the CDD is similar to the well-known dependency grammar defined by Hays \[4\] and Robinson \[9\] \[10\], except for the augmented case markers which play essentially semantic roles.</Paragraph> <Paragraph position="11"> (3) The conceptual dependency diagram for Japanese sentences is composed of predicate phrase nodes (abbreviated as PPNs) and nominal phrase nodes (abbreviated as NPNs). Each PPN governs a few NPNs as its dependants. Even among PPNs there exist some governor-dependant relationships.</Paragraph> <Paragraph position="12"> Examples of formal CDD description are:</Paragraph> <Paragraph position="14"> where the underlined word &quot;~' represents the concept code corresponding to the superficial word &quot;a&quot;, and the augmented case markers are omitted.</Paragraph> <Paragraph position="15"> In the above description, the order of the dependants N1, N2, ..., Nn is to be disregarded. For example, PPN (NPNn, ..., NPN2, NPN1) is identical to the first formula above. This convention may be different from the one defined by Hays \[4\]. 
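The order-insensitive convention can be illustrated with a minimal Python sketch. It is not the implementation used in the prototype: the class name CDDNode, the helper build, and the mapping from postpositions to case markers (topic, instrument, object) are hypothetical and chosen only for this illustration. The case markers, which the formal description omits, are included here simply so that the dependants can be told apart. Keeping the dependants in a frozenset makes two diagrams compare equal whenever they differ only in the order in which the dependants are listed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CDDNode:
    """A node of a conceptual dependency diagram (hypothetical layout)."""
    concept: str
    # Each dependant is a (case_marker, node) pair; a frozenset ignores order.
    dependants: frozenset = frozenset()


def build(pairs, predicate):
    """Build a predicate node from (content word, function word) pairs.

    The postposition-to-case mapping is a toy assumption for this example.
    """
    case_of = {"wa": "topic", "de": "instrument", "wo": "object"}
    deps = frozenset((case_of[fw], CDDNode(cw)) for cw, fw in pairs)
    return CDDNode(predicate, deps)


# The first three example sentences, which differ only in word order.
s1 = build([("Boku", "wa"), ("Fude", "de"), ("Tegami", "wo")], "Kaku")
s2 = build([("Boku", "wa"), ("Tegami", "wo"), ("Fude", "de")], "Kaku")
s3 = build([("Fude", "de"), ("Boku", "wa"), ("Tegami", "wo")], "Kaku")

same_diagram = (s1 == s2 == s3)
print(same_diagram)
```

The sketch covers only the sentences that differ purely in word order; it makes no attempt to resolve the ambiguous usage of function words such as the topic marker wa.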
Our convention was introduced to cope with the above-mentioned flexible word ordering in Japanese sentences.</Paragraph> <Paragraph position="16"> (4) The aforementioned dependency relationships can be represented as a linking topology, where each link has one governor node and one dependant node as its top and bottom terminal point (Figure 4).</Paragraph> <Paragraph position="17"> (5) The links are labeled with case markers. Our case marker system is obtained by augmenting the traditional case markers such as Fillmore's \[3\] from the standpoint of machine translation. For the PPN-NPN link, its label usually represents agent, object, goal, location, topic, etc. For the PPN-PPN link, its label usually represents causality, temporality, restrictiveness, etc. (cf. Figure 4).</Paragraph> <Paragraph position="18"> (6) As for the total number of case markers, our current conclusion is that the number of compulsory case markers to represent predicative dominance should be small, say around 20; and that the number of optional case markers to represent adjectival or adverbial modification should be large, say from 50 to 70 (Table 6). (7) The reason for the large number of optional case markers is that the detailed classification of optional cases is very useful for making an appropriate selection of prepositions and participles (Table 7).</Paragraph> <Paragraph position="19"> (8) Each NPN is to be labeled with some properly selected semantic features which are under the control of a thesaurus type lexicon. Semantic features are effective for disambiguating predicative dependency so as to produce an appropriate English verb phrase.</Paragraph> <Paragraph position="20"> (9) The essential difference between a Japanese sentence and the equivalent English sentence can be grasped as the difference in the mode of PPN selections, taken from the viewpoint of conceptual dependency diagram (Figure 5). 
Once an appropriate PPN selection is made, it will be rather simple and mechanical to determine the rest of the dependency topology.</Paragraph> <Paragraph position="21"> (10) Thus the essential task of Japanese-English translation can be reduced to the task of constructing the rules for transforming the dependency topology by changing PPNs, while preserving the meaning of the original dependency topology (cf. Figure 5).</Paragraph> <Paragraph position="22"> (11) All the aforementioned findings have something to do with the semantics directed approach. Once the English oriented conceptual dependency diagram is obtained, the rest of the translation process is rather syntactic. That is, the phrase structure generation can easily be handled with somewhat traditional syntax directed processors.</Paragraph> <Paragraph position="23"> (12) As is well known, the Japanese language has a very high degree of complexity and ambiguity mainly caused by frequent ellipsis and functional multiplicity, which creates serious obstacles for the achievement of a totally automatic treatment of &quot;raw&quot; Japanese sentences.</Paragraph> <Paragraph position="24"> (ex 1) &quot;Sakana wa Taberu.&quot; (fish) (eat) has at least two different interpretations: * &quot;\[Somebody\] can eat a fish.&quot; * &quot;The fish may eat \[something\].&quot; Table 6 Case Markers for CDD (subset only) (13) A sub-language should be constructed so as to restrict the input Japanese sentences within a range of clear tractable structures. The essential restrictions given by the sub-language should be concerned with the usage of function words and sentential embeddings.</Paragraph> <Paragraph position="25"> (14) A sub-language approach will not fetter the users, if a Japanese-English translation system is used as an English sentence composing aid for Japanese people.</Paragraph> </Section> </Paper>