<?xml version="1.0" standalone="yes"?>
<Paper uid="P87-1004">
  <Title>JETR: A ROBUST MACHINE TRANSLATION SYSTEM</Title>
  <Section position="3" start_page="0" end_page="26" type="metho">
    <SectionTitle>
CHARACTERISTICS OF THE JAPANESE LANGUAGE
</SectionTitle>
    <Paragraph position="0"> The difficulty of translation depends on the similarity between the languages involved. Japanese and English are vastly different languages. Translation from Japanese to English involves restructuring of sentences, disambiguation of words, and additions and deletions of certain lexical items. The following characteristics of the Japanese language have influenced the design of the JETR system: 1. Japanese is a left-branching, postpositional, subject-object-verb language. 2. Particles, and not word order, are important in determining the roles of the noun phrases in a Japanese sentence.</Paragraph>
    <Paragraph position="1"> 3. Information is usually more explicitly stated in English than in Japanese. There are no articles (i.e. &quot;a&quot;, &quot;an&quot;, and &quot;the&quot;). There are no singular and plural forms of nouns. Grammatical sentences can have their subjects and objects missing (i.e.</Paragraph>
    <Paragraph position="2"> ellipses).</Paragraph>
    <Paragraph position="3">  marker) furu (sprinkle).</Paragraph>
    <Paragraph position="4"> The first sentence lacks the main verb, while the second sentence lacks the particle after the noun &quot;shin.&quot; The role of &quot;shin&quot; must be determined without relying on the particle and the word order. In addition to the problems of unknown words and unclear or ambiguous interpretation, missing particles and verbs are often found in recipes, instruction booklets, and other informal texts, posing special problems for machine translation systems. The Particle-Driven Analyzer (PDA) is a robust intrasentence analyzer designed to handle ungrammatical sentences in an elegant and efficient manner.</Paragraph>
    <Paragraph position="5"> While analyzers of the English language rely heavily on verb-oriented processing, the existence of particles in the Japanese language and the subject-object-verb word order have led to the PDA's reliance on forward expectations from words other than verbs. The PDA is unique in that it does not rely on the presence of particles and verbs in the source text. To take care of missing particles and verbs, not only verbs but all nouns and adverbs are made to point to action frames, which are structures used to describe actions. For both grammatical and ungrammatical sentences, the PDA continuously combines and refines forward expectations from various phrases to determine their roles and to predict actions. These expectations are semantic in nature and disregard the word order of the sentence. Each expectation is an action-role pair of the form (&lt;action&gt; &lt;role&gt;). Actions are names of action frames, while roles correspond to the slot names of action frames. Since the main verb is almost always found at the end of the sentence, combined forward expectations are strong enough to point to the roles of the nouns and the meaning of the verb. For example, consider &quot;neji (screw) migi (right) * 3 kurikku (clicks).&quot; By the time &quot;3 clicks&quot; is read, there are strong expectations for the act of turning, and the screw expects to be the object of the act.</Paragraph>
    <Paragraph position="6"> Input: &lt;muM&gt; o ~ ~ &lt;verb&gt;</Paragraph>
    <Paragraph position="8"> refinement process. In order to keep the expectation list to a manageable size, only ten of the most likely roles and actions are attached to each word.</Paragraph>
    <Paragraph position="9"> The PDA is similar to IPP (Lebowitz 1983) in that words other than verbs are made to point to structures which describe actions. However, unlike IPP, a generic role-filling process will be invoked only if an unexpected verb is encountered or the forward expectations do not match. Figure 3 shows such a case. The verb will not invoke any role-filling or role-determining process if the semantic expectations from the other phrases match the verb. Therefore, the PDA discourages inefficient verb-initiated backward searches for role-fillers even when particles are missing.</Paragraph>
    <Paragraph position="10"> Unlike LUTE (Shimazu 1983), the PDA's generic role-filling process does not rely on the presence of particles. To each slot of each action frame, acceptable filler types are attached. When particles are missing, the role-filling rule matches the object types of role fillers against the information attached to action frames. The object types in each domain are organized in a hierarchy, and frame slots are allowed to point to any level in the hierarchy.</Paragraph>
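    <Paragraph> The type check described above can be illustrated with a minimal sketch. Everything named in it (the hierarchy PARENT, the frame TURN_FRAME, and the helpers is_a and candidate_roles) is a hypothetical stand-in chosen for illustration, not part of JETR as published. It only shows how a slot that points to some level of an object-type hierarchy might accept any type below that level when no particle is available.</Paragraph>
    <Paragraph><![CDATA[
```python
# Hypothetical object-type hierarchy for one domain: child type -> parent type.
PARENT = {
    "screw": "fastener",
    "fastener": "part",
    "part": "physical-object",
    "water": "liquid",
    "liquid": "physical-object",
}

# Hypothetical action frame: slot name -> acceptable filler type.
# A slot may point to any level of the hierarchy.
TURN_FRAME = {"object": "part", "instrument": "tool"}


def is_a(obj_type, accepted):
    """Return True if obj_type equals accepted or lies below it in the hierarchy."""
    current = obj_type
    while current is not None:
        if current == accepted:
            return True
        current = PARENT.get(current)
    return False


def candidate_roles(obj_type, frame):
    """List the slots of frame whose acceptable filler type covers obj_type."""
    return [slot for slot, accepted in frame.items() if is_a(obj_type, accepted)]
```
]]></Paragraph>
    <Paragraph> With these assumed tables, a screw can fill only the object slot of the turning frame, while water fits no slot at all, so the frame would not be supported by that phrase.</Paragraph>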
    <Paragraph position="11"> Verbs with multiple meanings are disambiguated by starting out with a set of action frames (e.g. a2 and a3) and discarding a frame if a given phrase cannot fill any slot of the frame.</Paragraph>
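    <Paragraph> A minimal sketch of this elimination step follows. The frame names a2 and a3 come from the text, but their slots, the filler types, and the function disambiguate are assumptions made up for the example. Filler types are matched by simple set membership here to keep the sketch short.</Paragraph>
    <Paragraph><![CDATA[
```python
# Hypothetical candidate frames for one ambiguous verb.
# Each slot maps to the set of object types it accepts (assumed values).
FRAMES = {
    "a2": {"object": {"liquid"}, "destination": {"container"}},
    "a3": {"object": {"powder"}, "destination": {"food"}},
}


def disambiguate(frames, phrase_types):
    """Discard every frame in which some phrase cannot fill any slot."""
    surviving = dict(frames)
    for phrase_type in phrase_types:
        surviving = {
            name: slots
            for name, slots in surviving.items()
            if any(phrase_type in accepted for accepted in slots.values())
        }
    return sorted(surviving)
```
]]></Paragraph>
    <Paragraph> Under these assumed frames, a phrase of type powder leaves only a3, and a sentence with no informative phrases leaves both candidates in place.</Paragraph>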
    <Paragraph position="12"> The PDA's processes can be summarized as follows: 1. Grab a phrase bottom-up using syntactic and semantic word classes. Build an object frame if applicable.</Paragraph>
    <Paragraph position="13"> 2. Recall all expectations (action-role pairs) attached to the phrase.</Paragraph>
    <Paragraph position="14"> 3. If a particle follows, use the particle to refine the expectations attached to the phrase.</Paragraph>
    <Paragraph position="15"> 4. Take the intersection of the old and new expectations.</Paragraph>
    <Paragraph position="18"> 5. If the intersection is empty, set a flag.</Paragraph>
    <Paragraph position="19"> 6. If this is a verb phrase and the flag is up, invoke the generic role-filling process.</Paragraph>
    <Paragraph position="20"> 7. Else if this is the end of a simple sentence, build an action frame using forward expectations.</Paragraph>
    <Paragraph position="21"> 8. Otherwise go back to Step 1.</Paragraph>
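    <Paragraph> The eight steps above can be sketched as a single loop. This is an illustration under stated assumptions, not the published implementation: the lexicon entries, the particle table, the role names, and the choice to intersect expectations on their action names are all hypothetical. The generic role-filling process is represented only by a returned marker.</Paragraph>
    <Paragraph><![CDATA[
```python
# Hypothetical lexicon: every word points to (action, role) expectations.
LEXICON = {
    "neji": {("turn", "object"), ("insert", "object")},
    "migi": {("turn", "direction")},
    "3 kurikku": {("turn", "amount")},
    "mawasu": {("turn", "act")},
    "ireru": {("insert", "act")},
}

# Hypothetical particle table: particle -> roles it allows.
PARTICLE_ROLES = {"o": {"object"}}


def analyze(phrases, lexicon=LEXICON, particle_roles=PARTICLE_ROLES):
    """phrases: (text, particle_or_None, is_verb) triples of one simple sentence."""
    actions = None   # combined forward expectations, kept as action names
    seen = []        # (text, expectations) for phrases that matched
    flag = False     # raised in Step 5 when expectations do not match
    for text, particle, is_verb in phrases:                  # Step 1
        expectations = set(lexicon.get(text, ()))            # Step 2
        if particle in particle_roles:                       # Step 3
            allowed = particle_roles[particle]
            expectations = {(a, r) for a, r in expectations if r in allowed}
        new_actions = {a for a, _ in expectations}
        combined = new_actions if actions is None else actions & new_actions  # Step 4
        if combined:
            actions = combined
            seen.append((text, expectations))
        else:                                                # Step 5
            flag = True
        if is_verb and flag:                                 # Step 6
            return {"generic_role_filling": True, "actions": set(), "roles": {}}
        # Step 8: otherwise continue with the next phrase.
    # Step 7: end of the simple sentence, build the frame from expectations.
    final = actions or set()
    roles = {text: sorted({r for a, r in exps if a in final}) for text, exps in seen}
    return {"generic_role_filling": False, "actions": final, "roles": roles}


verbless = analyze([("neji", None, False), ("migi", None, False), ("3 kurikku", None, False)])
unexpected = analyze([("neji", None, False), ("migi", None, False), ("ireru", None, True)])
```
]]></Paragraph>
    <Paragraph> With the assumed lexicon, the verb-less input still yields the single action turn with the screw as its object, and the input ending in an unexpected verb falls through to the generic role-filling marker.</Paragraph>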
    <Paragraph position="22"> To achieve extensibility and flexibility, ideas such as the detachment of control structure from the word level, and the combination of top-down and bottom-up processing have been incorporated.</Paragraph>
  </Section>
  <Section position="4" start_page="26" end_page="28" type="metho">
    <SectionTitle>
SIMULTANEOUS GENERATOR
</SectionTitle>
    <Paragraph position="0"> Certain syntactic features of the source text can serve as functionally relevant features of the situation being described in the source text. Preservation of these features often helps the meaning and the nuance to be reproduced. However, knowledge-based systems discard the syntax of the original text. In other words, the information about the syntactic style of the source text, such as the phrase order and the syntactic classes of the original words, is not found in the internal representation. Furthermore, inferred role fillers, causal connections, and events are generated disregarding the brevity of the original text. For example, the generator built by the Electrotechnical Laboratory of Japan (Ishizaki 1983), which produces Japanese texts from the conceptual representation based on MOPs (Schank 1982), generates a pronoun whenever the same noun is seen the second time. Disregarding the original sentence order, the system determines the order using causal chains. Moreover, the subject and object are often omitted from the target sentence to prevent wordiness.</Paragraph>
    <Paragraph position="1"> Unlike other knowledge-based systems, JETR can preserve the syntax of the original text, and it does so without building the source-language tree. The generation algorithm is based on the observation that human translators do not have to wait until the end of the sentence to start translating the sentence. A human translator can start translating phrases as he receives them one at a time and can apply partial syntax-transfer rules as soon as he notices a phrase sequence which is ungrammatical in the target language.</Paragraph>
    <Paragraph position="2"> The generator does not go through the complete semantic representation of each sentence built by the other components of the system. As soon as a phrase is processed by the PDA, the generator receives the phrase along with its semantic role and starts generating the phrase if it is unambiguous. Thus the generator can easily distinguish between inferred information and information explicitly present in the source text. The generator, and not the PDA, calls the context analyzer to obtain missing information that is needed to translate grammatical Japanese sentences into grammatical English sentences. No other inferred information is generated. A preposition is not generated for a phrase which is lacking a particle, and an inferred verb is not generated for a verb-less sentence. Because the generator has access to the actual words in the source phrase, it is able to reproduce frequent occurrences of particular lexical items. And the original word order is preserved as much as possible. Therefore, the generator is able to preserve idiolects, emphases, lengths, ellipses, syntax errors and ambiguities due to missing information.</Paragraph>
    <Paragraph position="3"> Examples of target sentences for special cases are shown in Figure 4.</Paragraph>
    <Paragraph position="4"> To achieve structural invariance, phrases are output as soon as possible without violating the English phrase order. In other words, the generator pretends that incoming phrases are English phrases, and whenever an ungrammatical phrase sequence is detected, the new phrase is saved in one of three queues: SAVED-PREPOSITIONAL, SAVED-REFINER, and SAVED-OBJECT. As long as no violation of the English phrase order is detected or expected, the phrases are generated immediately. Therefore, no source-language tree needs to be constructed, and no structural information needs to be stored in the semantic representation of the complete sentence.</Paragraph>
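    <Paragraph> The queue-based idea can be illustrated with a toy sketch. The rule used here (hold object and prepositional phrases until the verb has been output) is an assumption made for the example, since the text does not spell out the actual ordering rules, and SAVED-REFINER is left out because its use is not described. The function name generate and the role labels are likewise hypothetical.</Paragraph>
    <Paragraph><![CDATA[
```python
from collections import deque


def generate(phrases):
    """Emit English phrases as they arrive, holding back those that would break English order.

    phrases: (english_text, role) pairs in source (subject-object-verb) order,
    where role is one of "subject", "object", "prepositional", "verb".
    """
    saved_object = deque()         # stands in for SAVED-OBJECT
    saved_prepositional = deque()  # stands in for SAVED-PREPOSITIONAL
    output = []
    verb_seen = False
    for text, role in phrases:
        if role == "verb":
            output.append(text)
            verb_seen = True
            # The held phrases can now follow the verb.
            output.extend(saved_object)
            output.extend(saved_prepositional)
            saved_object.clear()
            saved_prepositional.clear()
        elif role == "object" and not verb_seen:
            saved_object.append(text)
        elif role == "prepositional" and not verb_seen:
            saved_prepositional.append(text)
        else:
            output.append(text)
    # Verb-less input: flush what is still held without inventing a verb.
    output.extend(saved_object)
    output.extend(saved_prepositional)
    return " ".join(output)
```
]]></Paragraph>
    <Paragraph> No tree is built in this sketch: each phrase is either appended to the output at once or parked in a queue, and a verb-less input is flushed as it stands rather than completed with an inferred verb.</Paragraph>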
    <Paragraph position="5"> To prevent awkwardness, a small knowledge base which relates source language idioms to those of the target language is being used by JETR; however, one problem with the generator is that it concentrates too much on information preservation, and the target sentences are awkward at times. Currently, the system cannot decide when to sacrifice information preservation. Future research should examine the ability of human translators to determine the important aspects of the source text.</Paragraph>
    <Paragraph position="6"> INSTRA: THE CONTEXT ANALYZER The context analyzer component of JETR is called INSTRA (INSTRuction Analyzer). The goal of INSTRA is to aid the other components in the following ways: 1. Keep track of the changes in object types and forward expectations as objects are modified by various modifiers and actions.</Paragraph>
    <Paragraph position="7"> 2. Resolve pronoun references so that correct English pronouns can be generated and expectations and object types can be associated with pronouns.</Paragraph>
    <Paragraph position="8"> 3. Resolve object references so that correct expectations and object types can be associated with objects and consequently the article and the number of each noun can be determined.</Paragraph>
    <Paragraph position="9"> 4. Choose among the multiple interpretations of a sentence produced by the PDA.</Paragraph>
    <Paragraph position="10"> 5. Fill ellipses when necessary so that well-formed English sentences can be generated.</Paragraph>
    <Paragraph position="11"> In knowledge-based systems, the context analyzer is designed with the goal of natural-language understanding in mind; therefore, object and pronoun references are resolved, and ellipses are filled as a by-product of understanding the input text. However, some human translators claim that they do not always understand the texts they translate (Slocum 1985). Moreover, knowledge-based translation systems are less practical than systems based on direct and transfer methods. Wilks (1973) states that &quot;...it may be possible to establish a level of understanding somewhat short of that required for question-answering and other intelligent behaviors.&quot; Although identifying the level of understanding required in general by a machine translation system is difficult, the level clearly depends on the languages, the text type and the tasks involved in translation. INSTRA was designed with the goal of identifying the level of understanding required in translating instruction booklets from Japanese to English.</Paragraph>
    <Paragraph position="12"> A unique characteristic of instruction booklets is that every action produces a clearly defined resulting state, which is a transformed object or a collection of transformed objects that are likely to be referenced by later actions. For example, when salt is dissolved into water, the salty water is the result. When a screw is turned, the screw is the result. When an object is placed into liquid, the object, the liquid, the container that contains the liquid, and everything else in the container are the results. INSTRA keeps a chain of the resulting states of the actions. INSTRA's five tasks all deal with searches or modifications of the results in the chain.</Paragraph>
    <Paragraph position="13">  To keep track of the state of each object, the object type and expectations of the object are changed whenever certain modifiers are found. Similarly, at the end of each sentence, 1) the object frames representing the result objects are extracted from the frame, 2) each result object is given a unique name, and 3) the type and expectations are changed if necessary and are attached to the unique name. To identify the result of each action, information about what results from the action is attached to each frame. The result objects are added to the end of the chain which may already contain the ingredients or object components. An example of a chain of the resulting states is shown in Figure 5.</Paragraph>
    <Paragraph position="14"> In instructions, a pronoun always refers to the result of the previous action. Therefore, for each pronoun reference, the unique name of the object at the end of the chain is returned along with the information about the number (plural or singular) of the object.</Paragraph>
    <Paragraph position="15"> For an object reference, INSTRA receives an object frame, the chain is searched backwards for a match, and its unique name and information about its number are returned. INSTRA uses a set of rules that takes into account the characteristics of modifiers in instructions to determine whether two objects match. Object reference is important also in disambiguating item parts. When JETR encounters an item part that needs to be disambiguated, it goes through the chain of results to find the item which has the part and retrieves an appropriate translation equivalent. The system uses additional specialized rules for step number references and divided objects.</Paragraph>
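    <Paragraph> The chain of results and the two kinds of reference lookup can be sketched as follows. The class and method names (ResultObject, ResultChain, add, resolve_pronoun, resolve_object) and the naming scheme for unique names are hypothetical. Object matching is reduced to exact equality of type names, whereas the text says INSTRA applies a set of rules about modifiers, so this is a simplification rather than the actual matching procedure.</Paragraph>
    <Paragraph><![CDATA[
```python
from dataclasses import dataclass
from itertools import count


@dataclass
class ResultObject:
    name: str          # unique name given to the result object
    obj_type: str      # current object type
    plural: bool = False


class ResultChain:
    """Chain of resulting states, oldest first."""

    def __init__(self):
        self._items = []
        self._ids = count(1)

    def add(self, obj_type, plural=False):
        """Append a result object under a fresh unique name."""
        obj = ResultObject(f"{obj_type}-{next(self._ids)}", obj_type, plural)
        self._items.append(obj)
        return obj

    def resolve_pronoun(self):
        """A pronoun refers to the result at the end of the chain."""
        return self._items[-1] if self._items else None

    def resolve_object(self, obj_type):
        """Search the chain backwards for the most recent matching object."""
        for obj in reversed(self._items):
            if obj.obj_type == obj_type:
                return obj
        return None


chain = ResultChain()
chain.add("salt")          # ingredient
chain.add("water")         # ingredient
chain.add("salty-water")   # result of dissolving the salt into the water
```
]]></Paragraph>
    <Paragraph> After these three additions, a pronoun resolves to the salty water at the end of the chain, and a later mention of water is matched by the backward search to the second entry. Both lookups return the number information together with the unique name.</Paragraph>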
    <Paragraph position="16"> Ellipses are filled by searching through the chain backwards for objects whose types are accepted by the corresponding frame slots. To preserve semantic, pragmatic and structural information, ellipses are filled only when 1) missing information is needed to generate grammatical target sentences, 2) INSTRA must choose among the multiple interpretations of a sentence produced by the PDA, or 3) the result of an action is needed.</Paragraph>
    <Paragraph position="17"> The domain-specific knowledge is stated solely in terms of action frames and object types. INSTRA accomplishes the five tasks 1) without pre-editing and post-editing, 2) without relying on the user except in special cases involving unknown words, and 3) without fully understanding the text. INSTRA assumes that the user is monolingual. Because the method refrains from using inferences in unnecessary cases, the semantic and pragmatic information contained in the source text can be preserved.</Paragraph>
  </Section>
</Paper>