<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-1059">
  <Title>ENGLISH GENERATION FROM INTERLINGUA</Title>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2. Input and Output
</SectionTitle>
    <Paragraph position="0"> The generator t,anslates an interlingt, a to a syntactic tree. Fig.2.1 shows a sample of input interlingnae and Fig.2.2, a sample of output syntactic trees. Both samples correspond to the same sentence &amp;quot;My brother will take the medicine&amp;quot;.</Paragraph>
    <Paragraph position="1"> This paper describes the generator that is originally implemented to correct and evah,ate English Word Dictionary and Concept Dictionary being developed in EDR (El)R,1993). To evaluate Concept Dictionary, as the first strategy, interlingua method was introduced. As the number o1' concepts is very large and they are elements of complex hierarchy, it is difficult to make roles and on the other hand the example-based method was expected to be more effective than the rule-based method. So, as the second strategy, the example-bused method was also introduced. null The example-based method is usually used in MT by the transfer method (Nagao, 1984; Sato, 1991; Stnnita, 1992), though one by Sadler (1989) is by the interlingua method. In this generator, the example-based method coexists with the interlingua method because of above reasons, but the combination of the example-based method and tim interlingua method is not intportant, because l'rom another point of view, the generation from interlingua is recognized as a translation from one hmguage i.e. interlingua to another i.e. English and the generation from interlingua can be seen similar as translations in above MT systems. So in this experiment, how to apply the example-based method to various natural hmguage processing and lbr which parts the method are suitable are the main interests. For this purpose, the generator is designed to execute the generation with maximum usage of the example-based method.</Paragraph>
    <Paragraph position="2"> In this experiment, the coverage of the generation is not complete, that is, some elements st, ch as articles and conjunctions are not generated.</Paragraph>
    <Paragraph position="3"> Below, section 2 describes the input and ot, tput of the generator, section 3, examples used in this system, section 4, the similarities used to retrieve examples and to select words, section 5, the generation algorithm, section 6, the experiments for verb selections and section 7, the conclusion.</Paragraph>
    <Paragraph position="4"> The examples, similarities and the generation algorithm are decided a priori then modilied in response to the output of the generator.</Paragraph>
    <Paragraph position="5"> To avoid confusions, the word &amp;quot;example&amp;quot; is used only (*) This work has been done when the author wits in EDR.</Paragraph>
    <Paragraph position="6"> &amp;quot;My brother will take tile medicine.&amp;quot;</Paragraph>
    <Paragraph position="8"/>
  </Section>
  <Section position="3" start_page="0" end_page="363" type="metho">
    <SectionTitle>
Fig.2.1 Input Interlingua
</SectionTitle>
    <Paragraph position="0"> lnterlinguae consist of concepts, conceptual relations and attributes. Each concepts are classified as &amp;quot;statements&amp;quot; or &amp;quot;non-statements&amp;quot;. Concepts are represented by concept identification numbers (To distinguish concepts easily by men, concept illustrations are also given). Interpretations of codes relating to interlinguae in this paper are shown in Table 2.1. In the table, as for concept identification numbers, concept illustrations are showed as interp,'etations of codes.</Paragraph>
    <Paragraph position="1"> &amp;quot;My brother will take the medicine.&amp;quot;</Paragraph>
    <Paragraph position="3"> (3bf0D) a substance used on or in the body to treat a disease (3bdbf6) a drilled liquor named wtfiskey (3bd862) a drug or agent that reduces fever (3cee4f) to obtain a thing which one wante~l (3ceae3) to become a certain condition (0fde5f) to accept others' opinions and wishes i i .. (0c98dc) tbo first part of the day, from the time when the sun agent  rises, usually tmtil the time when the midday meal ts eaten Subject that brings about a voluntary action. Conscious attd automated entities are suclt subjects.  obligatory prepositional phrase relations between content words and functional words Syntactic trees consist of words, part-of-speeches, grammatic~d information and syntactic relations. The interpretations of codes relating to syntactic trees used in this paper are shown Table 2.2.</Paragraph>
  </Section>
  <Section position="4" start_page="363" end_page="364" type="metho">
    <SectionTitle>
3. Examples
</SectionTitle>
    <Paragraph position="0"> An example should be a pair of an interlingua and a syntactic tree. For the flexibility of usage of examples, interlinguae and syntactic trees in ex:unples are divided into smaller parts that are small enough to use flexibly but have enough information for generations.</Paragraph>
    <Paragraph position="1"> Fig.3.1 shows the common form of interlinguae and syntactic trees in examples (referred as &amp;quot;basic unit&amp;quot;, below). An example is a pair of fragments in this form made from an interlingua and a syntactic tree.</Paragraph>
    <Paragraph position="2"> tip (near to tile root of Ihe tree lower n~lt: structure of an interlingm0 lower arc lower node uppt r n(~le Upl~r arc ~ &amp;quot; attribute  Fig.3.2 shows the linguistic resources used by the generator. As the results of trying to execute as many processes as possible by the example-based method, it became necessary for the generator to use two different kinds of examples (referred as &amp;quot;Basic Example Set&amp;quot; and  Fig.3.3 shows examples in the Basic Example Set.</Paragraph>
    <Paragraph position="3"> Circlod nodes are &amp;quot;central nodes&amp;quot;. Basic Example Set is supposod to be used for selecting content words for concepts. Functional words except prepositions and grmnmatical information for inflections are removed, since they are unnecessary for this purpose. In Fig.3.2, example (A) and (13) have 11o upper node and Example (C) and (D) have no lower node. Examples in this set are accessed by concepts in the central nodes of interlinguae; Example (A) and (B) are accessed by (3bf0d2) and (C), by (3bf0f9) and (D) by (0c98dc) . When several examples with the same key exist, by the simih'u'ity defined below, only one example is finally accepted.</Paragraph>
    <Paragraph position="4"> Fig.3.4 shows examples in the Example Set for Attributes. This example set is supposed to be used for deciding inflection (i.e. selecting the word whose inflection corresponds to the attributes) and adding functional words for attributes. Content words in lower nodes are  removed, since the upper node influences to the inflection of the center word, but the lower nodes rarely don't. Functional words in lower nodes are added to the outputs. Concepts and spellings of words are also removed, since they can be decided by Basic Example Set and unnecessary here. Examples are accessed by combinations of attributes in interlinguae, some grammatical information of the upper node, those of central nodes and the surface relation of the upper arc; in Fig.3.4, Example (a) is accessed  by (past, -, EVE; EVED, -), Example (b) by (end, already, -, EVE; EVEN, -), Example (c) by (present, -, EVE; EVSTM; ECV9, -), Example (d) by (present, , -, EVE; EVIl, -), Example (e) by (future, -, EVE; FNSTM; ECV9,- ) and Example (1) by (-, EVE; EVDO0, EN 1, M(do) ). Example (a), (b), (c), (d) and (e) have no upper  node. Since examples in this set don't include concepts, examples are accessed deterministically and the similarity is not used.</Paragraph>
  </Section>
  <Section position="5" start_page="364" end_page="365" type="metho">
    <SectionTitle>
4. Similarities
</SectionTitle>
    <Paragraph position="0"> There are two major similarities in the example-based method. One is for the source language and used for selecting examples. Anotber is for the target language and used for creating outputs. In this generator, the lbrmer is the similarity between interlinguae (in tile form of basic t, nits) and the latter is the similarity between words. In the generator, the similarity is used only for Basic Ex-.</Paragraph>
    <Paragraph position="1">  The simihu'ity between interlingt,ae is defined its follows;</Paragraph>
    <Paragraph position="3"> Clcent, C2cent : concepts in central nodes Kcent : weight of simihuity between central nodes Cli, C2i : concepts in lower nodes with arc i k(x) : weight of similarity between concepts in lower nodes, x is tim number of elements in tbe interjunction srel(i) : surface relation which corresponds to the concept relation i R 1,R2 : set of conceptt,al relations each for ILl, 11.2 ntun(S) : the number of elements of set S It is always assured in adwmce by tile generator that 1) tile word in tbe upper node of tile input is already selected (if there is im upper node); 2) arcs of imerlingt, a, which cormspond to obligatory relations of tile syntactic tree in the ex;nnple, exist in the interjunction of P. 1 and R2; 3) upper arcs are same (if already decided); 4) part-of-speeches of words in upper nodes are same. l:,xamples that don't satisfy these  four conditions are rejected before the similarity calculation. The similarity between concepts used in the above similarity is defined as follows; Sc(Cl ,C2) = the ~lumber of common ancesters the number of ancesters of CI + the number of ancesters of C2 Ilere, ancestors until three layers above are used. (Cut; 1993) It is difficult to find the most similar interlingua in an example set to the input interlingua, because to find it, it is necessary to calculate all similarities between interlinguae in the ex,-unple set and the input. To avoid this, in this generator, some constraints are given for access keys i.e. central nodes. For &amp;quot;statements&amp;quot; in interlingua, central nodes of examples should be same with that of the input and for &amp;quot;non-statements&amp;quot; in interlingua, central nodes of examples can be tile s,'u-ne concepts or sister concepts in the concept hierarchy. By this constraints, the search of examples can be executed fast.</Paragraph>
    <Paragraph position="4"> The similarity between words is defined as follow; k (0 &lt; k &lt; 1) if p~t-of-tspe.eeh and lgralnmaticnt infornl~tlon</Paragraph>
    <Paragraph position="6"/>
  </Section>
  <Section position="6" start_page="365" end_page="365" type="metho">
    <SectionTitle>
5. Generation Algorithm
</SectionTitle>
    <Paragraph position="0"> The generator generates fragments of a syntactic tree and tiredly combines them into a syntactic tree.</Paragraph>
    <Paragraph position="1"> The generation algorithm is as follows; Step 1 : Sets the current central node at the root node of the input interlingua.</Paragraph>
    <Paragraph position="2">  from the candidate word lists and checks if there is an example in Example Set for Attributes, whose attributes and words in the central node coincide with attributes in the current basic unit and the selected word.</Paragraph>
    <Paragraph position="3"> Step 3-3 : If the word selection succeeded, accepts the example. Generates upper arc (if exists), lower arc (only for obligatory relations) central nodes ,and functional words for the central node, saves the results and similarity and calculates the similarity of interlingua between the input and the example. Prepositions are extracted from the basic example.  Suppose the interlingua such as Fig.5.1 is inputted and examples in Fig.3.3 are used as Basic Example Set and Fig.3.4 used as Example Set for Attributes.</Paragraph>
    <Paragraph position="4"> The list of candidate words for {3bf0d2} is as fol-</Paragraph>
    <Paragraph position="6"> From Basic Example Set, Example (A) and (B) are retrieve(l, since central nodes are same.</Paragraph>
    <Paragraph position="7"> By Example (A) and Example (a), took(EVE; EVED; EVDO0) is selected and by Example (B) and Example (a), drank(EVE; EVED; EVDO0) is selected.</Paragraph>
    <Paragraph position="8"> As similmity between the input and Example (A) is larger than that between the inpvt and Example (B), &amp;quot;took&amp;quot; is selected. This is because similarity between {3bd862} but (3bf0fg} is 0.876535 and one between {3bd862} and {3bdbf6} is 0.</Paragraph>
  </Section>
  <Section position="7" start_page="365" end_page="366" type="metho">
    <SectionTitle>
6. Experiments for Verb Selections
</SectionTitle>
    <Paragraph position="0"> This chapter describes experiments to evaluate examples, similarities and the generation algorithm. Experiments for verb selections are executed.</Paragraph>
    <Paragraph position="1"> The generator selects one word from candklate word list retrieved from EDR English Dictionary.</Paragraph>
    <Paragraph position="2"> The experiments are (lone by Jack-knife test method (Sumita; 1992) ; 1) Specify a concept; 2) Collect examples that include a word in candidate word list whose meaning is same with the specified concept ; 3) Remove one example from example sets; 4) Make tile input interlingua from the removed example; 5) Generate a sentence from this interlingua by using remained examples; 6) Compare the original word and the generated word for the verb; 7) Repeat 3) - 6) by removing each example in turn.</Paragraph>
    <Paragraph position="3"> Below the results of three experiments (Experiment 1,  'Fable 6.1 shows specified concepts for experiments and candidate word lists for the concepts. As for Experiment 1 and Experiment 2, words that have no examples is omitted from candidate word lists, since they won't never be selected. Fig.6.1, Fig.6.2 and Fig.6.3 show examples and generated sentences for Experiment 1, Experiment 2 and Experiment 3 each. Examples in Fig.6.1 ,'rod Fig.6.2 are extracted from EDR English Corpus and examples in Fig.6.3 are extracted from a published printed English-Japanese dictionary, though some modifications (Tenses, aspects ,  modals are all same. SI, bjects are same if possible) arc done. Sentences in the left hand sides of ,arrows are original sentences and those in the right hand side are generated sentences (In generated sentences, only verbs are generated words and others are copied from origimd sentences). Underlined words are words for the specified concepts. For sentences with a circle at the head of left hand sides, the generator selects same words with those in the original sentences. Sentences without circles include both right and wrong results. null In interlingua method, roughly speaking, all words corresponding to it concept are basically right its the generated word if it is grammatically consistent. So the evaluation of tire experiments is delicate.</Paragraph>
    <Paragraph position="4"> The rates of coincides between original verbs and genero ated verbs are 85% (Experiment 1), 13% (Experiment 2) and 16% (Experiment 3). Since some sentences without coincides can be also right, the real rates of success are lager than above nt, mbers.</Paragraph>
  </Section>
class="xml-element"></Paper>