<?xml version="1.0" standalone="yes"?>
<Paper uid="W93-0223">
  <Title>HOW COULD RHETORICAL RELATIONS BE USED IN MACHINE TRANSLATION? (AND AT LEAST TWO OPEN QUESTIONS)</Title>
  <Section position="4" start_page="86" end_page="88" type="concl">
    <SectionTitle>
SUBLANGUAGES AND SCHEMATA
</SectionTitle>
    <Paragraph position="0"> In the sublanguages I studied, however, I found that the schemata of rhetorical predicates could not always be uniquely defined. There are sublanguages where more than one typical schema should be defined and consequently used. On the basis of numerous examined texts, I defined "stable schemata". The schemata $S_1, S_2, \ldots, S_N$ can be considered "stable" if 1) $S_i/N \geq \delta$ for all $i$, and 2) $\sum_i S_i/N \geq \gamma$, where N is the number of all examined texts and $\delta$, $\gamma$ are numbers in the interval (0,1), which we call the "individual contribution minimum" and the "global contribution minimum" respectively. The idea behind this formulation is that schemata can be considered "stable" if as a whole they represent a significant portion of all examined texts and yet every "stable" schema is itself representative.</Paragraph>
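The two stability conditions can be illustrated with a minimal Python sketch. It rests on assumptions that the text does not state: that $S_i$ in the ratios counts the examined texts following schema $S_i$, that each text is annotated with exactly one schema, and that candidates are picked by first filtering on the individual minimum and then checking the global minimum. The function name `stable_schemata` and the sample data are hypothetical.

```python
from collections import Counter


def stable_schemata(schema_per_text, delta, gamma):
    """Return the schemata that pass both conditions, or [] if none do.

    schema_per_text: one schema (a tuple of predicate names) per examined text.
    delta: individual contribution minimum, a number in (0, 1).
    gamma: global contribution minimum, a number in (0, 1).
    """
    n_texts = len(schema_per_text)
    if n_texts == 0:
        return []
    counts = Counter(schema_per_text)
    # Condition 1: every candidate covers at least a share delta of the texts.
    candidates = [s for s, c in counts.items() if c / n_texts >= delta]
    # Condition 2: together the candidates cover at least a share gamma.
    total_share = sum(counts[s] for s in candidates) / n_texts
    return sorted(candidates) if total_share >= gamma else []


# Hypothetical corpus of 10 texts following three different schemata.
schema_a = ("identification", "attributive", "constituency")
schema_b = ("identification", "constituency", "attributive")
schema_c = ("attributive", "identification")
corpus = [schema_a] * 5 + [schema_b] * 3 + [schema_c] * 2

stable = stable_schemata(corpus, delta=0.25, gamma=0.7)
```

With these hypothetical numbers the first two schemata cover 50% and 30% of the texts, so both pass the individual minimum and together pass the global minimum; raising `gamma` to 0.9 leaves the sublanguage without a stable schema.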
    <Paragraph position="1"> For translation from English into Malay, if more than one stable schema is available in the respective sublanguage, the stable schema closest to that of the input English text is chosen. For determining closeness, a special metric has been developed that takes into account not only the number of displaced predicates, but also the size of the displacement and the maximal length of matched substrings from the input and output schemata of rhetorical predicates.</Paragraph>
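The text names the three ingredients of the closeness metric but not how they are combined, so the following Python sketch is only one possible reading. The linear combination, the weights, the function names, and the assumption that each predicate occurs at most once per schema are all hypothetical; predicates that occur in only one of the two schemata are ignored here.

```python
def longest_common_run(a, b):
    """Length of the longest contiguous run of predicates shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def schema_distance(input_schema, stable_schema,
                    w_count=1.0, w_shift=1.0, w_run=1.0):
    """Smaller is closer. The weights are illustrative, not from the paper."""
    shared = [p for p in input_schema if p in stable_schema]
    # Size of the displacement of each shared predicate.
    shifts = [abs(input_schema.index(p) - stable_schema.index(p)) for p in shared]
    displaced = sum(1 for s in shifts if s != 0)  # number of displaced predicates
    total_shift = sum(shifts)
    run = longest_common_run(input_schema, stable_schema)
    return w_count * displaced + w_shift * total_shift - w_run * run


def closest_schema(input_schema, stable_schemata):
    """Pick the stable schema with the smallest distance to the input schema."""
    return min(stable_schemata, key=lambda s: schema_distance(input_schema, s))


# Hypothetical input schema and two hypothetical stable target schemata.
source = ["identification", "attributive", "constituency"]
target_1 = ["identification", "constituency", "attributive"]
target_2 = ["attributive", "constituency", "identification"]
chosen = closest_schema(source, [target_1, target_2])
```

In this example `target_1` is chosen: it displaces two predicates by one position each, whereas `target_2` displaces all three, even though `target_2` shares a longer matched run with the input.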
    <Paragraph position="2"> We have studied the discourse structure of a few sublanguages (for both English and Malay) that are potential candidates for translation domains in our MT system: the sublanguages of job vacancies, residential properties for sale, cars for sale, and education advertisements from different newspapers in English and Malay.</Paragraph>
    <Paragraph position="3"> From our investigations on these sublanguages we have drawn three main conclusions:
1) The stable schemata for English and Malay are not always identical and do not occur equally frequently.
2) For some sublanguages there is more than one stable schema.
3) For some sublanguages there exists no stable schema.
These conclusions are important for MT because in the third case there is no need for discourse transition rules and the translation should be undertaken sentence-by-sentence.
THE BIG PROBLEM: IDENTIFICATION OF RHETORICAL PREDICATES
During the analysis, rhetorical predicates should be recognized. In certain sublanguages this can be done by means of key words and other clues [5]. However, in general this seems to be a very complicated problem, and extensive world knowledge and inferencing mechanisms are needed. How could a program recognize a sentence (proposition) as an amplification, attributive, or other rhetorical predicate? For our sublanguage-based MT needs, I am considering two approaches for the identification of rhetorical predicates.</Paragraph>
    <Paragraph position="4"> One approach would be to define "verb frameworks" characteristic of a verb within the sublanguage. Each verb should be associated with possible rhetorical predicates and the predicate should be identified on the basis of the logical structure of the analysis. However, this approach may not be powerful enough in certain cases. Consider the sample text from [2] describing Kyushu Daigaku (Kyushu University): "A national, coeducational university in the city of Fukuoka. Founded in 1910 as Kyushu Imperial University. It maintains faculties of letters, education, law, economics, science, medicine, dentistry, pharmacology, engineering, and agriculture. Research institutes include the following: the Research Institute of Balneotherapeutics, the Research Institute of Applied Mechanics, the Research Institute of Industry and Labor, and the Research Institute of Industrial Science. Enrollment was 9,425 in 1980".</Paragraph>
    <Paragraph position="5"> It will be quite difficult, however, using only verb frameworks, to recognize the first, the third, the fourth and the last sentence as rhetorical predicates. A useful approach in this case would be to use domain knowledge, which would enable the recognition of the rhetorical predicate after a semantic analysis. For instance, a proposition describing entities which are in a 'sub-part' relation should be classified as a constituency predicate. This 'sub-part' relation could be easily recognized, provided it has already been described in the domain knowledge base. Consider again the sample text under the assumption that such a knowledge base exists. In this case, from the 'is-a' relation ("Kyushu Daigaku" - "university"), from the respective 'sub-part' relations ("university" - "faculty", "research centre") and the 'has' relation ("university" - "enrollment of students"), the program could assign to the above sentences the identification (first sentence), constituency (third and fourth sentences) and attributive (last sentence) predicates, respectively.</Paragraph>
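The knowledge-based assignment described above can be sketched in Python as a lookup from domain relations to rhetorical predicates. This is an illustration under strong assumptions: the dictionary layout, the function name `assign_predicate`, and the premise that semantic analysis has already reduced each sentence to a pair of entities are all hypothetical; only the three relation-to-predicate pairings come from the text.

```python
# Pairings taken from the text: 'is-a', 'sub-part' and 'has' relations.
RELATION_TO_PREDICATE = {
    "is-a": "identification",
    "sub-part": "constituency",
    "has": "attributive",
}

# Hypothetical toy knowledge base: (entity, related entity) maps to a relation.
KNOWLEDGE_BASE = {
    ("Kyushu Daigaku", "university"): "is-a",
    ("university", "faculty"): "sub-part",
    ("university", "research centre"): "sub-part",
    ("university", "enrollment of students"): "has",
}


def assign_predicate(entity, related_entity):
    """Return the rhetorical predicate for an entity pair, or None if unknown."""
    relation = KNOWLEDGE_BASE.get((entity, related_entity))
    return RELATION_TO_PREDICATE.get(relation)


# Entity pairs assumed to come out of semantic analysis of the sample text.
analysed_sentences = [
    ("Kyushu Daigaku", "university"),          # first sentence
    ("Kyushu Daigaku", "founding in 1910"),    # second sentence
    ("university", "faculty"),                 # third sentence
    ("university", "research centre"),         # fourth sentence
    ("university", "enrollment of students"),  # last sentence
]
predicates = [assign_predicate(e, r) for e, r in analysed_sentences]
```

The second sentence yields `None` in this sketch because the toy knowledge base holds no relation for it, so a lookup of this kind leaves such sentences unclassified.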
    <Paragraph position="6"> Consider, however, the second sentence. Is it an "amplification"? If yes, how is the program supposed to conclude that this sentence is an elaboration of the first one? How feasible, in general, is the computational recognition of rhetorical predicates? And here comes an important question: how much domain and world knowledge, as well as AI inferencing techniques, are needed? And if the sentence is indeed an amplification, does it not seem that "amplification" is not fine and precise enough (I can give many examples of propositions to which the rhetorical predicate "amplification" is to be assigned simply because they do not fit the definition of any of the other predicates)? Should one not introduce an additional predicate called e.g. "initiation", which would be associated with the act of founding, setting up, opening, organizing, etc. something? This gives rise to a second important question: is the set of rhetorical predicates given in [1], [3], [8] or [9] sufficient and precise enough to describe the real world? But if we propose additional predicates, how far should we go?</Paragraph>
  </Section>
</Paper>