<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-1002">
  <Title>Countability and Number in Japanese to English Machine Translation</Title>
  <Section position="4" start_page="33" end_page="33" type="metho">
    <SectionTitle>
PLURALIA TANTUM PLURAL pair
PLURALIA TANTUM PLURAL --
</SectionTitle>
    <Paragraph position="0"> Examples of the information about countability and number stored in the Japanese to English noun transfer dictionary are given in Table 1. The information about noun countability preferences cannot be found in standard dictionaries and must be entered by an English native speaker. Some tests to help determine a given noun's countability preferences are described in Bond and Ogura (1993), which discusses the use of noun countability preferences in Japanese to English machine translation.</Paragraph>
  </Section>
  <Section position="5" start_page="33" end_page="34" type="metho">
    <SectionTitle>
3 Determination of NP Referentiality
</SectionTitle>
    <Paragraph position="0"> The first stage in generating the countability and number of a translated English noun phrase is to determine its referentiality. We distinguish three kinds of referentiality: 'generic', 'referential' and 'ascriptive'.</Paragraph>
    <Paragraph position="1"> We call noun phrases used to make general statements about a class generic; for example Mammoths are extinct. The way generic noun phrases are expressed in English is described in Section 3.1. Referential noun phrases are ones that refer to some specific referent; for example Two dogs chase a cat. Their number and countability are ideally determined by the properties of the referent. Ascriptive noun phrases are used to ascribe a property to something; for example Hathi is an elephant. They normally have the same number and countability as the noun phrase whose property they are describing.</Paragraph>
    <Paragraph position="4"> Figure 1:
 1. if restrictively modified then 'referential': my book, the man who came to dinner
 2. if subject of extinct, evolve ... then 'generic': Mammoths are extinct
 3. if the semantic category of the subject of a copula is a daughter of the semantic category of the object then 'generic': Mammoths are animals
 4. if modified by aimed at, for ... then 'generic': A magazine for women
 5. if object of like ... then 'generic': I like cake
 6. if complement of a copula then 'ascriptive': NTT is a telephone company
 7. if appositive then 'ascriptive': NTT, a telephone company ...
 8. default 'referential'</Paragraph>
    <Paragraph position="5"> The process of determining the referentiality of a noun phrase is shown in Figure 1. The tests are processed in the order shown. As far as possible, simple criteria that can be implemented using the dictionary have been chosen. For example, Test 4, &quot;if an NP is modified by aimed at, for ... then it is 'generic'&quot;, is applied as part of translating NP1-muke into &quot;for NP1&quot;. The transfer dictionary includes the information that in this case, NP1 should be generic.</Paragraph>
    <Paragraph position="6"> Tests 2 and 3 show two more heuristic methods for determining whether a noun phrase has generic reference. In Test 2, if the predicate is marked in the dictionary as one that only applies to classes as a whole, such as evolve or be extinct, then the sentence is taken to be generic. In Test 3, ALT-J/E's semantic hierarchy is used to test whether a sentence is generic or not. For example in Mammoths are animals, mammoth has the semantic category ANIMAL so the sentence is judged to be stating a fact true of all mammoths and is thus generic.</Paragraph>
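    <Paragraph position="7"> The ordered tests of Figure 1 can be sketched as a short cascade in which the first test that fires decides the result. This is only an illustrative sketch, not the ALT-J/E implementation: the NounPhrase fields, the small word sets and the toy semantic hierarchy (in which MAMMOTH is a daughter of ANIMAL) are all assumptions introduced here, and a real system would read this information from its dictionaries.

```python
from dataclasses import dataclass
from typing import Optional

# Toy lexicon entries, invented for illustration only.
CLASS_PREDICATES = {"be extinct", "evolve"}   # Test 2: predicates of whole classes
GENERIC_MARKERS = {"aimed at", "for"}         # Test 4
GENERIC_OBJECT_VERBS = {"like"}               # Test 5
PARENT = {"MAMMOTH": "ANIMAL"}                # hypothetical semantic hierarchy


def is_daughter(category, ancestor):
    """Follow PARENT links upward from category, looking for ancestor."""
    node = PARENT.get(category)
    while node is not None:
        if node == ancestor:
            return True
        node = PARENT.get(node)
    return False


@dataclass
class NounPhrase:
    head: str
    category: Optional[str] = None                # semantic category of the head
    restrictively_modified: bool = False
    subject_of: Optional[str] = None              # predicate this NP is subject of
    copula_object_category: Optional[str] = None  # set when NP is a copula subject
    marked_by: Optional[str] = None               # e.g. "for" in "for NP1"
    object_of: Optional[str] = None
    copula_complement: bool = False
    appositive: bool = False


def referentiality(phrase):
    """Apply Tests 1 to 8 in order; the first test that fires wins."""
    if phrase.restrictively_modified:                       # Test 1
        return "referential"
    if phrase.subject_of in CLASS_PREDICATES:               # Test 2
        return "generic"
    if (phrase.category is not None
            and phrase.copula_object_category is not None
            and is_daughter(phrase.category, phrase.copula_object_category)):
        return "generic"                                    # Test 3
    if phrase.marked_by in GENERIC_MARKERS:                 # Test 4
        return "generic"
    if phrase.object_of in GENERIC_OBJECT_VERBS:            # Test 5
        return "generic"
    if phrase.copula_complement or phrase.appositive:       # Tests 6 and 7
        return "ascriptive"
    return "referential"                                    # Test 8: default
```

 Because the tests run in order, a restrictively modified noun phrase is classified as 'referential' even when a later test would also apply to it.</Paragraph>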
    <Section position="1" start_page="34" end_page="34" type="sub_section">
      <SectionTitle>
3.1 Generic noun phrases
</SectionTitle>
      <Paragraph position="0"> A generic noun phrase (with a countable head noun) can generally be expressed in three ways (Huddleston 1984). We call these GEN 'a', where the noun phrase is indefinite: A mammoth is a mammal; GEN 'the', where the noun phrase is definite: The mammoth is a mammal; and GEN ∅, where there is no article: Mammoths are mammals. Uncountable nouns and pluralia tanta can only be expressed by GEN ∅ (e.g. Furniture is expensive). They cannot take GEN 'a' because they cannot be modified by a. They do not take GEN 'the', because then the noun phrase would normally be interpreted as having definite reference.</Paragraph>
      <Paragraph position="1"> Nouns that can be either countable or uncountable also only take GEN ∅: Cake is delicious/Cakes are delicious. These combinations are shown in Table 2, which also indicates noun phrases that cannot be used to show generic reference. The use of all three kinds of generic noun phrases is not acceptable in some contexts, for example *A mammoth evolved. Sometimes a noun phrase can be ambiguous, for example I like the elephant, where the speaker could like a particular elephant, or all elephants.</Paragraph>
      <Paragraph position="2"> Because the use of GEN ∅ is acceptable in all contexts, ALT-J/E generates all generic noun phrases as such, that is, as bare noun phrases. The number of the noun phrase is then determined by the countability preference of the noun heading it. Fully countable nouns and pluralia tanta will be plural; all others are singular.</Paragraph>
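      <Paragraph position="3"> The rule for generic noun phrases can be illustrated with a few lines of code. The sketch below assumes a hypothetical dictionary entry that stores both surface forms and one of the five countability preferences; the entry layout, the preference labels and the classification of the example nouns are assumptions made for illustration, not the format of the ALT-J/E transfer dictionary.

```python
from dataclasses import dataclass

# Preferences whose generic (bare) noun phrase is plural, as stated in the text.
PLURAL_WHEN_GENERIC = {"fully countable", "pluralia tantum"}


@dataclass
class Noun:
    singular: str
    plural: str
    preference: str  # hypothetical label for the noun countability preference


def generic_np(noun):
    """Return (surface form, number) of the bare generic noun phrase."""
    if noun.preference in PLURAL_WHEN_GENERIC:
        return noun.plural, "plural"
    # Strongly countable, weakly countable and uncountable nouns stay singular.
    return noun.singular, "singular"
```

 For example, an entry for mammoth marked as fully countable yields the bare plural mammoths, while an uncountable entry such as furniture yields the bare singular.</Paragraph>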
    </Section>
  </Section>
  <Section position="6" start_page="34" end_page="36" type="metho">
    <SectionTitle>
4 Determination of NP Countability and Number
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="34" end_page="35" type="sub_section">
      <SectionTitle>
Countability and Number
</SectionTitle>
      <Paragraph position="0"> The following discussion deals only with referential and ascriptive noun phrases, as generic noun phrases were discussed in Section 3.1. The definitions of noun phrase countability given in Section 2, while useful for analyzing English, are not sufficient for translating from Japanese to English. This is because in many cases it is impossible to tell from the Japanese form or syntactic shape whether a translated noun phrase will fall within the scope of a denumerator or not. Japanese has no equivalent to a/an and does not distinguish between countable and uncountable quantifiers such as many/much and little/few. Therefore to determine countability and generate number we need to use a combination of information from the Japanese original sentence, and default information from the Japanese to English transfer dictionary. As much as possible, detailed information is entered in the transfer dictionaries to allow the translation process itself to be made simple.</Paragraph>
      <Paragraph position="1"> The process of determining a noun phrase's countability and number is shown in Figure 2.</Paragraph>
      <Paragraph position="2"> The process is carried out during the transfer stage so information is available from both the Japanese original and the selected English translation.</Paragraph>
      <Paragraph position="3"> Figure 2:
 1. if the Japanese is explicitly plural then countable and plural
 2. determine according to determiner: one dog, all dogs
 3. determine according to classifier: a slice of cake, a pile of cakes
 4. determine according to complement: schools all over the country
 5. ascriptive NPs match their subjects: A computer is a piece of equipment
 6. determine according to verb: I gather flowers
 7. use default value
    (a) uncountable, weakly countable become: uncountable and singular
    (b) pluralia tanta become: countable and plural
    (c) countable and strongly countable become: countable and singular or plural according to the dictionary default</Paragraph>
    </Section>
    <Section position="2" start_page="35" end_page="36" type="sub_section">
      <SectionTitle>
Countability and Number
</SectionTitle>
      <Paragraph position="0"> To make the task of determining countability and number simpler, we define combinations of different countabilities for nouns with different countability preferences that we can use in the dictionaries. The effects of the four most common types on the five major noun countability preferences are shown in Table 3.</Paragraph>
      <Paragraph position="1"> Noun phrases modified by Japanese/English pairs that are translated as denumerators we call denumerated. For example a noun modified by onoono-no &quot;each&quot; is denumerated - singular, while one modified by ryouhou-no &quot;both&quot; is denumerated - plural. Uncountable and pluralia tantum nouns in denumerated environments are translated as the prepositional complement of a classifier. A default classifier is stored in the dictionary for uncountable nouns and pluralia tanta. Ascriptive noun phrases whose subject is countable will also be denumerated.</Paragraph>
      <Paragraph position="2"> The two 'mass' environments shown in Table 3 are used to show the countability of nouns that can be either countable or uncountable.</Paragraph>
      <Paragraph position="3"> Weakly countable nouns will only be countable if used with a denumerator. Strongly countable nouns will be countable and plural in such mass - countable environments as the object of collect (vt): I collect cakes, and uncountable and singular in mass - uncountable environments such as I ate too much cake.</Paragraph>
      <Paragraph position="4"> In fact, both I collect cake and I ate too many cakes are possible. As Japanese does not distinguish between the two, the system must make the best choice it can, in the same way a human translator would have to. The rules have been implemented to generate the translation that has the widest application, for example generating I ate too much cake, which is true whether the speaker only ate part or all of one cake or if they ate many cakes, rather than I ate too many cakes, which is only true if the speaker ate many cakes. Sometimes the choice of the English translation of a modifier will depend on the countability of the noun phrase. For example, kazukazu-no and takusan-no can both be translated as &quot;many&quot;. kazukazu-no implies that its modificant is made up of discrete entities, so the noun phrase it modifies should be translated as denumerated - plural. takusan-no does not carry this nuance, so ALT-J/E will translate a noun phrase modified by it as mass - uncountable, and takusan-no as many if the head is countable and much otherwise.</Paragraph>
      <Paragraph position="5"> Rules that translate nouns with different noun countability preferences into other combinations of countable and uncountable are also possible. For example, sometimes even fully countable nouns can be used in uncountable noun phrases. If an elephant is referred to not as an individual elephant but as a source of meat, then it will be expressed in an uncountable noun phrase: I ate a slice of elephant. To generate this the following rule is used: &quot;nouns quantified with the classifier kire &quot;slice&quot; will be generated as the prepositional complement of slice; they will be singular with no article unless they are pluralia tanta, when they will be plural with no article&quot;.</Paragraph>
      <Paragraph position="6"> Note that countable indefinite singular noun phrases without a determiner will have a/an generated. Countable indefinite plural noun phrases and uncountable noun phrases may have some generated; a full discussion of this is outside the scope of this article.</Paragraph>
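      <Paragraph position="7"> The overall shape of the procedure in Figure 2 can be sketched as follows. This is a deliberately reduced illustration under stated assumptions: Tests 2 to 6 are collapsed into a single denumerated argument that the caller sets to 'singular' or 'plural', the two mass environments of Table 3 are not modelled, and the Entry fields and the 'classified' label (meaning that the noun becomes the prepositional complement of its default classifier) are names invented here, not part of ALT-J/E.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    head: str
    preference: str                    # hypothetical countability preference label
    default_number: str = "singular"   # dictionary default used in Test 7(c)
    classifier: Optional[str] = None   # default classifier, e.g. "piece" or "pair"


def decide(entry, explicitly_plural=False, denumerated=None):
    """Return (countability, number, classifier) for a non-generic noun phrase."""
    # Test 1: explicit plural marking in the Japanese wins outright.
    if explicitly_plural:
        return ("countable", "plural", None)
    # Tests 2 to 6, collapsed: some element has denumerated the noun phrase.
    if denumerated in ("singular", "plural"):
        if entry.preference in ("uncountable", "pluralia tantum"):
            # Embed the noun under its default classifier ("a piece of ...").
            return ("classified", denumerated, entry.classifier)
        return ("countable", denumerated, None)
    # Test 7: fall back on the default for the countability preference.
    if entry.preference in ("uncountable", "weakly countable"):
        return ("uncountable", "singular", None)
    if entry.preference == "pluralia tantum":
        return ("countable", "plural", None)
    return ("countable", entry.default_number, None)
```

 Under these assumptions, an uncountable entry for equipment with default classifier piece in a denumerated singular environment is returned as 'classified', which corresponds to output of the form a piece of equipment.</Paragraph>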
    </Section>
  </Section>
  <Section position="7" start_page="36" end_page="38" type="metho">
    <SectionTitle>
5 Experimental Results
</SectionTitle>
    <Paragraph position="0"> The processing described above has been implemented in ALT-J/E. It was tested, together with new processing to generate articles, on a specially constructed set of test sentences, and on a collection of newspaper articles. The results are summarized in Table 4.</Paragraph>
    <Paragraph position="1"> In the newspaper articles tested, there was an average of 7.0 noun phrases in each sentence. For a sentence to be judged as correct, all the noun phrases must be correct. The introduction of the proposed method improved the percentage of correct sentences from 5% to 12%.</Paragraph>
    <Paragraph position="2"> Some examples of translations before and after the introduction of the new processing are given below. The translations before the proposed processing was implemented are marked OLD; the translations produced by ALT-J/E using the proposed processing are marked NEW.</Paragraph>
    <Paragraph position="3">  OLD: &quot;Most children become an adult&quot; NEW: &quot;Most children become adults&quot; In (1), the noun phrase headed by otona &quot;adult&quot; is judged to be ascriptive, as it is the complement of the copular naru &quot;become&quot;. Therefore the proposed method translates it with the same number as the subject.</Paragraph>
    <Paragraph position="4"> (2) manmosu-ha zetumetu-shita mammoth died-out OLD: &quot;A mammoth died out&quot; NEW: &quot;Mammoths died out&quot; zetumetu &quot;die out&quot; is entered in the lexicon as a verb whose subject must be generic. manmosu &quot;mammoth&quot; is fully countable so the generic noun phrase is translated as a bare plural.</Paragraph>
    <Paragraph position="5">  OLD: &quot;There are 3 piece tofu,</Paragraph>
  </Section>
  <Section position="8" start_page="38" end_page="38" type="metho">
    <SectionTitle>
1 scissors,
</SectionTitle>
    <Paragraph position="0"> and 2 knives&quot; NEW: &quot;There are 3 pieces of tofu,</Paragraph>
  </Section>
  <Section position="9" start_page="38" end_page="38" type="metho">
    <SectionTitle>
1 pair
</SectionTitle>
    <Paragraph position="0"> of scissors and 2 knives&quot; The old version recognizes that a denumerated noun phrase headed by an uncountable noun, tofu, requires a classifier but does not generate the correct structure; neither does it generate a classifier for the pluralia tantum noun scissors. The version using the proposed method does both.</Paragraph>
    <Paragraph position="1"> (4) sore-ha dougu da that equipment is OLD: &quot;That is equipment&quot; NEW: &quot;That is a piece of equipment&quot; As the subject of the copula, that, is countable, its complement is judged to be denumerated by the proposed method. As the complement is headed by an uncountable noun it must be embedded in the prepositional complement of a classifier.</Paragraph>
    <Paragraph position="2"> There are three main problems still remaining. The first is that currently the rules for determining the noun phrase referentiality are insufficiently fine. We estimate that if referentiality could be determined 100% correctly then the percentage of noun phrases with correctly generated articles and number could be improved to 96% in the test set we studied. The remaining 4% require knowledge from outside the sentence being translated.</Paragraph>
    <Paragraph position="3"> The biggest problem is noun phrases requiring world knowledge that cannot be expressed as a dictionary default. These noun phrases cannot be generated correctly by the purely heuristic methods proposed here. The last problem is noun phrases whose countability and number can be deduced from information in other sentences. We would like to extend our method to use this information in the future.</Paragraph>
  </Section>
</Paper>