<?xml version="1.0" standalone="yes"?> <Paper uid="C00-1014"> <Title>Reusing an ontology to generate numeral classifiers</Title> <Section position="4" start_page="0" end_page="90" type="metho"> <SectionTitle> 2 Generating Numeral Classifiers </SectionTitle> <Paragraph position="0"> In this section we introduce the properties of numeral classifiers, focusing on Japanese, then give an algorithm to generate classifiers. Japanese was chosen because of the wealth of published data on Japanese classifiers and the availability of a large lexicon with semantic classes marked.</Paragraph> <Section position="1" start_page="0" end_page="90" type="sub_section"> <SectionTitle> 2.1 What are Numeral Classifiers </SectionTitle> <Paragraph position="0"> Japanese is a language where most nouns cannot be directly modified by numerals; instead, nouns are modified by a numeral-classifier combination as shown in (1).2 2 We use the following abbreviations: NOM = nominative; ACC = accusative; ADN = adnominal; CL = classifier; ARGSTR</Paragraph> </Section> </Section> <Section position="5" start_page="90" end_page="91" type="metho"> <SectionTitle> 2 emails </SectionTitle> <Paragraph position="0"> In Japanese, numeral classifiers are a subclass of nouns. The main property distinguishing them from prototypical nouns is that they cannot stand alone.</Paragraph> <Paragraph position="1"> Typically they postfix to numerals, forming a quantifier phrase. Japanese also allows them to combine with the quantifier sū &quot;some&quot; or the interrogative nani &quot;what&quot; (2). We will call all such combinations of a numeral/quantifier/interrogative with a numeral classifier a numeral-classifier combination.</Paragraph> <Paragraph position="2"> (2) a. 2-hiki &quot;2 animals&quot; (Numeral) b. sū-hiki &quot;some animals&quot; (Quantifier) c. nan-biki &quot;how many animals&quot; (Interrogative) Classifiers have different properties depending on their use. 
There are five major types: sortal, which classify the kind of the noun phrase they quantify (such as -tsu &quot;piece&quot;); event, which are used to quantify events (such as -kai &quot;time&quot;); mensural, which are used to measure the amount of some property (such as senchi &quot;-cm&quot;); group, which refer to a collection of members (such as -mure &quot;group&quot;); and taxonomic, which force the noun phrase to be interpreted as a generic kind (such as -shu &quot;kind&quot;). We propose the following basic structure for sortal classifiers (3). The lexical structure we adopt is an extension of Pustejovsky's (1995) generative lexicon, with the addition of an explicit quantification relationship (Bond and Paik, 1997).</Paragraph> <Paragraph position="4"> There are two variables in the argument structure: the numeral, quantifier or interrogative (represented by numeral+), and the noun phrase being classified. Because the noun phrase being classified can be omitted in context, it is a default argument, one which participates in the logical expressions in the qualia, but is not necessarily expressed syntactically. = argument structure; ARG = argument; D-ARG = default argument; QUANT = quantification.</Paragraph> <Paragraph position="5"> Sortal classifiers differ from each other in the restrictions they place on the quantified variable y.</Paragraph> <Paragraph position="6"> For example, the classifier -nin adds the restriction y:human. That is, it can only be used to classify human referents.</Paragraph> <Paragraph position="7"> Japanese has two number systems: a Sino-Japanese one based on Chinese, for example, ichi &quot;one&quot;, ni &quot;two&quot;, san &quot;three&quot;, etc., and an alternative native-Japanese system, for example, hitotsu &quot;one&quot;, futatsu &quot;two&quot;, mitsu &quot;three&quot;, etc. In Japanese the native system only exists for the numbers from one to ten. 
Most classifiers combine with the Chinese forms; however, different classifiers select Sino-Japanese for some numerals, for example, ni-hiki &quot;two-CL&quot;, and most classifiers undergo some form of sound change (such as -hiki to -biki in (2)).</Paragraph> <Paragraph position="8"> We will not be concerned with these morphological changes; we refer interested readers to Backhouse (1993, 118-122) for more discussion.</Paragraph> <Paragraph position="9"> Numeral classifiers characteristically premodify the noun phrases they quantify, linked by an adnominal case marker, as in (4); or appear 'floating' as adverbial phrases, typically before the verb: (5).</Paragraph> <Paragraph position="10"> The choice between pre-nominal and floating quantifiers is largely driven by discourse-related considerations (Downing, 1996). In this paper we concentrate on the semantic contribution of the quantifiers, and ignore the discourse effects.</Paragraph> <Paragraph position="11"> Quantifier phrases can also function as noun phrases on their own, with anaphoric or deictic reference, when what is being quantified is recoverable from the context. For example (7) is acceptable if the letters have already been referred to, or are clearly visible.</Paragraph> <Paragraph position="12"> (6) [some background with letters salient] (7) 2-tsū-o yonda (Japanese) 2-CL-ACC read &quot;I read two letters&quot; In the pre-nominal construction the relation between the target noun phrase and quantifier is explicit. For numeral-classifier combinations the quantification can be of the object denoted by the noun phrase itself as in (8); or of a sub-part of it as in (9) (see Bond and Paik (1997) for a fuller discussion). 
(8) 3-tsū-no tegami 3-CL-ADN letter &quot;3 letters&quot; (9) 3-mai-no tegami 3-CL-ADN letter &quot;a 3 page letter&quot;</Paragraph> <Section position="1" start_page="91" end_page="91" type="sub_section"> <SectionTitle> 2.2 An Algorithm to Generate Numeral Classifiers </SectionTitle> <Paragraph position="0"> The only published algorithm to generate classifiers is that of Sornlertlamvanich et al. (1994). They propose to generate classifiers in Thai as follows: First create a lexicon with default classifiers listed for as many nouns as possible. This was done by automatically extracting noun-classifier pairs from a sense-tagged corpus, and taking the classifier that appeared most often with each sense of a noun.3 Then, the most frequent classifier is listed for each semantic class. Generation is then simple: if a noun has a default classifier in the lexicon, then use it, otherwise use the default classifier associated with its semantic class.</Paragraph> <Paragraph position="1"> Unfortunately, no detailed results were given as to the size of the concept hierarchy, the number of nodes in it or the number of nouns for which classifiers were found. As the generation procedure was not implemented, there was no overall accuracy given for the system.</Paragraph> <Paragraph position="2"> As a default, Sornlertlamvanich et al.'s algorithm is useful. However, it does not cover several exceptional cases, so we have refined it further. The extended algorithm is shown in Figure 1.</Paragraph> <Paragraph position="3"> Firstly, we have made explicit what to do when a noun is a member of more than one semantic class or of no semantic class. In the lexicon we used, nouns are, on average, members of 2 semantic classes. However, the semantic classes are ordered so that the most typical use comes first. 
For example, usagi &quot;rabbit&quot; is marked as both animal and meat, with animal coming first (Figure 3).</Paragraph> <Paragraph position="4"> In this case, we would take the classifier associated with the first semantic class. However, usagi is counted not with the default classifier for animals, -hiki, but with that for birds, -wa; this must be listed as an exception.</Paragraph> <Paragraph position="5"> 3 In fact, Thai also has a great many group classifiers, much like herd, flock and pack in English. Therefore each noun has two classifiers listed, a sortal classifier and a group classifier. Japanese does not, so we will not discuss the generation of group classifiers here.</Paragraph> <Paragraph position="6"> Secondly, we have added a method for generating classifiers that quantify coordinate noun phrases.</Paragraph> <Paragraph position="7"> These commonly appear in appositive noun phrases such as ABC-to XYZ-no 2-sha &quot;the two companies, ABC and XYZ&quot;.</Paragraph> <Paragraph position="8"> 1. For a simple noun phrase: (a) If the head noun has a default classifier in the lexicon, use the noun's default classifier; (b) Else, if it exists, use the default classifier of the head noun's first listed semantic class (the class's default classifier); (c) Else use the residual classifier -tsu. 2. For a coordinate noun phrase: generate the classifier for each noun phrase, then use the most frequent classifier. In addition, we investigate to what degree we could use inheritance to remove redundancy from the lexicon. If a noun's default classifier is the same as the default classifier for its semantic class, then there is no need to list it in the lexicon. This makes the lexicon smaller and makes it easier to add new entries. Any display of the lexical item (such as for maintenance or if the lexicon is used as a human aid) should automatically generate the classifier from the semantic class. 
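The two steps of the extended algorithm, together with the inheritance of class defaults, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implemented system: the dictionary names, the toy entries (including the noun inu and the class name company) and the tie-breaking rule for coordinate phrases are hypothetical; only the classifiers -hiki, -wa, -sha and -tsu, the ordering of the classes of usagi, and the usagi exception come from the text.

```python
from collections import Counter

RESIDUAL = "-tsu"  # residual classifier from step 1(c)

# Toy data for illustration only; names and entries are assumptions.
CLASS_DEFAULT = {"animal": "-hiki", "bird": "-wa", "company": "-sha"}
NOUN_CLASSES = {
    "usagi": ["animal", "meat"],  # ordered: most typical class first
    "inu": ["animal"],
    "ABC": ["company"],
    "XYZ": ["company"],
}
# Only exceptions are listed; all other nouns inherit from their class.
NOUN_DEFAULT = {"usagi": "-wa"}


def classify_noun(noun):
    """Step 1: classifier for the head noun of a simple noun phrase."""
    if noun in NOUN_DEFAULT:                      # 1(a)
        return NOUN_DEFAULT[noun]
    classes = NOUN_CLASSES.get(noun, [])
    if classes and classes[0] in CLASS_DEFAULT:   # 1(b)
        return CLASS_DEFAULT[classes[0]]
    return RESIDUAL                               # 1(c)


def classify_coordinate(nouns):
    """Step 2: classify each conjunct, keep the most frequent classifier."""
    if not nouns:
        raise ValueError("a coordinate noun phrase needs at least one noun")
    counts = Counter(classify_noun(n) for n in nouns)
    # Ties go to the classifier seen first (an assumption of this sketch).
    return counts.most_common(1)[0][0]


print(classify_noun("usagi"))               # exception listed on the noun
print(classify_noun("inu"))                 # inherited from class "animal"
print(classify_coordinate(["ABC", "XYZ"]))  # both conjuncts agree
```

In this sketch a noun appears in NOUN_DEFAULT only when it overrides the default of its first listed semantic class, so a display routine could regenerate every other classifier by calling classify_noun.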
Alternatively (and equivalently), in a lexicon with multiple inheritance and defaults, the class's default classifier can be added as a defeasible constraint on all members of the semantic class.</Paragraph> </Section> </Section> <Section position="6" start_page="91" end_page="92" type="metho"> <SectionTitle> 3 The Goi-Taikei Ontology </SectionTitle> <Paragraph position="0"> We used the ontology provided by Goi-Taikei -- A Japanese Lexicon (Ikehara et al., 1997). We chose it because of its rich ontology, its extensive use in many other NLP applications, its wide coverage of Japanese, and the fact that it is being extended to other numeral classifier languages, such as Malay.</Paragraph> <Paragraph position="1"> The ontology has several hierarchies of concepts, with both is-a and has-a relationships: 2,710 semantic classes (12-level tree structure) for common nouns, 200 classes (9-level tree structure) for proper nouns and 108 classes for predicates. We show the top three levels of the common noun ontology in Figure 2. Words can be assigned to semantic classes anywhere in the hierarchy. Not all semantic classes have words assigned to them.</Paragraph> <Paragraph position="2"> The semantic classes are used in the Japanese word semantic dictionary to classify nouns, verbs and adjectives. The dictionary includes 100,000 common nouns, 70,000 technical terms, 200,000 proper nouns and 30,000 other words: 400,000 words in all. The semantic classes are also used as selectional restrictions on the arguments of predicates in a separate predicate dictionary, with around 17,000 entries.</Paragraph> <Paragraph position="3"> Figure 3 shows an example of one record of the Japanese semantic word dictionary, with the addition of the new DEFAULT CLASSIFIER field (underlined for emphasis).</Paragraph> <Paragraph position="4"> Each record has an index form, pronunciation, a canonical form, part-of-speech and semantic classes. 
Each word can have up to five common noun classes and ten proper noun classes. In the case of usagi &quot;rabbit&quot;, there are two common noun classes and no proper noun classes.</Paragraph> </Section> <Section position="7" start_page="92" end_page="92" type="metho"> <SectionTitle> 4 Mapping Classifiers to the Ontology </SectionTitle> <Paragraph position="0"> In this section we investigate how far the semantic classes can be used to predict default classifiers for nouns. Because most sortal classifiers select for some kind of semantic class, we thought that nouns grouped together under the same semantic class should share the same classifier.</Paragraph> <Paragraph position="1"> We associated classifiers with semantic classes by hand. This took around two weeks. We found that, while some classes were covered by a single classifier, around 20% required more than one. For example, 1056:song is counted only by -kyoku &quot;tune&quot;, and 989:water vehicle only by -seki &quot;ship&quot;, but the class [961:weapon] had members counted by -hon &quot;long thin&quot;, -chō &quot;knife&quot;, -furi &quot;swords&quot;, -ki &quot;machines&quot; and more.</Paragraph> <Paragraph position="2"> We show the most frequent numeral classifiers in Table 1. We ended up with 47 classifiers used as semantic classes' default classifiers. This is in line with the fact that most speakers of Japanese know and use between 30 and 80 sortal classifiers (Downing, 1996). Of course, we expect to add more classifiers at the noun level.</Paragraph> <Paragraph position="3"> 801 semantic classes turned out not to have classifiers. 
This included classes with no words associated with them, and those that only contained nouns with referents so abstract we considered them to be uncountable, such as greed, lethargy, etc.</Paragraph> <Paragraph position="4"> We used the default classifiers assigned to the semantic classes to generate defeasible defaults for the noun entries in the common and technical term dictionaries (172,506 words in all). We did this in order to look at the distribution of classifiers over words in the lexicon. In the actual generation this would be done dynamically, after the semantic classes have been disambiguated. The distributions of classifiers were similar to those of the semantic classes, although there was a higher proportion counted with the residual classifier -tsu, and the classifier for machines -dai. This may be an artifact of the 70,000 word technical term dictionary. As further research, we would like to calculate the distribution of classifiers in some text, although we expect it to depend greatly on the genre.</Paragraph> <Paragraph position="5"> The mapping we created is not complete because some of the semantic classes have nouns which do not share the same classifiers. We have to add more specific defaults at the noun level. As well as more specific sortal classifiers, there are cases where a group classifier may be more appropriate. For example, among the nouns counted with -nin there are entries such as couple, twins and so on which are often counted with -kumi &quot;pair&quot;.</Paragraph> <Paragraph position="6"> In addition, the choice of classifier can depend on factors other than just semantic class. For example, hito &quot;people&quot; can be counted by either -nin or -mei, the only difference being that -mei is more polite.</Paragraph> <Paragraph position="7"> It was difficult to assign default classifiers to the semantic classes that referred to events. These classes mainly include deverbal nouns (e.g. 
konomi &quot;liking&quot;) and nominal verbs (e.g., benkyō &quot;study&quot;). These can stand for either the action or the result of the action: e.g. kenkyū &quot;a study/research&quot;. In these cases, every application we considered would distinguish between event and sortal classification in the input, so it was only necessary to choose a classifier for the result of the action.</Paragraph> </Section> </Paper>