<?xml version="1.0" standalone="yes"?> <Paper uid="I05-3004"> <Title>Chinese Classifier Assignment Using SVMs</Title> <Section position="3" start_page="0" end_page="25" type="relat"> <SectionTitle> 2 Related Work </SectionTitle> <Paragraph position="0"> Many Asian languages (e.g. Chinese, Korean, Japanese and Thai) have numeral classifier systems, and noun-classifier matching has been studied in these languages. Sornlertlamvanich et al. (1994) present an algorithm for selecting an appropriate classifier for a noun in Thai. The general idea is to extract noun-classifier collocations from a corpus and output a list of noun-classifier pairs with frequency information. During noun phrase generation, the classifier that co-occurs most frequently with a given noun is selected. However, no evaluation is reported for this algorithm.</Paragraph> <Paragraph position="1"> The algorithm described in Paik and Bond (2001) generates Japanese and Korean numeral classifiers using semantic classes from an ontology. The authors assigned classifiers to each of the 2,710 semantic classes in the ontology by hand. During generation, nouns in each semantic class are assigned the associated classifier. The classifier assignment accuracy is 81% for Japanese classifiers and 62% for Korean classifiers. However, the evaluation set is small, containing only 90 noun phrases. Furthermore, attaching classifiers to an ontology by hand is laborious, and this approach has difficulty with cases like the cattle example mentioned earlier.</Paragraph> <Paragraph position="2"> Paul et al. (2002) present a method for extracting classifier information from a bilingual (Japanese-English) corpus based on phrasal correspondences in the sentential context. Bilingual sentence pairs are compared to find noun-classifier collocations. The output was evaluated by a human.
The precision is high (84.2%), but the recall is only about 40% because the algorithm produces no output for half of the nouns.</Paragraph> <Paragraph position="3"> In contrast to these algorithms, our approach is based on a large data set, uses machine learning, and does not require the attachment of classifiers to an ontology by hand.</Paragraph> </Section> </Paper>