<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-3004">
<Title>Chinese Classifier Assignment Using SVMs</Title>
<Section position="4" start_page="25" end_page="26" type="metho">
<SectionTitle> 3 Support Vector Machines </SectionTitle>
<Paragraph position="0"> Support Vector Machines (SVMs) are a type of classifier first introduced in (Boser et al., 1992). In the last few years SVMs have become an important and active area of machine learning research. The SVM algorithm detects and exploits complex patterns in data.</Paragraph>
<Paragraph position="1"> A binary SVM is a maximum-margin classifier. Given a set of training data $\{x_1, x_2, \ldots, x_k\}$ with corresponding labels $y_1, y_2, \ldots, y_k \in \{+1, -1\}$, a binary SVM divides the input space into two regions at a decision boundary, which is a separating hyperplane $\langle w, x \rangle + b = 0$ (Figure 1). The decision boundary should classify all points correctly, that is, $$y_i(\langle w, x_i \rangle + b) > 0, \quad \forall i.$$</Paragraph>
<Paragraph position="2"> Also, the decision boundary should have the maximum separating margin with respect to the two classes. If we rescale $w$ and $b$ so that the closest point(s) to the hyperplane satisfy $|\langle w, x_i \rangle + b| = 1$, the margin width is $2/\|w\|$, and the problem can be formulated as: $$\text{minimize } \tfrac{1}{2}\|w\|^2 \quad \text{subject to } y_i(\langle w, x_i \rangle + b) \geq 1, \ \forall i.$$</Paragraph>
<Paragraph position="3"> The generalized Lagrange function is $$L(w, b, \alpha) = \tfrac{1}{2}\|w\|^2 - \sum_{i=1}^{k} \alpha_i \big( y_i(\langle w, x_i \rangle + b) - 1 \big), \quad \alpha_i \geq 0.$$ This is a quadratic programming (QP) problem, and we can always find the global maximum over the $\alpha_i$. We can then recover $w$ and $b$ for the hyperplane by $$w = \sum_{i=1}^{k} \alpha_i y_i x_i, \qquad b = y_j - \langle w, x_j \rangle \ \text{for any support vector } x_j \ (\alpha_j > 0).$$</Paragraph>
<Paragraph position="4"> If the points in the input space are not linearly separable, we allow 'slack variables' $\xi_i$ in the classification and instead find a soft-margin hyperplane, e.g.: $$\text{minimize } \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{k} \xi_i \quad \text{subject to } y_i(\langle w, x_i \rangle + b) \geq 1 - \xi_i, \ \xi_i \geq 0, \ \forall i.$$ Once again, a QP solver can be used to find the solution.</Paragraph>
<Paragraph position="5"> For our task we need multi-class SVMs. To obtain a multi-class SVM, we can construct and combine several binary SVMs (one-against-one), or we can train one classifier per class against all the others (one-against-all).</Paragraph>
<Paragraph position="6"> Many SVM implementations are available on the web. We chose LIBSVM (Chang and Lin, 2001), an efficient multi-class implementation. LIBSVM uses the "one-against-one" approach, in which $k(k-1)/2$ classifiers are constructed, each trained on data from two different classes (Hsu and Lin, 2002).</Paragraph>
</Section>
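To make the soft-margin, one-against-one machinery above concrete, here is a small illustrative sketch (not the paper's code) using scikit-learn's SVC, which wraps the same LIBSVM library the authors used. The toy data and the penalty value C=1.0 are assumptions for illustration only.

```python
# Illustrative sketch only: a soft-margin, multi-class SVM via
# scikit-learn's SVC, which wraps LIBSVM and, like LIBSVM, handles
# k classes with the one-against-one scheme (k(k-1)/2 pairwise
# binary classifiers).
import numpy as np
from sklearn.svm import SVC

# Toy data: 3 classes in a 2-D feature space (stand-ins for real features).
rng = np.random.RandomState(0)
X = np.vstack([
    rng.randn(20, 2) + [0, 0],
    rng.randn(20, 2) + [4, 4],
    rng.randn(20, 2) + [0, 4],
])
y = np.array([0] * 20 + [1] * 20 + [2] * 20)

# C is the soft-margin penalty on the slack variables xi_i:
# minimize (1/2)||w||^2 + C * sum(xi_i).
clf = SVC(kernel="linear", C=1.0, decision_function_shape="ovo")
clf.fit(X, y)

# With 3 classes, the one-against-one decision function has
# 3*(3-1)/2 = 3 columns, one per pairwise classifier.
print(clf.decision_function(X[:1]).shape)  # (1, 3)
print(clf.predict([[4.0, 4.2]]))           # most likely class 1
```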
<Section position="5" start_page="26" end_page="26" type="metho">
<SectionTitle> 4 Data and Resources </SectionTitle>
<Paragraph position="0"> We use the Penn Chinese Treebank (Xue et al., 2002) as our corpus and the ontology/lexicon HowNet (Dong and Dong, 2000) to obtain ontological features for nouns. We train SVMs on different feature sets to see which set(s) of features are important for noun-classifier matching.</Paragraph>
<Section position="1" start_page="26" end_page="26" type="sub_section">
<SectionTitle> 4.1 Penn Chinese Treebank </SectionTitle>
<Paragraph position="0"> The Penn Chinese Treebank is a 500,000-word Chinese corpus annotated with both part-of-speech (POS) tags and syntactic brackets. We automatically extract noun phrases that contain classifiers from the corpus.</Paragraph>
<Paragraph position="1"> An example noun phrase (translation: 'a major commercial ...') is given as a bracketed parse in the paper. [Example parse not preserved in this extraction.] The word in (CLP (M 条 [tiao])) is the classifier, and the head noun of the noun phrase is the word in the final (NN) node. In Section 5.3 we describe the set of features we obtain from each noun phrase and the sentence in which it is embedded.</Paragraph>
<Paragraph position="2"> In our corpus there are 61,587 noun occurrences (12,225 unique nouns) and 3,940 classifier-noun co-occurrences (212 unique classifiers). There is, however, a trivial rule determining whether a noun needs a classifier: if a noun is preceded by a quantifier or a determiner, then a classifier is needed; otherwise it is not. Hence, we focus only on noun-classifier pairs. The most frequently occurring classifier in this corpus is 个 [ge], which occurs with 497 unique nouns. In this corpus, 87 classifiers occur in only one noun-classifier pair.</Paragraph>
</Section>
<Section position="2" start_page="26" end_page="26" type="sub_section">
<SectionTitle> 4.2 HowNet </SectionTitle>
<Paragraph position="0"> We obtain ontological features for nouns from HowNet, a bilingual Chinese-English lexicon and ontology. Each word sense is assigned to a concept carrying ontological features. HowNet constructs concepts from basic meaning units called sememes.</Paragraph>
<Paragraph position="1"> Table 1 shows an example entry in HowNet, for the word 作家 (writer). The sememe in the first position, 'human', is the categorical attribute, which describes the general category of the concept. The sememes following it are additional attributes, which give more specific features. Two types of pointer, '#' and '*', appear in the definition: '#' means 'related', so '#occupation' shows that the concept is related to 'occupation'; '*' means 'agent', so '*compile' shows that a 'writer' is the agent of 'compile'. The sememes '#readings' and 'literature' show that the job of a 'writer' is to compile 'readings' about 'literature'.</Paragraph>
<Paragraph position="2"> We use HowNet 2000, which contains 120,496 entries covering about 65,000 Chinese words, defined with a set of 1,503 sememes. This is large enough for our task: we can obtain ontological features for 94.71% of the nouns in the Penn Chinese Treebank. For nouns that are not in HowNet, we simply leave the ontological features blank.</Paragraph>
</Section>
</Section>
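To make the sememe-based features of Section 4.2 concrete, here is a minimal, hypothetical sketch of how ontological features could be pulled from a HowNet-style lexicon. The dictionary contents and helper names are invented for illustration; this is not HowNet's actual file format or API.

```python
# Hypothetical sketch of sememe feature extraction from a HowNet-style
# lexicon. The toy entry below mimics the structure described in the
# text (a categorical attribute first, then additional attributes with
# '#' = related and '*' = agent pointers); it is NOT real HowNet data.
from typing import Dict, List

# word -> sememe definition (first sememe = categorical attribute)
TOY_LEXICON: Dict[str, List[str]] = {
    "writer": ["human", "#occupation", "*compile", "#readings", "literature"],
}

def ontological_features(noun: str) -> Dict[str, str]:
    """Return the ontological feature dict for one noun."""
    sememes = TOY_LEXICON.get(noun)
    if sememes is None:
        # As in the paper: nouns absent from HowNet get blank features.
        return {"category": "", "attributes": ""}
    return {
        "category": sememes[0],               # general category of the concept
        "attributes": " ".join(sememes[1:]),  # pointer-marked specifics
    }

print(ontological_features("writer"))
print(ontological_features("unknown-noun"))
```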
<Section position="6" start_page="26" end_page="27" type="metho">
<SectionTitle> 5 Experiments </SectionTitle>
<Paragraph position="0"> We use six different feature sets to assign classifiers to nouns. To evaluate each feature set, we perform 10-fold cross-validation. We report our results in Section 6.</Paragraph>
<Section position="1" start_page="26" end_page="27" type="sub_section">
<SectionTitle> 5.1 Baseline Algorithm </SectionTitle>
<Paragraph position="0"> In the training data, we count the number of times each classifier appears with a given noun. We assign to each noun in the test data its most frequently co-occurring classifier (cf. (Sornlertlamvanich et al., 1994)). If a noun does not appear in the training data, we assign the classifier 个 [ge], the classifier that appears most frequently overall in the corpus (see the first sketch after Section 5.3).</Paragraph>
</Section>
<Section position="2" start_page="27" end_page="27" type="sub_section">
<SectionTitle> 5.2 Noun Features </SectionTitle>
<Paragraph position="0"> Since classifiers are assigned mostly on the basis of the noun, the most important features for classifier prediction should be features of the noun itself. We ran four different experiments with noun features:</Paragraph>
<Paragraph position="1"> * (1) The feature set includes only the noun itself.</Paragraph>
<Paragraph position="2"> * (2) The feature set includes only the ontological features of the noun. If classifiers are associated with semantic categories (cf. (Paik and Bond, 2001)), we should be able to assign classifiers based on the ontological features of nouns.</Paragraph>
<Paragraph position="3"> * (3) The feature set includes both the noun and its ontological features.</Paragraph>
<Paragraph position="4"> * (4) Two SVMs are trained: one on the noun only, and one on ontological features only. During testing, nouns that appear in the training set are assigned classifiers using the first SVM; all other nouns are assigned classifiers using the second SVM.</Paragraph>
</Section>
<Section position="3" start_page="27" end_page="27" type="sub_section">
<SectionTitle> 5.3 Context Features </SectionTitle>
<Paragraph position="0"> In this set of experiments, we used features from both the noun and its context. These features fall into two groups, lexical features and syntactic features, shown in Table 2. We ran two experiments using this set of features:</Paragraph>
<Paragraph position="1"> * (5) The feature set includes the noun and the lexical and syntactic features only.</Paragraph>
<Paragraph position="2"> * (6) The feature set includes the noun and the lexical, syntactic, and ontological features.</Paragraph>
</Section>
</Section>
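Below is a minimal sketch of the baseline from Section 5.1, assuming (noun, classifier) training pairs have already been extracted from the treebank. The pinyin strings stand in for the actual Chinese classifiers; the function names are illustrative.

```python
# Sketch of the Section 5.1 baseline: assign each noun its most
# frequently co-occurring classifier from training, backing off to the
# corpus-wide most frequent classifier ('ge', i.e. 个) for unseen nouns.
from collections import Counter, defaultdict

def train_baseline(pairs):
    """pairs: iterable of (noun, classifier) co-occurrences."""
    by_noun = defaultdict(Counter)
    for noun, clf in pairs:
        by_noun[noun][clf] += 1
    # Keep only the most frequent classifier per noun.
    return {n: c.most_common(1)[0][0] for n, c in by_noun.items()}

def predict(model, noun, default="ge"):
    return model.get(noun, default)

# Toy usage with invented pairs:
model = train_baseline([("street", "tiao"), ("street", "tiao"), ("person", "ge")])
print(predict(model, "street"))  # 'tiao'
print(predict(model, "book"))    # unseen noun -> back-off: 'ge'
```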
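And a sketch of how one of the six feature-set experiments (e.g., set (3): noun plus ontological features) could be evaluated with 10-fold cross-validation. The dict-based feature encoding and all data values are assumptions; the paper does not specify its encoding.

```python
# Sketch of one feature-set experiment evaluated with 10-fold
# cross-validation, mirroring the setup described in Section 5.
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy instances: each dict is one noun occurrence; the label is its
# classifier. Repeated so every class has enough samples per fold.
X_dicts = [
    {"noun": "street", "category": "place"},
    {"noun": "street", "category": "place"},
    {"noun": "person", "category": "human"},
    {"noun": "book",   "category": "readings"},
] * 10
y = ["tiao", "tiao", "ge", "ben"] * 10

# One-hot encode the symbolic features, then train a linear SVM
# (one-against-one internally, via LIBSVM).
pipeline = make_pipeline(DictVectorizer(sparse=True), SVC(kernel="linear", C=1.0))
scores = cross_val_score(pipeline, X_dicts, y, cv=10)
print("10-fold accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```
</Paper>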