<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2170"> <Title>A Procedure for Multi-Class Discrimination and some Linguistic Applications</Title> <Section position="3" start_page="0" end_page="1034" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> A common task of knowledge discovery is multiple concept learning, in which, from multiple given classes (i.e. a typology), the profiles of these classes are inferred, such that every class is contrasted with every other class by feature values. Ideally, good profiles, besides making good predictions on future instances, should be concise, intelligible, and comprehensive (i.e. yielding all alternatives).</Paragraph> <Paragraph position="1"> Previous approaches like ID3 (Quinlan, 1983) or C4.5 (Quinlan, 1993), which use variations on greedy search, i.e. localized best-next-step search (typically based on information-gain heuristics), have as their major goal prediction on unseen instances, and therefore do not have as an explicit concern the conciseness, intelligibility, and comprehensiveness of the output. In contrast to virtually all previous approaches to multi-class discrimination, the MPD</Paragraph> <Section position="1" start_page="0" end_page="1034" type="sub_section"> <SectionTitle> (Maximally Parsimonious Discrimination) program </SectionTitle> <Paragraph position="0"> we describe here aims at the legibility of the resultant class profiles. To do so, it (1) uses a minimal number of features by carrying out a global optimization, rather than heuristic greedy search; (2) produces conjunctive, or nearly conjunctive, profiles for the sake of intelligibility; and (3) gives all alternative solutions. The first goal stems from the familiar requirement that classes be distinguished by jointly necessary and sufficient descriptions. The second accords with the equally familiar thesis that conjunctive descriptions are more comprehensible (they are the norm for typological classification (Hempel, 1965), and they are more readily acquired by experimental subjects than disjunctive ones (Bruner et al., 1956)), and the third expresses the usefulness, for a diversity of reasons, of having all alternatives. Linguists would generally subscribe to all three requirements, hence the need for a computational tool with such a focus.1 In this paper, we briefly describe the MPD system (details may be found in Valdés-Pérez and Pericliev, 1997; submitted) and focus on some linguistic applications, including componential analysis of kinship terms, distinctive feature analysis in phonology, language typology, and discrimination of aphasic syndromes from coded texts in the CHILDES database.</Paragraph> <Paragraph position="1"> For further interesting application areas of similar algorithms, cf. Daelemans et al., 1996 and Tanaka, 1996.</Paragraph> <Paragraph position="2"> 2 Overview of the MPD program The Maximally Parsimonious Discrimination (MPD) program is a general computational tool for inferring, given multiple classes (or a typology) with attendant instances of these classes, the profiles (= descriptions) of these classes such that every class is contrasted with all remaining classes on the basis of feature values. Below is a brief description of the program.</Paragraph>
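To make the task concrete before the details, here is a minimal Python sketch of the kind of input MPD works from and of the pairwise obligation it must discharge; the class names, the dict encoding, and the scaffolding are our own illustrative assumptions, not the program's actual interface.

```python
# A minimal sketch of MPD-style input: a typology maps each class to its
# observed instances (feature-value dictionaries). All names here are
# hypothetical stand-ins, not MPD's real data structures.
from itertools import combinations

typology = {
    "Class1": [{"NP1": "hum", "VTR": True}],
    "Class2": [{"NP1": "beast", "VTR": False}],
    "Class3": [{"NP1": "phys-obj", "VTR": True}],
}

# Every class must be contrasted with every other class, so N classes
# require N-choose-2 = N*(N-1)/2 pairwise contrasts.
pairs = list(combinations(typology, 2))
n = len(typology)
assert len(pairs) == n * (n - 1) // 2

for c1, c2 in pairs:
    print(f"must contrast {c1} against {c2}")
```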
</Section> <Section position="2" start_page="1034" end_page="1034" type="sub_section"> <SectionTitle> 2.1 Expressing contrasts </SectionTitle> <Paragraph position="0"> The MPD program uses Boolean, nominal and numeric features to express contrasts, as follows: 1The profiling of multiple types is, in actual fact, a generic task of knowledge discovery, and the program we describe has found substantial applications in areas outside of linguistics, e.g. in criminology, audiology, and datasets from the UC Irvine repository. However, we shall not discuss these applications here.</Paragraph> <Paragraph position="1"> * Two classes C1 and C2 are contrasted by a Boolean or nominal feature if the instances of C1 and the instances of C2 do not share a value.</Paragraph> <Paragraph position="2"> * Two classes C1 and C2 are contrasted by a numeric feature if the ranges of the instances of C1 and of C2 do not overlap.2 MPD distinguishes two types of contrasts: (1) absolute contrasts, when all the classes can be cleanly distinguished, and (2) partial contrasts, when no absolute contrasts are possible between some pairwise classes, but absolute contrasts can nevertheless be achieved by deleting up to N per cent of the instances, where N is specified by the user.</Paragraph> <Paragraph position="3"> When no successful (absolute) contrasts have yet been achieved, the program can also invent derived features, whose key idea is to express interactions between the given primitive features. Currently we have implemented the invention of novel derived features by combining two primitive features (combining three or more primitive features is also possible, but has not so far been done owing to the likelihood of a combinatorial explosion):</Paragraph> <Paragraph position="4"> * Two Boolean features P and Q are combined into a set of two-place functions, none of which is reducible to a one-place function or to the negation of another two-place function in the set. The resulting set consists of P-and-Q, P-or-Q, P-iff-Q, P-implies-Q, and Q-implies-P.</Paragraph> <Paragraph position="5"> * Two nominal features M and N are combined into a single two-place nominal function MxN.</Paragraph> <Paragraph position="6"> * Two numeric features X and Y are combined by forming their product and their quotient.3 Both primitive and derived features are treated analogously in deciding whether two classes are contrasted by a feature, since derived features are legitimate Boolean, nominal or numeric features.</Paragraph> <Paragraph position="7"> It will be observed that contrasts by a nominal or numeric feature may (but will not necessarily) introduce a slight degree of disjunctiveness, which is to a somewhat greater extent the case in contrasts accomplished by derived features.</Paragraph> <Paragraph position="8"> Missing values do not present much of a problem, since they can be ignored without any need to estimate a value or to discard the remaining informative feature values of the instance. In the case of nominal features, missing values can be treated as just another legitimate feature value.</Paragraph> </Section>
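The contrast tests and derivation operators just listed are concrete enough to sketch in Python. The helpers below are our own paraphrase of the definitions in this subsection (the function names and the instances-as-dicts encoding are assumptions; the paper does not show MPD's code).

```python
# Contrast tests, assuming each instance is a dict of feature values.

def nominal_contrast(insts1, insts2, feat):
    """Boolean/nominal: the two classes share no value for feat."""
    values1 = {inst[feat] for inst in insts1}
    values2 = {inst[feat] for inst in insts2}
    return values1.isdisjoint(values2)

def numeric_contrast(insts1, insts2, feat):
    """Numeric: the value ranges of the two classes do not overlap."""
    xs1 = [inst[feat] for inst in insts1]
    xs2 = [inst[feat] for inst in insts2]
    return max(xs1) < min(xs2) or max(xs2) < min(xs1)

# Derived features combine two primitives so that their interaction
# becomes visible to the very same contrast tests.

def derive_boolean(p, q):
    """The five irreducible two-place combinations of Booleans P and Q."""
    return {
        "P-and-Q": p and q,
        "P-or-Q": p or q,
        "P-iff-Q": p == q,
        "P-implies-Q": (not p) or q,
        "Q-implies-P": (not q) or p,
    }

def derive_nominal(m, n):
    """MxN: the ordered pair of values as one two-place nominal feature."""
    return (m, n)

def derive_numeric(x, y):
    """Product and quotient of two numeric features X and Y."""
    return {"X*Y": x * y, "X/Y": x / y if y != 0 else None}
```

Because a derived feature is itself a legitimate Boolean, nominal or numeric feature, the two contrast tests apply to it unchanged, which is exactly the point made in the text above.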
<Section position="3" start_page="1034" end_page="1034" type="sub_section"> <SectionTitle> 2.2 The simplicity criteria </SectionTitle> <Paragraph position="0"> MPD uses three intuitive criteria to guarantee the uncovering of the most parsimonious discrimination among classes: 2Besides these atomic feature values we may also support (hierarchically) structured values, but this will be of no concern here.</Paragraph> <Paragraph position="1"> 3Analogously to the BACON program's invention of theoretical terms (Langley et al., 1987).</Paragraph> <Paragraph position="2"> 1. Minimize overall features. A set of classes may be demarcated using a number of overall feature sets of different cardinality; this criterion chooses those overall feature sets which have the smallest cardinality (i.e. are the shortest). 2. Minimize profiles. Given some overall feature set, one class may be demarcated--using only features from this set--by a number of profiles of different cardinality; this criterion chooses those profiles having the smallest cardinality. 3. Maximize coordination. This criterion maximizes the coherence between class profiles in one discrimination model,4 in the case when alternative profiles remain even after the application of the two previous simplicity criteria.5 Due to space limitations, we cannot enter into the implementation details of these global optimization criteria, which are in fact the most expensive mechanism of MPD. Suffice it to say here that they are implemented in a uniform way (in all three cases by converting a logic formula - either CNF or something more complicated - into a DNF formula), and all can use both sound and unsound (but good) heuristics to deal successfully with the potentially explosive combinatorics inherent in the conversion to DNF.</Paragraph> <Paragraph position="3"> 4In a "discrimination model" each class is described with a unique profile. 5By way of an abstract example, denote features by F1...Fn, and let Class 1 have the profiles: (1) F1 F2, (2) F1 F3, and Class 2: (1) F4 F2, (2) F4 F5, (3) F4 F6. Combining freely all alternative profiles with one another, we should get 6 discrimination models. However, in Class 1 we have a choice between [F2 F3] (F1 must be used), and in Class 2 between [F2 F5 F6] (F4 must be used); this criterion, quite analogously to the previous two, will minimize this choice, selecting F2 in both cases, and hence yielding the unique model Class 1: F1 F2, Class 2: F4 F2.</Paragraph> </Section>
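The paper does not spell out the CNF-to-DNF machinery, but the global (as opposed to greedy) character of criterion 1 can be conveyed with a brute-force sketch: enumerate candidate overall feature sets in order of increasing cardinality and keep every smallest set that contrasts all class pairs, which also yields all alternatives. This is our illustrative stand-in, not MPD's actual algorithm.

```python
# Brute-force stand-in for criterion 1 (minimize overall features).
from itertools import combinations

def smallest_feature_sets(classes, features, contrasts):
    """Return every minimum-cardinality feature set that discriminates
    all classes; contrasts(c1, c2, feat) must report whether feat
    contrasts classes c1 and c2 (by any of the tests sketched earlier).
    """
    pairs = list(combinations(classes, 2))
    for size in range(1, len(features) + 1):
        winners = [
            set(subset)
            for subset in combinations(features, size)
            if all(any(contrasts(c1, c2, f) for f in subset)
                   for c1, c2 in pairs)
        ]
        if winners:         # first non-empty size = smallest cardinality
            return winners  # all alternatives of that size
    return []               # no absolute discrimination is possible
```

Criterion 2 would repeat the same smallest-first search per class, over profiles drawn from a winning overall set; criterion 3 then coordinates the surviving alternatives, as footnote 5 illustrates.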
<Section position="4" start_page="1034" end_page="1034" type="sub_section"> <SectionTitle> 2.3 An illustration </SectionTitle> <Paragraph position="0"> By way of (a simplified) illustration, let us consider the learning of the Bulgarian translational equivalents of the English verb feed on the basis of the case frames of the latter. Assume the following features/values, corresponding to the verbal slots: (1) NP1 = {hum, beast, phys-obj}; (2) VTR (a binary feature denoting whether the verb is transitive or not); (3) NP2 (same values as NP1); (4) PP (a binary feature expressing the obligatory presence of a prepositional phrase). An illustrative input to MPD is given in Table 1 (the sentences in the third column of the table are not a part of the input and are only given for the sake of clarity, though, of course, they would normally serve to derive the instances by parsing).</Paragraph> <Paragraph position="1"> Table 1 (example sentences omitted):
Class 1 (otglezdam): 1. NP1=hum VTR NP2=beast ~PP; 2. NP1=hum VTR NP2=beast ~PP
Class 2 (xranja): 1. NP1=hum VTR NP2=hum ~PP; 2. NP1=beast VTR NP2=beast ~PP
Class 3: 1. NP1=beast ~VTR PP; 2. NP1=beast ~VTR PP
Class 4 (zaxranvam): 1. NP1=hum VTR NP2=phys-obj PP; 2. NP1=hum VTR NP2=phys-obj PP
Class 5 (podavam): 1. NP1=phys-obj VTR NP2=phys-obj PP; 2. NP1=phys-obj VTR NP2=phys-obj PP; 3. NP1=hum VTR NP2=phys-obj PP</Paragraph> <Paragraph position="2"> The output of the program is given in Table 2. MPD needs to find 10 pairwise contrasts between the 5 classes (i.e. N-choose-2, calculable by the formula N(N-1)/2), and it has successfully discriminated all classes. This is done by the overall feature set {NP1, PP, NP1xNP2}, whose first two features are primitive, and the third is a derived nominal feature. Not all classes are absolutely discriminated: Class 4 (zaxranvam) and Class 5 (podavam) are only partially contrasted by the feature NP1. Thus, Class 5 is 66.6% NP1=phys-obj, since we need to retract 1/3 of its instances (specifically, sentence (3) from Table 1, whose NP1=hum) in order to get a clean contrast by that feature. Class 1 (otglezdam) and Class 2 (xranja) use in their profiles the derived nominal feature NP1xNP2; they actually contrast because all instances of Class 1 have the value 'hum' for NP1 and the value 'beast' for NP2, and hence the "derived value" [hum beast], whereas neither of the instances of Class 2 has an identical derived value (indeed, referring to Table 1, the first instance of Class 2 has NP1xNP2=[hum hum] and the second instance NP1xNP2=[beast beast]). The resulting profiles in Table 2 are the simplest in the sense that there are no more concise overall feature sets that discriminate the classes, and the profiles--using only features from the overall feature set--are the shortest.</Paragraph> </Section> </Section> </Paper>
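As a closing check on the worked example, the two contrasts discussed in the last paragraph can be reproduced numerically; the value lists are read directly off Table 1, while the variable names and scaffolding are ours.

```python
# Partial contrast of Class 4 vs. Class 5 on NP1 (values from Table 1).
zaxranvam = ["hum", "hum"]                     # Class 4
podavam = ["phys-obj", "phys-obj", "hum"]      # Class 5; sentence (3) is 'hum'

assert not set(zaxranvam).isdisjoint(podavam)  # shared 'hum': no absolute contrast

# Retract the offending instance(s) of Class 5 and measure what survives.
kept = [v for v in podavam if v not in set(zaxranvam)]
print(f"Class 5 is {len(kept) / len(podavam):.1%} NP1=phys-obj")
# -> 66.7% (the paper rounds to 66.6%)

# Absolute contrast of Class 1 vs. Class 2 on the derived feature NP1xNP2.
otglezdam = [("hum", "beast"), ("hum", "beast")]
xranja = [("hum", "hum"), ("beast", "beast")]
assert set(otglezdam).isdisjoint(xranja)       # the derived values never coincide
```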