<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2170">
  <Title>A Procedure for Multi-Class Discrimination and some Linguistic Applications</Title>
  <Section position="4" start_page="1034" end_page="1036" type="metho">
    <SectionTitle>
3 Componential analysis
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="1034" end_page="1036" type="sub_section">
      <SectionTitle>
3.1 In lexicology
</SectionTitle>
      <Paragraph position="0"> One of the tasks we addressed with MPD is semantic componential analysis, which has well-known linguistic implications, e.g., for (machine) translation (for a familiar early reference, cf. Nida, 1971). More specifically, we were concerned with the componential analysis of kinship terminologies, a common area of study within this trend. KINSHIP is a specialized computer program, having as input the kinterms (=classes) of a language, and their attendant kintypes (=instances). 6 It computes the feature values of the kintypes, and then feeds the result to the MPD component to make the discrimination between the kinterms of the language. Currently, KINSHIP uses about 30 features, of all types: binary (e.g., male={+/-}), nominal (e.g., lineal={lineal, co-lineal, ablineal}), and numeric (e.g., generation={1,2,..,n}).</Paragraph>
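      <Paragraph> As a rough sketch (with invented feature names and values; this is not KINSHIP's actual input format), such mixed-type data might be represented as follows, each kintype carrying binary, nominal, and numeric feature values and a kinterm label:

```python
# Hypothetical sketch of KINSHIP-style input: kintypes (instances) carry
# binary, nominal, and numeric feature values and are labelled with their
# kinterm (class). All feature names and values here are illustrative only.
kintypes = [
    {"male": True,  "lineal": "lineal",    "generation": 1, "kinterm": "father"},
    {"male": False, "lineal": "lineal",    "generation": 1, "kinterm": "mother"},
    {"male": True,  "lineal": "co-lineal", "generation": 1, "kinterm": "uncle"},
    {"male": True,  "lineal": "lineal",    "generation": 2, "kinterm": "grandfather"},
]

def values_per_class(instances, feature):
    """Collect, for each class (kinterm), the set of values the feature takes."""
    out = {}
    for inst in instances:
        out.setdefault(inst["kinterm"], set()).add(inst[feature])
    return out

# 'generation' separates father from grandfather; 'male' alone does not.
print(values_per_class(kintypes, "generation"))
```

The point of such a representation is that discrimination can then ask, feature by feature, which classes take disjoint value sets.
</Paragraph>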
      <Paragraph position="1"> In the long history of this area of study, practitioners of the art have come up with explicit requirements as regards the adequacy of analysis: (1) Parsimony, including both overall features and kinterm descriptions (=profiles). (2) Conjunctiveness of kinterm descriptions. (3) Comprehensiveness in displaying all alternative componential models.</Paragraph>
      <Paragraph position="2"> As seen, these requirements fit nicely with most of the capabilities of MPD. This is not accidental: historically, we started our investigations by automating the important discovery task of componential analysis, and then, realizing the generic nature of the discrimination subtask, isolated this part of the program, which was later extended with the mechanisms for derived features and partial contrasts. Some of the results of KINSHIP are worth summarizing. The program has so far been applied to more than 20 languages of different language families. In some cases the datasets were partial (only consanguineal, or blood, kin) systems, but in others they were complete systems comprising 40-50 classes with several hundreds of instances. The program has re-discovered some classical analyses (of the Amerindian language Seneca by Lounsbury), has successfully analyzed previously unanalyzed languages (e.g., Bulgarian), and has improved on previous analyses of English. For English, the most parsimonious model has been found, and the only one giving conjunctive class profiles for all kinterms, which sounds impressive considering the massive efforts concentrated on analyzing the English kinship system. Most importantly, MPD has shown that the huge number of potential componential (=discrimination) models--a menace to the very foundations of the approach, which has made some linguists propose alternative analytic tools--is in fact reduced to (nearly) unique analyses by our 3 simplicity criteria. Our 3rd criterion, ensuring the coordination between equally simple alternative profiles, and with no precedent in the linguistic literature, proved essential in the pruning of solutions (details of KINSHIP are reported in Pericliev and Valdés-Pérez, 1997; Pericliev and Valdés-Pérez, forthcoming).</Paragraph>
    </Section>
    <Section position="2" start_page="1036" end_page="1036" type="sub_section">
      <SectionTitle>
3.2 In phonology
</SectionTitle>
      <Paragraph position="0"> Componential analysis in phonology amounts to finding the distinctive features of a phonemic system, differentiating any phoneme from all the rest.</Paragraph>
      <Paragraph position="1"> The adequacy requirements are the same as in the above subsection; indeed, they were borrowed into lexicology (and morphology, for that matter) from phonological work, which chronologically preceded them. We applied MPD to the Russian phonemic system, the data coming from a paper by Cherry et al. (1953), who also explicitly state as one of their goals the finding of minimal phoneme descriptions.</Paragraph>
      <Paragraph position="2"> The data consisted of 42 Russian phonemes, i.e.</Paragraph>
      <Paragraph position="3"> the transfer of feature values from instances (=allophones) to their respective classes (=phonemes) had been previously performed. The phonemes were described in terms of the following 11 binary features: (1) vocalic, (2) consonantal, (3) compact, (4) diffuse, (5) grave, (6) nasal, (7) continuant, (8) voiced, (9) sharp, (10) strident, (11) stressed. MPD confirmed that the 11 primitive overall features are indeed needed, but it found 11 simpler phoneme profiles than those proposed in this classic article (cf. Table 3). Thus, the average phoneme profile turns out to comprise 6.14, rather than 6.5, components as suggested by Cherry et al.</Paragraph>
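      <Paragraph> A drastically simplified stand-in for this kind of search (toy data, not the actual Russian system; MPD's real algorithm and its simplicity criteria are not reproduced here) is a brute-force hunt for the shortest profile, i.e. subset of feature-value pairs, that separates one phoneme from all the others:

```python
from itertools import combinations

# Toy illustration with three phonemes and three binary features; a "profile"
# is a minimal set of feature values that differs from every other phoneme
# in at least one of its features.
features = ["vocalic", "consonantal", "voiced"]
phonemes = {
    "a": (1, 0, 1),
    "p": (0, 1, 0),
    "b": (0, 1, 1),
}

def minimal_profile(target):
    tvals = phonemes[target]
    # Try profiles of increasing size; return the first (hence shortest) one
    # that contrasts the target with every other phoneme.
    for size in range(1, len(features) + 1):
        for idxs in combinations(range(len(features)), size):
            if all(any(phonemes[p][i] != tvals[i] for i in idxs)
                   for p in phonemes if p != target):
                return {features[i]: tvals[i] for i in idxs}

print(minimal_profile("a"))   # a single feature suffices for 'a'
print(minimal_profile("b"))   # 'b' needs a two-feature profile
```

Real componential analysis additionally minimizes the overall feature set and coordinates the alternative profiles, which this sketch ignores.
</Paragraph>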
      <Paragraph position="4"> The capability of MPD to treat not just binary, but also non-binary (nominal) features, it should be noted, makes it applicable to datasets of a newer trend in phonology which are not limited to using binary features, and instead exploit multivalued symbolic features as legitimate phonological building blocks.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="1036" end_page="1037" type="metho">
    <SectionTitle>
4 Language typology
</SectionTitle>
    <Paragraph position="0"> We have used MPD for discovery of linguistic typologies, where the classes to be contrasted are individual languages or groups of languages (language families).</Paragraph>
    <Paragraph position="1"> 7 We also found errors in analyses performed by linguists, which is understandable for a computationally complex task like this.</Paragraph>
    <Paragraph position="2"> [Table: only the header row "Classes 1 2 3 4 5 6 7 8 9 10 11" survived extraction; the body of the table is not recoverable.]</Paragraph>
    <Paragraph position="4"> In one application, MPD was run on the dataset from the seminal paper by Greenberg (1966) on word order universals. This corpus had previously been used to uncover linguistic universals, or similarities; we now show its usefulness for the second fundamental typological task of expressing the differences between languages. The data consist of a sample of 30 languages with a wide genetic and areal coverage.</Paragraph>
    <Paragraph position="5"> The 30 classes to be differentiated are described in terms of 15 features, 4 of which are nominal, and the remaining 11 binary. Running MPD on this dataset showed that from 435 (30-Choose-2) pairwise discriminations to be made, just 12 turned out to be impossible, viz. the pairs:</Paragraph>
    <Paragraph position="7"> The contrasts were made (uniquely) with a minimal set of 8 features: {SubjVerbObj-order, Adj &lt; N, Genitive &lt; N, Demonstrative &lt; N, Numeral &lt; N, Aux &lt; V, Adv &lt; Adj, affixation}.</Paragraph>
    <Paragraph position="8"> In the processed dataset, for a number of languages there were missing values, esp. for features (12) through (14). The linguistic reasons for this were two-fold: (i) lack of reliable information; or (ii) non-applicability of the feature for a specific language (e.g., many languages lack particles for expressing yes-no questions, i.e. feature (12)). The above results reflect our default treatment of missing values as making no contribution to the contrast of language pairs. Following the alternative path, and allowing 'missing' as a distinct value, would result in the successful discrimination of most language pairs; only Greek and Serbian would remain indiscriminable, which is no surprise given their areal and genetic affinity.</Paragraph>
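    <Paragraph> The two treatments of missing values can be sketched as follows (invented mini-data, not Greenberg's actual sample; the feature names are hypothetical):

```python
from itertools import combinations

# Sketch of the two treatments of missing values: by default a missing value
# (None) cannot contribute to a contrast; alternatively, 'missing' counts as
# a distinct value of its own. Toy data, illustrative feature names.
langs = {
    "Greek":   {"AdjN": 1, "q_particle": None},
    "Serbian": {"AdjN": 1, "q_particle": None},
    "Welsh":   {"AdjN": 0, "q_particle": 1},
}

def contrastable(a, b, missing_is_value=False):
    for f in a:
        if a[f] is None or b[f] is None:
            if missing_is_value and (a[f] is None) != (b[f] is None):
                return True      # 'missing' vs. present counts as a contrast
            continue             # default: missing contributes nothing
        if a[f] != b[f]:
            return True
    return False

for x, y in combinations(langs, 2):
    print(x, y, contrastable(langs[x], langs[y]),
          contrastable(langs[x], langs[y], missing_is_value=True))
```

Note that in this toy data the two languages with identical known values and shared gaps stay indiscriminable under both policies, mirroring the Greek/Serbian case above.
</Paragraph>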
  </Section>
  <Section position="6" start_page="1037" end_page="1039" type="metho">
    <SectionTitle>
5 Speech production in aphasics
</SectionTitle>
    <Paragraph position="0"> This application concerns the discrimination of different forms of aphasia on the basis of their language behaviour.8 We addressed the profiling of aphasic patients, using the CAP dataset from the CHILDES database (MacWhinney, 1995), containing (among others) 22 English subjects; 5 are controls and the others suffer from anomia (3 patients), Broca's disorder (6), Wernicke's disorder (5), and nonfluency (3). The patients are grouped into classes according to their fit to a prototype used by neurologists and speech pathologists. The patients' records--verbal responses to pictorial stimuli--are transcribed in the CHILDES database and are coded with linguistic errors from an available set that pertains to phonology, morphology, syntax and semantics.</Paragraph>
    <Paragraph position="1"> As a first step in our study, we attempted to profile the classes using just the errors as they were coded in the transcripts, which consisted of a set of 26 binary features, based on the occurrence or non-occurrence of an error (feature) in the transcript of each patient. We ran MPD with primitive features and absolute contrasts and found that from a total of 10 pairwise contrasts to be made between 5 classes, 7 were impossible, and only 3 possible. We then used derived features and absolute contrasts, but still one pair (Broca's and Wernicke's patients) remained uncontrasted. We obtained 80 simplest models with 5 features (two primitive and three derived) discriminating the four remaining classes.</Paragraph>
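    <Paragraph> Why derived Boolean features help can be illustrated with a toy example (invented error codes e1 and e2, not the actual CAP coding): neither primitive feature contrasts the two classes absolutely, but their disjunction does:

```python
# Toy illustration of derived Boolean features. Classes A and B are lists of
# patients, each patient a dict of binary error features; an absolute contrast
# requires the two classes to take disjoint value sets on a feature.
A = [{"e1": 1, "e2": 0}, {"e1": 0, "e2": 1}]
B = [{"e1": 0, "e2": 0}, {"e1": 0, "e2": 0}]

def values(cls, feat):
    return {p[feat] for p in cls}

def absolute_contrast(c1, c2, feat):
    return values(c1, feat).isdisjoint(values(c2, feat))

either = lambda p: p["e1"] or p["e2"]         # derived feature: e1 OR e2
derive = lambda cls: [{"e1_or_e2": either(p)} for p in cls]

print(absolute_contrast(A, B, "e1"))                        # no contrast
print(absolute_contrast(derive(A), derive(B), "e1_or_e2"))  # contrast
```

MPD's actual derived features are formed from combinations of the coded errors in just this Boolean spirit, though its search and simplicity criteria are not reproduced here.
</Paragraph>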
    <Paragraph position="2"> We found this profiling unsatisfactory from a domain point of view for several reasons9 which led us 8 We are grateful to Prof. Brian MacWhinney from the Psychology Dept. of CMU for helpful discussions on this application of MPD.</Paragraph>
    <Paragraph position="3"> 9 First, one pair remained uncontrasted. Second, only 3 pairwise contrasts were made with absolute primitive features, which are as a rule most intuitively acceptable as regards the comprehensibility of the demarcations (in this specific case they correspond to &amp;quot;standard&amp;quot; errors, priorly and independently identified from the task under consideration). And, third, some of the derived features necessary for the profiling lacked the necessary plausibility for domain scientists.</Paragraph>
    <Section position="1" start_page="1037" end_page="1039" type="sub_section">
      <SectionTitle>
Features and Partial Contrasts
</SectionTitle>
      <Paragraph position="0"> to re-examining the transcripts (amounting roughly to 80 pages of written text) and adding manually some new features that could eventually result in more intelligible profiling. These included:  (1) Prolixity. This feature is intended to simulate an aspect of Grice's maxim of manner, viz. &amp;quot;Avoid unnecessary prolixity&amp;quot;. We try to model it by computing the average number of words pronounced per individual pictorial stimulus, so each patient is assigned a number (at present, each word-like speech segment is taken into account). Wernicke's patients seem most prolix, in general. (2) Truthfulness. This feature attempts to simulate Grice's Maxim of Quality: &amp;quot;Be truthful. Do not say that for which you lack adequate evidence&amp;quot;. Wernicke's patients are most persistent in violating this maxim by fabricating things not seen in the pictorial stimuli. All other patients seem to conform to the maxim, except the nonfluents, whose speech is difficult to characterize either way (so this feature is considered irrelevant for contrasting).</Paragraph>
      <Paragraph position="1"> (3) Fluency. By this we mean general fluency, normal intonation contour, absence of many and long pauses, etc. The Broca's and non-fluent patients have negative value for this feature, in contrast to all others.</Paragraph>
      <Paragraph position="2"> (4) Average number of errors. This is the second numerical feature, besides prolixity. It counts  the average number of errors per individual stimulus (picture). Included are all coder's markings in the patient's text, some explicitly marked as errors, others being pauses, retracings, etc.</Paragraph>
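      <Paragraph> The two numeric features above can be sketched as simple averages over per-stimulus transcripts (hypothetical counts; the actual coding of word-like segments and error markings is more involved):

```python
# Sketch of the prolixity and average-errors features for one patient,
# computed from hypothetical per-stimulus response records.
responses = [
    {"words": 12, "errors": 3},   # one pictorial stimulus each
    {"words": 7,  "errors": 1},
    {"words": 11, "errors": 2},
]

prolixity = sum(r["words"] for r in responses) / len(responses)
avg_errors = sum(r["errors"] for r in responses) / len(responses)
print(prolixity, avg_errors)  # 10.0 2.0
```
</Paragraph>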
      <Paragraph position="3"> Re-running MPD with absolute primitive features on the new data, now having more than 30 features, resulted in 9 successful demarcations out of 10. Two sets of primitive features were used to this end: {average errors, fluency, prolixity} and {average errors, fluency, truthfulness}. The Broca's patients and the nonfluent ones, which still resisted discrimination, could be successfully handled with nine alternative derived Boolean features, formed from different combinations of the coded errors (a handful of which are also plausible). We also ran MPD with primitive features and partial contrasts (cf. Table 4). Retracting one of the six Broca's subjects allows all classes to be completely discriminated.</Paragraph>
      <Paragraph position="4"/>
      <Paragraph position="5"> These results may be considered satisfactory from the point of view of aphasiology. First of all, all disorders are now successfully discriminated, most cleanly, and this is done with primitive features, which, furthermore, make good sense to domain specialists: control subjects are singled out by the least number of mistakes they make, Wernicke's patients are contrasted from anomic ones by their greater prolixity, anomics contrast with Broca's and nonfluent patients by their fluent speech, etc.</Paragraph>
      <Paragraph position="6"> 6 MPD in the context of diverse application types A learning program can profitably be viewed along two dimensions: (1) according to whether the output of the program is addressed to a human or serves as input to another program; and (2) according to whether the program is used for prediction of future instances or not. This yields four alternatives: type (i) (+human/-prediction), type (ii) (+human/+prediction), type (iii) (-human/+prediction), and type (iv) (-human/-prediction).</Paragraph>
      <Paragraph position="7"> We may now summarize MPD's mechanisms in the context of the diverse application types. These observations will clear up some of the discussion in the previous sections, and may also serve as guidelines in further specific applications of the program. Componential analysis falls under type (i): a componential model is addressed to a linguist/anthropologist, and there is no prediction of unseen instances, since all instances (e.g., kintypes in kinship analysis) are as a rule available at the outset. 10 The aphasics discrimination task can be classed as type (ii): the discrimination model aims to make sense to a speech pathologist, but it should also have good predictive power in assigning future patients to the proper class of disorder.</Paragraph>
      <Paragraph position="8"> Learning translational equivalents from verbal case frames belongs to type (iii), since the output of the learner will normally be fed to other subroutines, and this output model should make good predictions as to word selection in the target language when encountering future sentences in the source language. We did not discuss here a case of type (iv), so we just mention an example. Given a grammar G, the learner should find &amp;quot;look-aheads&amp;quot;, specifying which of the rules of G should be fired first.11 In this task, 10 We note that componential analysis in phonology can alternatively be viewed as of type (iii) if its ultimate goal is speech recognition.</Paragraph>
      <Paragraph position="9"> 11 A trivial example is G, having rules: (i) s1 -&gt; np, vp, ['.']; (ii) s2 -&gt; vp, ['!']; (iii) s3 -&gt; aux, np, v, ['?'], where the classes are the LHS, the instances are the RHS, and the profiling should decide which of the 3 rules to use, having as input say Come here!. the output of the learner can be automatically incorporated as an additional rule in G (and hence be of no direct human use), and it should make no predictions, since it applies to the specific G, and not to any other grammar.</Paragraph>
      <Paragraph position="10"> For tasks of types (i) and (ii), a typical scenario of using MPD would be: using all 3 simplicity criteria, and finding all alternative models, follow the feature/contrast hierarchy: primitive features &amp; absolute contrasts &gt; derived &amp; absolute &gt; primitive &amp; partial &gt; derived &amp; partial, which reflects the desiderata of conciseness, comprehensiveness, and intelligibility (as far as the latter is concerned, the primitive features (normally user-supplied) are preferable to the computer-invented, possibly disjunctive, derived features).</Paragraph>
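      <Paragraph> The scenario just described amounts to trying the four feature/contrast settings in order and stopping at the first complete discrimination; a minimal sketch, with run_mpd as a hypothetical stand-in for the actual program:

```python
# Sketch of the feature/contrast hierarchy as a fallback strategy. run_mpd is
# a hypothetical callable standing in for MPD: it returns a discrimination
# model, or None when the setting cannot discriminate all classes.
HIERARCHY = [
    ("primitive", "absolute"),
    ("derived",   "absolute"),
    ("primitive", "partial"),
    ("derived",   "partial"),
]

def discriminate(run_mpd):
    for features, contrasts in HIERARCHY:
        model = run_mpd(features=features, contrasts=contrasts)
        if model is not None:          # complete discrimination achieved
            return (features, contrasts), model
    return None

# e.g. a fake run_mpd that only succeeds once derived features are allowed:
fake = lambda features, contrasts: "model" if features == "derived" else None
print(discriminate(fake))  # (('derived', 'absolute'), 'model')
```

As the text notes, the user may reorder HIERARCHY when, e.g., noisy instances make the primitive &amp; partial setting the more plausible first try.
</Paragraph>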
      <Paragraph position="11"> However, in some specific tasks, another hierarchy seems preferable, which the user is free to follow. E.g., in kinship under type (i), the inability of MPD to completely discriminate the kinterms may very well be due to noise in the instances, a situation by no means infrequent, esp. in data for &amp;quot;exotic&amp;quot; languages. In a type (ii) task, an analogous situation may hold (e.g., a patient may be erroneously classed under some impairment), all this leading to trying first the primitive &amp; partial heuristic. There may be other reasons to change the order of heuristics in the hierarchy as well.</Paragraph>
      <Paragraph position="12"> We see no clear difference between tasks of types (i) and (ii), placing the emphasis in (ii) on the human-addressee subtask rather than on the prediction subtask, because it is not unreasonable to suppose that a concise and intelligible model has good chances of reasonably high predictive power. 12 We have less experience in applying MPD to tasks of types (iii) and (iv) and would therefore refrain from suggesting typical scenarios for these types. We offer instead some observations on the role of MPD's mechanisms in the context of such tasks, showing at some places their different meaning/implication in comparison with the previous two types: (1) Parsimony, conceived as minimality of class profiles, is essential in that it generally contributes to reducing the cost of assigning an incoming instance to a class. (In contrast to tasks of types (i)-(ii), the Maximize-Coordination criterion has no clear meaning here, and the Minimize-Features criterion may well be</Paragraph>
      <Paragraph position="13"> 12 By way of a (non-linguistic) illustration, we have turned the MPD profiles into classification rules and have carried out an initial experiment on the LED-24 dataset from the UC Irvine repository. MPD classified 1000 unseen instances at 73 per cent accuracy, using five features, which compares well with a seven-feature classifier reported in the literature, as well as with other citations in the repository entry.</Paragraph>
      <Paragraph position="14"> sacrificed in order to get shorter profiles.)13 (2) Conjunctiveness is of less importance here than in tasks of types (i)-(ii), but better legibility of profiles is in any case preferable. The derived-features mechanism can be essential in achieving intuitive contrasts, as in verbal case frame learning, where the interaction between features nicely fits the task of learning &amp;quot;slot dependencies&amp;quot; (Li and Abe, 1996).</Paragraph>
      <Paragraph position="15"> (3) Displaying all alternative profiles of equal simplicity is not always a necessity, as it is in tasks of types (i)-(ii), but it is most essential in many tasks where the costs of finding the feature values of unseen instances differ (e.g., computing a syntactic feature would generally be much less expensive than computing, say, a pragmatic one).</Paragraph>
      <Paragraph position="16"> The important point to emphasize here is that MPD generally leaves these mechanisms as program parameters to be set by the user, and thus, by changing its inductive bias, it may be tailored to the specific needs that arise within the 4 types of tasks.</Paragraph>
    </Section>
  </Section>
</Paper>