File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/91/j91-3002_abstr.xml
Size: 8,661 bytes
Last Modified: 2025-10-06 13:47:16
<?xml version="1.0" standalone="yes"?> <Paper uid="J91-3002"> <Title>Chinese Number-Names, Tree Adjoining Languages, and Mild Context-Sensitivity</Title> <Section position="2" start_page="0" end_page="278" type="abstr"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> In recent years, we have seen in the linguistic literature a number of arguments; e.g., Culy (1985), Huybregts (1984), Shieber (1985), which purport to demonstrate that the class of N(atural) L(anguage)s is not generated by formalisms of C(ontext)-F(ree) power. In the context of NLs, little has been said regarding the generative inadequacy of formalisms such as single- or multiple-component T(ree) A(djoining) G(rammar)s (Joshi 1985, 1987), H(ead) G(rammar)s (Pollard 1984; Roach 1987), L(inear) I(ndexed) G(rammar)s (Gazdar 1988), or Combinatory Categorial Grammars (Steedman 1985, 1987). These formalisms are among the so-called Mildly C(ontext)-S(ensitive) G(rammar)s since they are non-CF; i.e., strictly CS, but only to a limited extent. More will be said about these grammars and their object languages below.</Paragraph> <Paragraph position="1"> Notable exceptions to the trend of demonstrating only non-context-freeness are Kac (1987) for English and Manaster-Ramer (1987a) for Dutch and German. In addition to demonstrating non-context-freeness, both these studies argue that the constructions used for their respective argumentations can serve as a basis for demonstrating that the NLs in question are generated neither by TAGs nor by HGs. However, these constructions rely crucially on coordination, and our current understanding of the properties of coordination is far from satisfactory. 
In this paper we show that the number-name system of Chinese, specifically of the Mandarin dialect, is neither a single- nor a multiple-component TAL (footnote 1), raising doubts about whether it could be considered a Mildly CSL at all. Our argument relies in no way on overt coordination operators.</Paragraph> <Paragraph position="2"> Footnote 1: Henceforth we use the acronym TAL to refer uniquely to single-component Tree Adjoining Language, and MCTAL to refer to Multiple Component TAL (likewise TAG and MCTAG, mutatis mutandis). Department of Cognitive and Linguistic Sciences, Providence, RI 02912. © 1991 Association for Computational Linguistics. Computational Linguistics, Volume 17, Number 3.</Paragraph> <Paragraph position="3"> In Section 2 we present an argument originally proposed in Zwicky (1963) wherein he showed that the English naming system for cardinal numbers is a non-CFL. We discuss possible objections to his claims. Some Chinese data are presented in Section 3. In Section 4 we deal with a few Mildly CS formalisms and show that the Chinese number-name system (henceforth N(umeric) C(hinese)) is a non-TAL. In Section 5, we discuss additional grammar formalisms and show that NC is not a M(ultiple) C(omponent) TAL. We also investigate if NC can be characterized as a Mildly CSL.</Paragraph> <Paragraph position="4"> We discuss the linguistic relevance of our formal results in Section 6. Finally, Section 7 presents the conclusions of this study.</Paragraph> <Paragraph position="5"> 2. Zwicky (1963) and Objections Thereunto Zwicky (1963) discusses some constructions of names for cardinal numbers that are not generated by a CFG. The one he labels (1) resembles the structure of very large number-names in English (and other NLs): $N T^{n} (, N T^{n-1}) \ldots (, N T) (, N)$ (1) In this construction, N indicates a number between 1 and 999, T is an abbreviation for thousand, commas indicate an intonational pause, and everything within parentheses is optional. 
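As an illustrative sketch only (it is not part of Zwicky's argument), the following Python function renders a positive integer in the schematic shape of construction (1). It assumes that thousand is the only multiplier word available and leaves each N as a numeral rather than spelling it out; the function name thousand_only_name is hypothetical.

```python
def thousand_only_name(n):
    """Render a positive integer in the schematic shape of construction (1).

    Assumptions (not from the source): 'thousand' is the only multiplier
    word, and each N (1 to 999) is left as a numeral instead of a name.
    """
    if n == 0 or n != abs(n):
        raise ValueError("expects a positive integer")

    groups = []  # base-1000 digits, least significant first
    while n:
        n, group = divmod(n, 1000)
        groups.append(group)

    parts = []
    # Larger clusters of 'thousand' are emitted before smaller ones.
    for power in reversed(range(len(groups))):
        group = groups[power]
        if group == 0:
            continue  # an optional cluster is simply omitted when empty
        parts.append(" ".join([str(group)] + ["thousand"] * power))
    return ", ".join(parts)


print(thousand_only_name(2_000_005_007))
# prints "2 thousand thousand thousand, 5 thousand, 7"
```

Each comma-separated part corresponds to one N followed by a cluster of T's, with the cluster sizes strictly decreasing from left to right.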
This construction could be characterized as follows: (i) Given a system in English, for example, where thousand is used as the largest single word for a number, million would be represented as thousand thousand, (Amer.) billion as thousand thousand thousand, (Amer.) trillion as thousand thousand thousand thousand, etc., ad infinitum.</Paragraph> <Paragraph position="6"> (ii) In a system like (i), larger clusters of thousand must precede smaller clusters of thousand in the same manner that decillion must precede trillion, which must precede million, which must precede thousand in the standard English number-name system using single words for numbers of higher values.</Paragraph> <Paragraph position="7"> Zwicky relates construction (1) to the formal language P:</Paragraph> <Paragraph position="9"> He proceeds, inter alia, to prove that P is non-context-free. A conclusion from his study is that the sublanguage of English encompassing the names for cardinal numbers is strictly context-sensitive.</Paragraph> <Paragraph position="10"> Although Zwicky's mathematical argumentation is sound, room is left for some investigators to cast doubt on whether his claims bear in any significant way on NL. The empirical basis for Zwicky's argument rests largely on whether characteristics (i) and (ii) are indeed linguistically real. There has been much controversy over the status of these characteristics. For example, Merrifield (1968, p. 91) states the following: In working with a language isolate such as a system for naming numbers, several things should be kept in mind.</Paragraph> <Paragraph position="11"> In the first place, such a system differs from the larger grammar of which it is but a segment in not being indefinitely recursive. A grammar of a natural language accounts for an infinite number of utterances; a grammar of number names apparently does not. 
The latter is limited by the number of linguistic primitives of the sort 'billions,' 'trillions,' 'quadrillions,' etc., which it includes. And though a mathematician is presumably able to write down in mathematical notation an infinitely large set of numbers, when he attempts to give names to the members of the set in a natural language, he is limited by the number of primitives at his disposal.</Paragraph> <Paragraph position="12"> Greenberg (1978, p. 253) expresses Merrifield's assertion as the generalization that, &quot;every language has a numeral system of finite scope.&quot; Greenberg then proceeds to claim that the largest expressible natural number in American English is $10^{36} - 1$ &quot;assuming that, as in most dictionaries of AMERICAN ENGLISH, the lexical item with the highest numerical value is 'decillion'.&quot; (footnote 2) Thus, Merrifield and Greenberg take the view that there is an upper limit on linguistically expressible number-names. Hence, by this view, characteristic (i) appears not to be linguistically warranted. Hurford (1975, p. 4) suggests otherwise: Now it can be argued that the class of number expressions in any given language is infinite. Intuitions of language users differ on the matter of whether the set of number expressions in their language is infinite. The crux of the matter is the question whether the names for very high numbers are in fact wellformed.</Paragraph> <Paragraph position="13"> In English, for example, the expression two billion billion, five hundred and five may be felt by some speakers to be quite wellformed, though of course unlikely to be observed, whereas other speakers may object that it is not wellformed.</Paragraph> <Paragraph position="14"> Accordingly, characteristic (i) is linguistically warranted for at least some speakers. Hurford (ibid.) 
in fact takes this position: It will become obvious as we proceed that the particular systematic characteristics which are evident in natural language number-name systems tend to project the existence of infinite sets of number-names and a higher limit to the value of wellformed number-names can only be stated in a fairly ad hoc arbitrary manner.</Paragraph> <Paragraph position="15"> Epstein (1978, p. 123) contests this claim by arguing: Contrary to what Hurford claims, there are a finite number of these [numerical expressions in English]. Ten to the trillionth power, for example, has no corresponding counting expression.</Paragraph> <Paragraph position="16"> Hurford (1979, p. 42) responds: This is a misconception. It would be similarly wrong to assert that there is no single English sentence giving the full names, addresses, heights, weights, and IQ's of all UK citizens at midnight on March 1st, 1978. Such a sentence would be impossibly long to utter, but that is not a restriction which need be stated as part of English grammar, or indeed of general linguistic theory. If the highest-valued number word in your vocabulary is trillion, and you want to express higher numbers, you just string together enough trillions to get you there. Nobody, as a</Paragraph> </Section> </Paper>