<?xml version="1.0" standalone="yes"?> <Paper uid="I05-6010"> <Title>Some remarks on the Annotation of Quantifying Noun Groups in Treebanks</Title> <Section position="3" start_page="81" end_page="83" type="metho"> <SectionTitle> 2 Indications of Quantity - Variants of the Quantitative Specification </SectionTitle> <Paragraph position="0"> In order to apprehend a non-discrete, amorphous substance, such as Lehm (loam), as a discrete object and therefore to make it countable, and in order to integrate discrete entities, such as Blumen (flowers), into one more complex unit, there are mainly two possibilities in German: In the context of parsing and structure annotation, the latter possibility does not pose any problems, whereas the first one calls for further investigation with a view to a workable strategy concerning the analysis and annotation of such constructions.</Paragraph> <Section position="1" start_page="81" end_page="81" type="sub_section"> <SectionTitle> 2.1 The Structure of the German Quantifying Noun Group (quanNG) </SectionTitle> <Paragraph position="0"> The German quantifying noun group (quanNG) is composed of a numeral (Num), a nominal constituent used for quantification (quanN) and a nominal constituent denoting the quantified element or substance (ElemN) (cf. figure 1). Thus, quanNG consists of a quantifying constituent (e.g., ein Klumpen in example 1) and a quantified constituent (e.g., Lehm in example 1).</Paragraph> <Paragraph position="1"> With respect to the nouns that can function as the head of the quantifying constituent quanN, we distinguish four classes of quantifying noun groups: 1. numeral noun constructions; 2. quantum noun constructions; 3. count constructions; and 4. measure constructions.</Paragraph> <Paragraph position="2"> These four main classes are subdivided into several syntactically and semantically motivated sub-classes (see table 1). 
A more detailed description of the respective classes is presented in the following sections.</Paragraph> </Section> <Section position="2" start_page="81" end_page="83" type="sub_section"> <SectionTitle> 2.2 Four Classes of Quantifying Noun Groups </SectionTitle> <Paragraph position="0"> The quantifying aspect of the meaning of the complex noun group quanNG is contributed by a quantifying noun quanN. Starting from the specific nature of these quanN, we differentiate between four classes of quanNG. Depending on whether the head noun of the quanN can only contribute to the quantitative aspect of the complex noun phrase, as is the case with Kilogramm (kilogram) or Million (million), or whether its contribution to the meaning of the complex noun phrase is of a quantitative nature only in certain contexts, as is the case with Tasse (cup) or Schritt (step), cases of ambiguity arise that we are faced with in the context of structure analysis and annotation.</Paragraph> <Paragraph position="1"> In German, in addition to the numerals, there are some nouns, such as Dutzend (dozen) or Million (million), that give specific indications of number and that do not go beyond this quantitative aspect (in contrast to Paar (pair)). quanNG with such a noun as head of quanN are called numeral noun constructions.</Paragraph> <Paragraph position="2"> Quantum noun constructions can be characterized by two observations: 1. they cannot be freely combined with numerals; and 2. they express indefinite quantities.</Paragraph> <Paragraph position="3"> Numeral noun constructions and quantum noun constructions are mentioned here only for the sake of completeness (see footnote 1).</Paragraph> <Paragraph position="4"> In the context of this article, we concentrate on measure constructions and count constructions. Measure Constructions. For the description of measure constructions, the concept of measurement, as known from the theory of measurement (cf. Suppes, 1963), is used. 
By virtue of measure constructions, a real number n is assigned to a measure object u that determines the value of a property P (such as weight or temperature). (Footnote 1: Interested readers are referred to Wiese (1997) for a detailed description of these construction types.)</Paragraph> </Section> </Section> <Section position="4" start_page="83" end_page="85" type="metho"> <SectionTitle> CATEGORY EXAMPLES </SectionTitle> <Paragraph position="0">
(1) numeral noun construction: Dutzend (dozen), Million (million)
(2) quantum noun construction: Menge (number), Unmenge (vast number), Unsumme (amount), Vielzahl (multitude)
(3) count construction:
    (a) count noun construction:
        (i) numeral classifier construction: Stück (piece)
        (ii) shape noun construction: Tropfen (drop), Laib (loaf), Scheibe (slice)
        (iii) container noun construction: Glas (glass), Tasse (cup), Kiste (crate)
        (iv) singulative construction: Halm (blade), Korn (grain)
    (b) sort noun construction: Sorte (kind), Art (type)
    (c) collective noun construction:
        (i) configuration construction: Stapel (stack), Haufen (heap)
        (ii) group collective construction: Herde (herd), Gruppe (group), Paar (pair)
(4) measure construction:
    (a) measuring unit construction (muc):
        (i) abstract muc: Meter (meter), Grad (degree), Euro (euro)
        (ii) concrete muc:
            - container noun construction: Glas (glass), Tasse (cup), Kiste (crate)
            - action noun construction: Schluck (gulp/mouthful), Schritt (step)
        (iii) relative muc: Prozent (percent)
Table 1: Categorization of quanNG with respect to quanN.
In the context of measurement, u is correlated to a set m of measure objects (such as weights), whose quantity directly or indirectly indicates the value of the measured property P. 
The correlation to m is established by a measure function M that maps P(u) onto m.</Paragraph> <Paragraph position="1"> The possible properties P are called dimensions and the measure function is called the measuring unit.</Paragraph> <Paragraph position="2"> m is an indication of quantity such as 30 kilograms. There exist two specification relations between m and u: (5) vier 'four kilograms of iron' In example 5 the dimension is explicitly given; in example 6 the dimension is not explicitly given, but can be inferred from the measure function M (Kilogramm) and the measure object u (Eisen).</Paragraph> <Paragraph position="3"> m is restricted concerning its compatibility with dimension denoting nouns, verbs, and adjectives.</Paragraph> <Paragraph position="4"> Thus, indications in metres can only be combined with spatial dimensions such as height or length, but they cannot be combined with temporal relations. These restrictions can be modelled by assuming that there are scales and degrees functioning as abstract entities that are correlated by dimensions and objects (cf. Cresswell, 1977).</Paragraph> <Paragraph position="5"> Scales and degrees are to be considered as means for the formal analysis of quantity-related phenomena. A reasonable way to understand degrees as abstractions over objects seems to be the assumption that a degree is a class of objects that cannot be differentiated wrt. the considered dimension. A scale is a totally ordered set of degrees. Indications of measurement are predicates over degrees or degree distances. Degrees of different scales cannot be compared with each other. However, entities can be measured with the help of different measuring units wrt. one and the same dimension. 
That means, there are different measure functions (such as euros and dollars) that are linked to each other depending on the respective dimension (such as price or hire).</Paragraph> <Paragraph position="6"> By means of corpus investigations, we designed eight scales. For each of these scales, we collected the associated measuring units (i.e. the measure functions) together with the dimension denoting adjectives and verbs referring to the same scale. Whereas for the dimension weight there exists a dimension denoting noun as well as a dimension denoting adjective and a dimension denoting verb, there are other dimensions for which there are lexical gaps. Table 2 lists our scales together with an example measuring unit and the corresponding dimension(s) and lexemes (see footnote 2). The scales presented in table 2 are conceived with a view to syntactic analysis. Since mass and volume look alike wrt. their surface realizations, i.e. the sets of dimension denoting adjectives and verbs referring to the scales of mass and volume overlap to a considerable degree, we do not differentiate between these two scales in the context of analysis. (Footnote 2: Several derived dimensions such as &quot;frequency&quot; or &quot;density&quot; are not considered.)</Paragraph> <Paragraph position="7"> In our parsing system the incompatibility of different scales is reflected in lexicalized grammar rules: adjectives referring to the weight scale can only take as measure argument measurement indications containing a measuring unit referring to the same scale. This is especially important in order to be able to distinguish an indication-of-quantity-reading (cf. example 7) from a reading as a measure argument of an adjective (cf. example 8).</Paragraph> <Paragraph position="8"> 'three hectare large fields'</Paragraph> <Paragraph position="9"> The measure constructions can be divided wrt. the nature of the used measuring units into abstract, concrete and relative measuring unit constructions. 
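The scale restriction on measure arguments described above can be illustrated with a minimal Python sketch. The lexicon entries and scale names below are illustrative assumptions and do not reproduce table 2; the function name is likewise hypothetical, and the sketch is not the rule format of the parsing system itself.

```python
# Hypothetical mini-lexicon. Scale names and entries are assumptions made
# for illustration only; they are not the contents of table 2.
UNIT_SCALE = {
    "Kilogramm": "weight",
    "Meter": "length",
    "Hektar": "area",
    "Euro": "price",
}
ADJECTIVE_SCALE = {
    "schwer": "weight",
    "lang": "length",
    "groß": "area",
    "teuer": "price",
}


def can_be_measure_argument(unit, adjective):
    """Return True if a measurement indication headed by `unit` may serve
    as measure argument of `adjective`, i.e. both refer to the same scale."""
    unit_scale = UNIT_SCALE.get(unit)
    adjective_scale = ADJECTIVE_SCALE.get(adjective)
    # Unknown units or adjectives never license a measure-argument reading.
    if unit_scale is None or adjective_scale is None:
        return False
    return unit_scale == adjective_scale
```

Under these assumed entries, a measurement indication headed by Hektar would be accepted as measure argument of groß, whereas one headed by Kilogramm would be rejected for lang, so only the indication-of-quantity-reading would remain in the second case.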
</Paragraph> </Section> <Section position="5" start_page="85" end_page="87" type="metho"> <SectionTitle> Abstract Measuring Unit Constructions </SectionTitle> <Paragraph position="0"> Abstract measuring unit constructions contain an abstract measuring unit as quanN. These abstract measuring units are defined within the framework of physical theories and they are always restricted to a certain scale. By virtue of different scales, measuring units can be categorized. The units of one and the same scale can be converted into each other.</Paragraph> <Paragraph position="1"> Apart from the abstract measuring units there are a number of concrete measuring units. The concrete measuring unit constructions can be subdivided into 1. container noun constructions; and 2. action noun constructions.</Paragraph> <Paragraph position="2"> Container nouns can be used as measuring units that are definable by virtue of the capacity that can be assigned to the concrete object. In the case of action noun constructions, the measurement acting as the basis for the definition of the measuring unit is the capacity restriction of the respective action. From a linguistic point of view, the distinction between abstract measuring unit constructions and concrete measuring unit constructions is important insofar as: 1. dimension denoting nouns are preferably combined with abstract or at least heavily standardized measuring units; and 2. 
if mass nouns are directly combined with numerals, such as zwei Kaffee (two coffees), this construction cannot be understood as an abbreviation of an abstract measuring unit construction, such as zwei Liter Kaffee (two liters of coffee), but, instead, it must be understood as an abbreviation of a concrete measuring unit construction, such as zwei Tassen Kaffee (two cups of coffee).</Paragraph> <Section position="1" start_page="85" end_page="85" type="sub_section"> <SectionTitle> Relative Measuring Unit Constructions </SectionTitle> <Paragraph position="0"> Relative measuring units such as Prozent (percent) serve to specify relative sizes. Relative measuring units are not restricted to a certain scale.</Paragraph> </Section> <Section position="2" start_page="85" end_page="87" type="sub_section"> <SectionTitle> Count Constructions </SectionTitle> <Paragraph position="0"> Count constructions contrast with mass constructions. Count constructions do not serve for the measurement of certain substances, but for the numerical quantification of discrete entities. That means, the number assignment does not identify values of a certain property P, but refers to the number of discrete entities. Restrictions concerning the compatibility are dependent on the properties of the denotata of the nouns they refer to.</Paragraph> <Paragraph position="1"> Depending on quanN, the count constructions can be subdivided into three construction types: 1. count noun constructions; 2. sort noun constructions; and 3. collective noun constructions.</Paragraph> <Paragraph position="2"> Count Noun Constructions. The count noun constructions are again split up wrt. the nature of the respective quanN into: 1. numeral classifier constructions; 2. shape noun constructions; 3. container noun constructions; and 4. 
singulative constructions.</Paragraph> <Paragraph position="3"> Numeral classifiers only play a marginal role in German, since the direct combination of a count noun with a numeral also functions as an indication of counting. In other languages, numeral classifier constructions occur far more often (cf. Bond, 1996).</Paragraph> <Paragraph position="4"> In German, shape nouns such as Scheibe (slice) or Laib (loaf) specify spatial (shape) properties of the object. Even if it is possible to cut a loaf of bread into 15 slices of bread, the adequacy of the usage of a loaf of bread or 15 slices of bread depends on the actual state of the object referred to. Thus, shape nouns only reflect object-inherent properties. This distinguishes shape noun constructions from abstract measuring unit constructions: It is irrelevant whether we describe a roast as 1 kilo of meat or as two pounds of meat. In the latter case, it is not necessary that we have two separate portions of meat. Count constructions with shape nouns as count units have a complex semantic structure: The object of the numerical quantification is an entity of a certain substance with a certain shape. Thus, the construction denotes objects that are identified by virtue of two conceptual components, namely &quot;shape&quot; and &quot;substance&quot;. The referents of the shape noun and the substance noun form one complex concept, whose instances are numerically quantified. These combinations can typically also be expressed with the help of a meaning-conserving compound.</Paragraph> <Paragraph position="5"> Container nouns have a special status within the quanN, since they belong to the absolute nouns and only their usage in quantifying constructions transforms them into relational nouns that need to be completed by Num and ElemN. The container nouns can be used outside the quanNG to refer to the corresponding concrete entities. 
We distinguish between three readings of container noun constructions: 'In the carafe are three glasses of water.' In example 9 Glas is an absolute noun denoting the physical object. In examples 10 and 11 it refers to indications of quantity. But, whereas example 10 concerns the concrete quantity of water which is in the given containers, i.e. the glasses, example 11 concerns a conventionalized quantity of water corresponding to the conventional standard size of the container of the given kind, i.e. the glass. The difference between the latter examples reflects the difference between container noun constructions as count constructions (cf. example 10) and container noun constructions as measure constructions (cf. example 11).</Paragraph> <Paragraph position="6"> Often, count constructions with container nouns as count unit and measure constructions with container nouns as measuring unit are not distinguishable from each other, since the quantification of a set of equally large (filled) containers always also identifies the volume of their content. Thus, it is not always possible to decide whether, with a given construction, containers are numerically quantified or the volume of their contents is measured. Concrete measuring units, in contrast to container nouns as count units, often occur in a number-unmarked form. But, first, this is not always the case, and, second, a missing number marking is only a hint for a measure construction; the reverse inference cannot be drawn: a plural noun can function both as count unit and as measuring unit (cf. example 12, where a plural container noun undoubtedly functions as measuring unit).</Paragraph> <Paragraph position="7"> 'Mix one ball of vanilla ice cream, one glass of pineapple juice and two glasses of milk in the mixer.' 
Sort Noun Constructions. Sort noun constructions allow for indications of quantity based on sortal distinctions.</Paragraph> <Paragraph position="8"> Collective Noun Constructions. Among the collective noun constructions we distinguish between 1. configuration constructions; and 2. group collective constructions.</Paragraph> <Paragraph position="9"> The configuration constructions are similar to shape noun constructions insofar as the configuration noun as well as the shape noun carries the connotation of a certain shape. But, in contrast to the shape nouns, collective noun constructions do not denote an individuating operation, but a collectivizing operation.</Paragraph> <Paragraph position="10"> In addition to the collectivizing effect, the quanN in group collective constructions carries certain social and functional aspects.</Paragraph> </Section> </Section> <Section position="6" start_page="87" end_page="88" type="metho"> <SectionTitle> 3 The Quantifying Noun Group in TIGER and NEGRA </SectionTitle> <Paragraph position="0"> In accordance with the TIGER annotation scheme (cf. Albert et al., 2003), in TIGER (and also in NEGRA), quantifying noun groups are annotated as sequences of nouns, i.e. entirely flat. 
Moreover, there is no distinction between quantifying noun groups and other kinds of noun groups composed of several nominal constituents, i.e., ein Liter Wasser (one liter of water) is annotated the same way as eine Art Seife (a kind of soap) and die Bahn AG (the Bahn AG).</Paragraph> <Section position="1" start_page="87" end_page="88" type="sub_section"> <SectionTitle> 3.1 Proposals for a refinement of the annotation scheme </SectionTitle> <Paragraph position="0"> Starting from the observations and problems described in the preceding sections, we propose certain partly syntactically, partly semantically motivated refinements of the TIGER annotation scheme.</Paragraph> <Paragraph position="1"> First of all, the quantifying noun group should not be annotated as a sequence of nouns; instead, ElemN should be a separately built noun phrase, attached to quanN, which in turn constitutes a complex noun phrase together with Num. With this annotation, we reflect the fact that quanNG can be considered as a loose appositive syntagma (cf. Krifka, 1989). quanNG as a complex noun phrase should get a label signalling that it is an indication of quantity.</Paragraph> <Paragraph position="2"> Second, the determination of the head of quanNG has to be rethought: from a syntactic point of view, quanN functions as head; from a semantic point of view, ElemN functions as head. We argue for the annotation of a complex head consisting of both the head of quanN and the head of ElemN. The decision to always assign a two-place head to quanNG is quite important when dealing with elliptic constructions. In sentences such as example 13, we do not want to infer that the boy eats a plate, and in sentences such as example 14, we want to infer that the coffee is in a certain container.</Paragraph> <Paragraph position="3"> coffees.</Paragraph> <Paragraph position="4"> 'They ordered two (cups of) coffee.' 
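The proposed nesting and the two-place head can be illustrated with a minimal Python sketch. All class names, field names and the label value are hypothetical and chosen for illustration only; the sketch is not the TIGER or NEGRA annotation format, and it assumes that an elided constituent is simply represented as missing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NounPhrase:
    """A separately built noun phrase, reduced here to its lexical head."""
    head: str


@dataclass
class QuanNG:
    """Hypothetical container for a quantifying noun group."""
    num: str                        # the numeral (Num)
    quan_n: Optional[str]           # head of quanN, None if elided
    elem_n: Optional[NounPhrase]    # ElemN as its own phrase, None if elided
    label: str = "quantity"         # assumed label for an indication of quantity

    def complex_head(self):
        """Return the two-place head (head of quanN, head of ElemN)."""
        elem_head = self.elem_n.head if self.elem_n is not None else None
        return (self.quan_n, elem_head)


# ein Liter Wasser: both places of the head are filled.
full = QuanNG(num="ein", quan_n="Liter", elem_n=NounPhrase("Wasser"))

# zwei Kaffee: quanN is elided, so the first place of the head stays open.
elliptic = QuanNG(num="zwei", quan_n=None, elem_n=NounPhrase("Kaffee"))
```

Keeping both places in the head, even when one of them is empty, is what would let a later processing step recognise that a container is implied in the elliptic case.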
Moreover, in the context of quanNG it should be annotated whether the involved ElemN functions as count noun or as mass noun. In measure constructions, singular count nouns always get a mass noun reading. This information is quite useful wrt. theoretical linguistic investigations.</Paragraph> <Paragraph position="5"> A similar question is the differentiation between container nouns as count units and as measuring units. Again, this distinction would allow for fine-grained linguistic research using a treebank as a valuable resource.</Paragraph> <Paragraph position="6"> Because of the minor lexical content of count and measuring units, indications of quantity can only be combined with few adjectives. Therefore, there are cases in which the adjective does not refer to quanN, but directly to ElemN (cf. example 15).</Paragraph> <Paragraph position="7"> 'some glasses of foamy beer' In these cases, we should have a link signalling that schäumend does not refer to the glass, but to the beer. That means that despite surface order and agreement phenomena (schäumende does not agree with Bier), the information that the adjective modifies ElemN and not quanN should be contained in the treebank.</Paragraph> <Paragraph position="8"> Concerning the refinement proposals so far, in most of the cases the annotation requires manual work. Another ambiguity that has to be resolved in a treebank concerns all nouns that do not only have a quanN-reading, but that can also be used outside the quanN and then refer to the respective concrete object. If we find a sequence of two nouns and the first one could be a quanN, we have to decide whether it is a quanNG or not. Sometimes, this problem could be solved by means of subcategorization information. 
But, considering two sentences, such as sentences 16 and 17, we cannot distinguish the two readings on purely syntactic grounds.</Paragraph> <Paragraph position="9"> chocolate.</Paragraph> <Paragraph position="10"> 'He donated three bars of chocolate.' For sentence 16, a concrete-object-reading should be annotated (as depicted in figure 2), whereas for sentence 17 an indication-of-quantity-reading should be annotated (as depicted in figure 3). In the parsing system of SILVA, we use lexicalized rules in order to exclude an indication-of-quantity-reading for sentence 16. Lists containing information about quanN and their potential ElemN were gained by corpus investigation. Even if they are, of course, not exhaustive, they could serve as a starting point for a (semi-)automatic annotation of indications of quantity containing a noun as head of a quanN that can refer to a concrete object.</Paragraph> <Paragraph position="11"> 4 Using information about indications of quantity for grammar induction. In a first experiment, we enriched NEGRA with information about 1. our scales together with their respective adjectives (cf. table 2); and 2. nouns that could yield a concrete-object-reading as well as an indication-of-quantity-reading together with typical ElemN following them. Starting from the enriched TIGER we used BitPar (cf. Schmid, 2004 and Schiehlen, 2004), an efficient parser for treebank grammars, in order to induce a grammar and parse the treebank.</Paragraph> <Paragraph position="12"> Without the additional information BitPar reached an F-Value of 76.17%. After adding the information the F-Value increased to 76.27%.</Paragraph> <Paragraph position="13"> Obviously, this is not a tremendous improvement of performance, but looking at the absolute numbers, we get a more differentiated picture: there are 11 constructions containing an indication of measurement functioning as measure complement of an adjective. 
Before adding the information about the scales and the respective adjectives, only 3 constructions were correctly annotated; after adding the information, 6 constructions were correctly annotated.</Paragraph> <Paragraph position="14"> That indicates that the annotation can help increase the performance, and it does not lower the performance, which is not self-evident.</Paragraph> </Section> </Section> </Paper>