<?xml version="1.0" standalone="yes"?> <Paper uid="I05-6008"> <Title>Linguistically enriched corpora for establishing variation in support verb constructions</Title> <Section position="2" start_page="0" end_page="64" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> We aim to find methods that facilitate the description of the linguistic behavior of multiword expressions. Empirical evidence and generalizations about the linguistic properties of multiword expressions are required to further a theory of fixed expressions (or multiword expressions) as well as to expand the coverage of NLP lexical resources and grammars.</Paragraph> <Paragraph position="1"> This paper describes an attempt to develop automated methods for the induction of lexical information from a linguistically enriched corpus.</Paragraph> <Paragraph position="2"> In particular, the paper discusses to what extent an automated corpus-based approach can help establish the variation potential of support verb constructions. The experimental work applies to Dutch expressions; however, the issue is widely relevant to the development of lexical resources for other languages.</Paragraph> <Section position="1" start_page="0" end_page="63" type="sub_section"> <SectionTitle> 1.1 Partially lexicalized expressions </SectionTitle> <Paragraph position="0"> Corpus-based studies have shown that certain fixed expressions and idioms allow limited variation and adjectival modification (Moon, 1998; Riehemann, 2001). Riehemann (2001) investigated various types of multiword expressions in English and observed that around 25% of idiom occurrences in a corpus allow some variation. By way of example, among the occurrences of the idiom keep tabs on '(fig.) 
watch', variation affects verb tense inflection, adjective modifiers (close, better, regular, daily), the noun's number morpheme (tab(s)) and the position of the on complement phrase, which may be separated from the object NP.</Paragraph> <Paragraph position="1"> The above example is by no means an isolated case.</Paragraph> <Paragraph position="2"> Variation has an effect not only on the representation of the syntactic structure but also on the semantic interpretation of the multiword expression (Sag et al., 2001; Baldwin et al., to appear). The presence of variation in multiword expressions brings up two scenarios: (a) the loss of the peculiar meaning or (b) the modification of the original meaning. Returning to the example above, modifiers of tabs affect the interpretation of the event predicate as a whole. Thus, keep close tabs on s.o. means 'watch s.o. closely'. A different effect has been reported for some VERB NP idioms, in which the adjectival modification affects only the complement NP (Nicolas, 1995).</Paragraph> <Paragraph position="3"> For a correct interpretation, such idiomatic expressions require internal semantic structure.</Paragraph> <Paragraph position="4"> These observations suggest that: (i) not all fixed expressions and idioms are frozen word combinations, given that parts of the expression participate in syntactic operations; (ii) some lexemes (in 'fixed' expressions) are subject to morphological processes; and (iii) some fixed expressions still preserve underlying semantic structure. A description that captures these facts needs to allow variable slots so that the mentioned variants of the expression are licensed by the grammar. 
In sum, variation is a property that should not be neglected when deciding on the lexical representation of multiword expressions in computational resources.</Paragraph> </Section> <Section position="2" start_page="63" end_page="64" type="sub_section"> <SectionTitle> 1.2 Support verb constructions </SectionTitle> <Paragraph position="0"> Support verb constructions consist of a light verb (also known as a support verb) and a complement (e.g. take into account). The predicational complement may be realized as a noun, an adjective or a prepositional phrase. The light verb and its complement form a complex predicate, in which the complement itself supplies most of the semantic load (Butt, 1995). The verb performs a 'support' function, i.e. it serves to 'further structure or modulate the event described by the main predicator' (Butt, 1995). Most researchers agree that the light verb adds aspect, tense and 'aktionsart' information to the predicate. Since the support verb's meaning differs from the meaning of the (main) verb lexeme, the meaning of the support verb construction is not fully compositional. Due to the similarities with other idiosyncratic expressions, support verb constructions (also called light verb constructions, LVCs) belong to the group of lexicalized multiword expressions (Sag et al., 2001).</Paragraph> <Paragraph position="1"> We limit this study to support verb constructions for two practical reasons. First, there seems to be a group of core light verbs that exist crosslinguistically. Thus, we can concentrate on a small set of verbal lexemes. Second, these light verbs are based on main verbs still in active use in the language (Butt, 1995). Concerning Dutch, the verbs that can function both as main verbs and as light verbs are brengen 'bring', doen 'do', gaan 'go', geven 'give', hebben 'have', komen 'come', krijgen 'get', maken 'make', nemen 'take' and stellen 'state' (Hollebrandse, 1993). 
Establishing the lexical properties of light verb predicates is necessary so that parsers do not misanalyze main verb and light verb uses.</Paragraph> <Paragraph position="2"> Before we describe a corpus-based method to extract evidence of variation from a syntactically annotated corpus, we enumerate some research assumptions and highlight the types of variation and modification that are the object of this study. Section 3 presents the automated method and the evaluation of its merits. Section 4 describes a proposal of the required lexical annotation drawn from a working implementation. Our conclusions and further improvements are summarised in section 6.</Paragraph> <Paragraph position="3"> 2 Base form, variation and modification In addition to a subject, some prepositional support verb constructions select an additional complement. This may be realized by an accusative, dative or reflexive NP. Prior to applying the corpus-based method described in section 3, we partly ignore the lexical content within the PP complement; this is also why we want to establish the variation potential within LVCs. For the above two reasons, we assume that the minimum required lexemes (i.e. those common to all prepositional LVCs) include the argument PP and the support verb, and we represent each expression as a triple of the form [PREPOSITION NOUN VERB] (P N V).</Paragraph> <Paragraph position="4"> (Thus, determiners and modifiers are left out.)</Paragraph> <Paragraph position="5"> Some further assumptions must be introduced, namely, what we understand as a base form and as a variant of a support verb construction. The base form includes the mentioned triple and may include other lexicalized arguments. In expressions that allow no morphosyntactic variation or modification within the required arguments, tense inflection is usually possible. The base form shows the infinitive verb form. 
The base form of the expression voet bij stuk houden 'stick to one's guns (fig)' includes the noun voet, the PP bij stuk and the verb houden; tense inflection is possible (1-b).</Paragraph> <Paragraph position="6"> (1) a. VOET BIJ STUK HOUDEN argument differs from the NOUN lexeme is considered a variant. The expression uit zijn dak gaan 'go crazy' has (2-a) as its base form, with the noun dak allowing various possessive determiners. We study variation observed within the expression. We focus on two levels. Lexeme level: productive inflectional and derivational morphology.</Paragraph> <Paragraph position="7"> Phrase level: variability in specifiers and modifiers. The evidence we seek to extract is the following: (a) use of the diminutive in nominal lexemes; (b) singular and plural alternation in nouns, and evidence of derivational morphology, for example instances of compounding (another noun or an acronym prefixed to the head noun) or a genitive noun modifier; (c) alternation in specifiers, among them zero determiner, definite, indefinite, reciprocals, possessives, demonstratives and quantifiers; (d) NPs that are realized by reflexives, which may instantiate either open argument slots or an NP within complement PPs; and (e) among types of modification, pre-nominal adjectives, past participles, gerunds and other intervening material.</Paragraph> <Paragraph position="8"> In addition, some expressions allow relative clauses and PP post-nominal modifiers. Relative clauses are observed less often than PP post-nominal modifiers. So far, we ignore these two types of modification because we extract the evidence from an automatically annotated corpus and with automated means. It is well known that disambiguating a syntactic attachment site, e.g. a PP-attachment site, is one of the hardest problems for present-day parsing technology. Needless to say, the parser (Alpino) also encounters difficulties with this problem. 
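As an illustration only, the P N V triple and some of the kinds of evidence listed in (a)-(e) could be recorded along the following lines. This is a minimal Python sketch under stated assumptions, not the implementation used in this work: the names Triple, Occurrence, matches and evidence are hypothetical, the feature values ("sg", "possessive", "zero") are invented labels, and the lemmas and features are assumed to be supplied by the parser's annotation. The sample occurrences are constructed for illustration and are not corpus data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Triple:
    """Hypothetical container for the P N V triple of a prepositional LVC."""
    preposition: str
    noun: str
    verb: str  # infinitive form, as in the base form


@dataclass
class Occurrence:
    """One occurrence; lemmas and features are assumed to come from the parser."""
    preposition: str
    noun_lemma: str
    verb_lemma: str
    number: str = "sg"                 # assumed labels: "sg" or "pl"
    diminutive: bool = False
    determiner: Optional[str] = None   # e.g. "possessive"; None means zero determiner
    modifiers: List[str] = field(default_factory=list)  # pre-nominal material


def matches(triple: Triple, occ: Occurrence) -> bool:
    """True if the occurrence instantiates the triple at the lemma level."""
    return (triple.preposition, triple.noun, triple.verb) == (
        occ.preposition, occ.noun_lemma, occ.verb_lemma)


def evidence(triple: Triple, occurrences: List[Occurrence]) -> Dict[str, set]:
    """Collect the values observed for each variation dimension of one triple."""
    seen: Dict[str, set] = {
        "number": set(), "diminutive": set(), "determiner": set(), "modified": set()}
    for occ in occurrences:
        if not matches(triple, occ):
            continue  # occurrence belongs to a different expression
        seen["number"].add(occ.number)
        seen["diminutive"].add(occ.diminutive)
        seen["determiner"].add(occ.determiner or "zero")
        seen["modified"].add(bool(occ.modifiers))
    return seen


# Constructed sample: uit zijn dak gaan with a possessive determiner,
# plus an occurrence of a different triple that is skipped.
base = Triple("uit", "dak", "gaan")
sample = [
    Occurrence("uit", "dak", "gaan", determiner="possessive"),
    Occurrence("bij", "stuk", "houden"),
]
result = evidence(base, sample)
```

Keeping determiners and modifiers out of the triple and recording them per occurrence mirrors the assumption above that only the preposition, noun and verb are required lexemes.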
In this work, we did not investigate syntactic flexibility at the sentence level, that is, processes such as passive, topicalization, control, clefting, coordination, etc.</Paragraph> </Section> </Section> </Paper>