<?xml version="1.0" standalone="yes"?> <Paper uid="W06-2110"> <Title>Automatic Identification of English Verb Particle Constructions using Linguistic Features</Title> <Section position="2" start_page="0" end_page="65" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Multiword expressions (hereafter MWEs) are lexical items that can be decomposed into multiple simplex words and display lexical, syntactic and/or semantic idiosyncrasies (Sag et al., 2002; Calzolari et al., 2002). In the case of English, MWEs are conventionally categorised syntactico-semantically into classes such as compound nominals (e.g. New York, apple juice, GM car), verb particle constructions (e.g. hand in, battle on), non-decomposable idioms (e.g. a piece of cake, kick the bucket) and light-verb constructions (e.g. make a mistake).</Paragraph> <Paragraph position="1"> MWE research has focussed largely on the implications of MWEs for language understanding, fluency and robustness (Pearce, 2001; Sag et al., 2002; Copestake and Lascarides, 1997; Bannard et al., 2003; McCarthy et al., 2003; Widdows and Dorow, 2005). In this paper, our goal is to identify individual token instances of English verb particle constructions (VPCs hereafter) in running text.</Paragraph> <Paragraph position="2"> For the purposes of this paper, we follow Baldwin (2005) in adopting the simplifying assumption that VPCs: (a) consist of a head verb and a unique prepositional particle (e.g. hand in, walk off); and (b) are either transitive (e.g. hand in, put on) or intransitive (e.g. battle on). A defining characteristic of transitive VPCs is that they can generally occur with either joined (e.g. He put on the sweater) or split (e.g. He put the sweater on) word order. When the object is pronominal, however, the VPC must occur in split word order (cf.
*He handed in it) (Huddleston and Pullum, 2002; Villavicencio, 2003).</Paragraph> <Paragraph position="3"> The semantics of the VPC can either derive transparently from the semantics of the head verb and particle (e.g. walk off) or be significantly removed from the semantics of the head verb and/or particle (e.g. look up); analogously, the selectional preferences of VPCs can mirror those of their head verbs or alternatively diverge markedly. The syntax of the VPC can also coincide with that of the head verb (e.g. walk off) or alternatively diverge (e.g. lift off).</Paragraph> <Paragraph position="4"> In the following, we review relevant past research on VPCs, focusing on the extraction/identification of VPCs and the prediction of the compositionality/productivity of VPCs.</Paragraph> <Paragraph position="5"> There is a modest body of research on the identification and extraction of VPCs. In VPC identification, we seek to detect individual VPC token instances in corpus data, whereas in VPC extraction we seek to arrive at an inventory of VPC types/lexical items based on analysis of token instances in corpus data. Li et al. (2003) identify English VPCs (or &quot;phrasal verbs&quot; in their parlance) using hand-coded regular expressions. Baldwin and Villavicencio (2002) extract a simple list of VPCs from corpus data, while Baldwin (2005) extracts VPCs with valence information under the umbrella of deep lexical acquisition. The method of Baldwin (2005) is aimed at VPC extraction and takes into account only the syntactic features of verbs. In this paper, our interest is in VPC identification, and we make use of deeper semantic information.</Paragraph> <Paragraph position="6"> Fraser (1976) and Villavicencio (2006) argue that the semantic properties of verbs can determine the likelihood of their occurrence with particles. Bannard et al. (2003) and McCarthy et al.
(2003) investigate methods for estimating the compositionality of VPCs based largely on distributional similarity of the head verb and VPC.</Paragraph> <Paragraph position="7"> O'Hara and Wiebe (2003) propose a method for disambiguating the verb sense of verb-PPs. While our interest is in VPC identification, a fundamentally syntactic task, we draw on the shallow semantic processing employed in these methods in modelling the semantics of VPCs relative to their base verbs.</Paragraph> <Paragraph position="8"> The contribution of this paper is to combine syntactic and semantic features in the task of VPC identification. The basic intuition behind the proposed method is that the selectional preferences of VPCs over predefined argument positions should provide insight into whether a verb and preposition in a given sentential context combine to form a VPC (e.g. Kim handed in the paper) or alternatively constitute a verb-PP (e.g. Kim walked in the room). That is, we seek to identify individual preposition token instances as intransitive prepositions (i.e. prepositional particles) or transitive prepositions based on analysis of the governing verb.</Paragraph> <Paragraph position="9"> The remainder of the paper is structured as follows. Section 2 outlines the linguistic features of verbs and their co-occurring nouns. Section 3 provides a detailed description of our technique. Section 4 describes the data properties and the identification method. Section 5 contains a detailed evaluation of the proposed method. Section 6 discusses the effectiveness of our approach. Finally, Section 7 summarizes the paper and outlines future work.</Paragraph> </Section> </Paper>