<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1808">
  <Title>Verb-Particle Constructions and Lexical Resources</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Characterizing VPCs
</SectionTitle>
    <Paragraph position="0"> VPCs are combinations of verbs and prepositional or adverbial particles, such as break down in The old truck broke down. In these constructions particles are characterised by containing features of motion-through-location and of completion or result in their core meaning (Bolinger, 1971). In syntactic terms in the example above we have an intransitive VPC, where no other verbal complement is required.</Paragraph>
    <Paragraph position="1"> Other VPCs may have further subcategorisation requirements, and in, for example, They came across an old manuscript we have a transitive VPC which has a further NP complement.</Paragraph>
    <Paragraph position="2"> In this work we look exclusively at VPCs, thus excluding prepositional verbs, where a verb subcategorises for a prepositional phrase (PP), such as rely on in He relies on his wife for everything. Cases like this, and others of adverbial modification, need to be distinguished from VPCs. The difference may be quite subtle and, in order to distinguish VPCs from other constructions, we use the following criteria:
• The particle may come either before or after the NP in transitive VPCs (e.g. He backed up the team vs He backed the team up). Whether a particle can be separated or not from the verb may depend on the degree of bondage of the particle with the verb, on the size of the NP, and on the kind of NP.</Paragraph>
    <Paragraph position="3"> • In transitive VPCs unstressed personal pronouns must precede the particle (e.g. They ate it up but not *They ate up it).</Paragraph>
    <Paragraph position="4"> • The particle, in transitive VPCs, comes before a simple definite NP without taking it as its object (e.g. He brought along his girlfriend but not It consists of two parts).</Paragraph>
    <Paragraph position="5"> • In VPCs subcategorising for other verbal complements, like PPs and sentential complements, the particle must come immediately after the verb.</Paragraph>
    <Paragraph position="6"> • Verbs that subcategorise for an optional goal argument that is fulfilled by a locative or directional particle are considered to be VPCs, with the particle further specifying the meaning of the verb (e.g. walk up in Bill walked up the hill).</Paragraph>
    <Paragraph position="7"> As discussed by Bolinger (1971), many of the criteria proposed for diagnosing VPCs give different results for the same combination, frequently including unwanted combinations and excluding genuine VPCs. Nonetheless, they provide us with at least a basis for this decision.</Paragraph>
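    <Paragraph> The first two criteria can be illustrated with a minimal sketch. It assumes a hypothetical helper, particle_orders, and a small illustrative pronoun list; neither comes from the original work, and the sketch only renders the two word orders, marking with * the order that the pronoun criterion rules out.

```python
# Hypothetical helper and pronoun list, introduced here for illustration only.
UNSTRESSED_PRONOUNS = {"it", "him", "her", "them", "me", "us", "you"}


def particle_orders(subject: str, verb: str, particle: str, obj: str) -> dict[str, str]:
    """Return the two candidate word orders for a transitive verb-particle pair.

    A leading '*' marks the order excluded by the pronoun criterion.
    """
    joined = f"{subject} {verb} {particle} {obj}"  # particle before the NP
    split = f"{subject} {verb} {obj} {particle}"   # particle after the NP
    if obj.lower() in UNSTRESSED_PRONOUNS:
        # Unstressed personal pronouns must precede the particle.
        joined = "*" + joined
    return {"joined": joined, "split": split}


print(particle_orders("He", "backed", "up", "the team"))
print(particle_orders("They", "ate", "up", "it"))
```

The sketch does not decide whether a combination is a VPC; it only makes explicit which orders the criteria predict for a combination already assumed to be one.</Paragraph>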
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Dictionaries and VPCs
</SectionTitle>
    <Paragraph position="0"> Dictionaries are a major source of information about VPCs. Table 1 shows the coverage of phrasal verbs (PVs) in several dictionaries and lexicons: Collins Cobuild Dictionary of Phrasal Verbs (Collins-PV), Cambridge International Dictionary of Phrasal Verbs (CIDE-PV), the electronic versions of the Alvey Natural Language Tools (ANLT) lexicon (Carroll and Grover, 1989) (which was derived from the Longman Dictionary of Contemporary English, LDOCE), the Comlex lexicon (Macleod and Grishman, 1998), and the LinGO English Resource Grammar (ERG) (Copestake and Flickinger, 2000), version of November 2001. The second column of this table shows the number of PV entries for each of these dictionaries, including not only VPCs but also other kinds of PV. The third column shows the number of VPC entries (available only for the electronic dictionaries).</Paragraph>
    <Paragraph position="1"> As we can see from these numbers, each of these dictionaries has a considerable number of PV entries, potentially providing us with a good starting point for handling VPCs. Table 2 shows some of the characteristics of each dictionary in more detail with respect to VPCs, where the seventh column shows the proportion of verbs used in VPCs (sixth column) out of all verbs in a dictionary (second column). Each of these dictionaries uses a different set of verbs and particles in its VPCs. However, with respect to the verbs listed in these dictionaries there is a high level of agreement among them with, for example, 93.26% of the verbs in Comlex also being listed in ANLT. In Table 2 we can see the increase in the number of verbs obtained by the union of the dictionaries, where A+C represents the union of ANLT and Comlex, A∩C their intersection and A+C+E the union of ANLT, Comlex and ERG. Because of the high level of agreement for their verbs, the contribution made by each dictionary when they are joined together is relatively small, so that the combination of the three (A+C+E) has only 7.3% more verbs than ANLT alone, for example.</Paragraph>
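    <Paragraph> The union and intersection comparisons described here can be sketched with plain set operations. This is a minimal illustration under stated assumptions: the two verb sets are toy data invented for the example, and overlap_report is a hypothetical helper, not part of any of the lexicons.

```python
# Toy verb sets invented for illustration; not the real ANLT or Comlex contents.
anlt_verbs = {"come", "go", "get", "put", "bring"}
comlex_verbs = {"come", "go", "get", "take"}


def overlap_report(a: set[str], c: set[str]) -> dict[str, float]:
    """Summarise how two verb inventories overlap (A+C and their intersection)."""
    union = a | c
    shared = a.intersection(c)
    return {
        "union_size": len(union),
        "intersection_size": len(shared),
        # Share of the verbs in c that are also listed in a.
        "pct_of_c_in_a": 100 * len(shared) / len(c),
        # Extra verbs contributed by the union, relative to a alone.
        "pct_gain_over_a": 100 * (len(union) - len(a)) / len(a),
    }


report = overlap_report(anlt_verbs, comlex_verbs)
print(report)
```

With the real inventories, the third and fourth fields would correspond to figures of the kind quoted above (the share of Comlex verbs also in ANLT, and the gain of a union over ANLT alone).</Paragraph>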
    <Paragraph position="2"> In relation to VPCs, ANLT uses the largest number of particles, and with one exception all the particles contained in the ERG and Comlex are already contained in ANLT. When we rank the particles according to the frequency with which they occur in the VPCs, we get similar patterns for all of the dictionaries, as can be seen in Figure 1. This figure shows the 5 top ranked particles for each of the dictionaries, and for all of them up is the particle involved in the largest number of combinations. By analysing the VPCs in each of these dictionaries, we can also see that only a small proportion of the total number of verbs in a dictionary is used in its VPCs (Table 2). For example, only 20% of the verbs listed in ANLT form at least one VPC. For the other dictionaries this proportion is even lower. The verbs that do form VPCs tend to be very widely used and general verbs, such as come, go, get, put, bring and take. Whether the remaining verbs do not form valid VPCs or whether the combinations were simply omitted remains to be investigated.</Paragraph>
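    <Paragraph> The particle ranking and the proportion of verbs that take part in VPCs can be sketched as follows. The entries and the verb list are toy data invented for the example, and top_particles and verb_share are hypothetical helper names.

```python
from collections import Counter

# Toy data invented for illustration; not taken from any of the dictionaries.
vpcs = {
    ("come", "up"), ("go", "up"), ("get", "up"),
    ("come", "out"), ("go", "out"), ("put", "down"),
}
all_verbs = {"come", "go", "get", "put", "abandon", "accept", "admire", "adore"}


def top_particles(entries, n=5):
    """Rank particles by the number of distinct VPC entries they occur in."""
    counts = Counter(particle for _verb, particle in entries)
    return counts.most_common(n)


def verb_share(entries, verbs):
    """Percentage of the listed verbs that form at least one VPC."""
    vpc_verbs = {verb for verb, _particle in entries}
    return 100 * len(vpc_verbs.intersection(verbs)) / len(verbs)


print(top_particles(vpcs, n=2))     # [('up', 3), ('out', 2)]
print(verb_share(vpcs, all_verbs))  # 50.0
```

Counting over a set of (verb, particle) pairs means each combination is counted once, regardless of how many subcategorisation frames it has; counting per frame would be an equally defensible choice.</Paragraph>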
    <Paragraph position="3"> Even though only a subset of verbs in dictionaries are used in VPCs, this subset generates a large number of combinations, as shown in Table 2. Each of these dictionaries specialises in a subset of VPCs.</Paragraph>
    <Paragraph position="4"> Because of this difference in coverage, when the dictionaries are combined, each one that is added helps to significantly extend the coverage of VPCs.</Paragraph>
    <Paragraph position="5"> Although there is a significant number of entries¹ that are common among the different dictionaries, it seems to correspond only to a subset of the total number of entries each dictionary has. For instance, from the total number of entries obtained by combining ANLT and Comlex (Table 2), only 34% of the entries are listed in both dictionaries, with the remaining 66% of the total number of entries being exclusive to one or the other of these dictionaries. Moreover, even with the large number of entries already obtained by combining these two dictionaries, a considerable proportion (16%) of the entries in the LinGO ERG lexicon are not listed in either of these two dictionaries (this proportion would increase if we took subcategorization etc. into account).² Most of these are at least semi-compositional, e.g., crisp up, come together, tie on, and were probably omitted from the dictionaries for that reason,³ though some others, such as hack up, are probably recent coinages. The coverage of these resources is quite limited, and ways of extending it are a necessity for successful NLP systems.</Paragraph>
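    <Paragraph> One reading of these percentages can be written down explicitly. The notation is an assumption introduced here: A, C and E stand for the sets of VPC entries in ANLT, Comlex and the ERG, and the labels shared, exclusive and new are illustrative names rather than terms from the original analysis.

```latex
\[
\mathrm{shared}(A, C) = \frac{|A \cap C|}{|A \cup C|} \approx 0.34,
\qquad
\mathrm{exclusive}(A, C) = 1 - \mathrm{shared}(A, C) \approx 0.66,
\]
\[
\mathrm{new}(E) = \frac{|E \setminus (A \cup C)|}{|E|} \approx 0.16 .
\]
```

Under this reading the first quantity is the Jaccard overlap of the two dictionaries, while the last one is normalised by the size of the ERG lexicon alone.</Paragraph>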
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 VPCs in Corpora
</SectionTitle>
    <Paragraph position="0"> The use of corpora to extract verb-particle combinations can contribute to extending the coverage of dictionaries. An investigation of the automatic extraction of VPCs from corpora is described in Baldwin and Villavicencio (2002). In this section we use VPCs extracted from the British National Corpus (BNC), comparing these VPCs with those contained in the combined A+C+E-VPCs, and discussing how the former can be used to complement the coverage provided by the latter.</Paragraph>
    <Paragraph position="1"> The BNC is a 100 million word corpus containing samples of written text from a wide variety of sources, designed to represent as wide a range of modern British English as possible. Using the methods described in Baldwin and Villavicencio (2002), 8,751 VPC entries were extracted from the BNC.</Paragraph>
    <Paragraph position="2"> These entries are classified into intransitive and/or transitive VPCs, depending on their subcategorisation frame, and they result in 7,078 distinct VPCs. Some of these entries are not VPCs but rather noise, such as **** off, 's down, etc. After removing the most obvious cases of noise, there were 7,070 VPCs left. [Footnote fragment: ... most of the verb-particle entries being empirically motivated by the Verbmobil corpus. It is thus probably reasonably representative of a moderate-size domain-specific lexicon.] [Footnote fragment: ...ings and combinations are not given for all verbs.]</Paragraph>
    <Paragraph position="3"> These are formed by 2,542 verbs and 48 particles, as shown in Table 3.</Paragraph>
    <Paragraph position="4"> When comparing the VPCs in the BNC (BNC-VPCs) with those in the combined dictionaries (A+C+E-VPCs), there are 1,149 verbs in common, corresponding to 82.1% of the verbs in the combined dictionaries. When these resources are joined together, there is a significant increase in the number of verbs and particles, with a total of 2,793 different verbs and 65 particles used in VPCs (Table 3). The verbs that appear in the largest number of VPCs are again general and widely used (e.g. move, come, go, get and pull). For these, the five particles that occur in the highest number of VPCs are shown in Figure 2, and they are basically the same as those in the dictionaries. In terms of the VPCs, joining A+C+E-VPCs with BNC-VPCs gives an increase of 160.30% in the number of VPCs. Among the extracted VPCs many form productive combinations, some containing a more informal or a recent use of verbs (e.g. hop off, kangaroo down and skateboard away). These VPCs provide a useful addition to those contained in the dictionaries. However, we are still able to obtain only a subset of the existing VPCs, and plausible combinations such as hoover up are not found in these combined resources. In the next section we discuss how to extend their coverage even further by making use of productive patterns found in classes of semantically related verbs.</Paragraph>
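    <Paragraph> The verb counts quoted above can be cross-checked by inclusion-exclusion. This is a worked check under one assumption: the number of verbs in the combined dictionaries is derived here from the three quoted figures and is not itself quoted in this section.

```python
# Figures quoted in the text above.
BNC_VERBS = 2542      # verbs in BNC-VPCs after noise removal
COMMON_VERBS = 1149   # verbs shared with the combined dictionaries
UNION_VERBS = 2793    # distinct verbs once both resources are joined

# Inclusion-exclusion: dictionary verbs = union - BNC verbs + common verbs.
# This value is derived for the check, not quoted from the paper.
dict_verbs = UNION_VERBS - BNC_VERBS + COMMON_VERBS
common_share = round(100 * COMMON_VERBS / dict_verbs, 1)

print(dict_verbs)    # 1400
print(common_share)  # 82.1
```

The derived share agrees with the 82.1% quoted for the verbs in common, so the three counts are mutually consistent.</Paragraph>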
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 VPC Patterns in Levin's Verb Classes
</SectionTitle>
    <Paragraph position="0"> Fraser (1976) noted how semantic properties of verbs can affect their possibilities of combination with particles (e.g. hunt/track/trail/follow down and bake/cook/fry/broil up). Semantic properties of verbs can influence the patterns of combination that they follow (e.g. verbs of hunting and the resultative down and verbs of cooking and the aspectual up). By having a semantic classification of verbs we can determine how they combine with certain particles, and this can be used to extend the coverage of the available resources by productively generating VPCs from classes of related verbs according to the patterns that they follow. One such classification was proposed by Levin (1993). In Levin's classification, verbs are grouped into classes in terms of their syntactic and semantic properties. These classes were not developed specifically for VPCs, but it may be the case that some productive patterns of combinations correspond to certain classes of verbs. We investigated the possibility of using Levin's classes of verbs to generate a set of candidate VPCs, and in this section we briefly discuss Levin's classes and describe how they can be used to predict productive verb-particle combinations.</Paragraph>
    <Paragraph position="1"> There are 190 fine-grained subclasses that capture the 3,100 different verbs listed, resulting in 4,167 entries, since each verb can belong to more than one class. For example, the verb to run belongs to classes 26.3 (Verbs of Preparing), 47.5.1 (Swarm Verbs), 47.7 (Meander Verbs) and 51.3.2 (Run Verbs). The number of elements in each class varies considerably, so that 60% of all of these classes have more than 10 elements, accounting for 88% of the verbs, while the other 40% of the classes have 10 or fewer elements, capturing the remaining 22% of the verbs. The 5 largest classes are shown in Table 4.</Paragraph>
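    <Paragraph> The difference between verbs and entries can be made concrete with a small sketch. The class labels for run are the ones given above; the other members (bake, jog) and the helper name invert are illustrative assumptions, and the table is a toy fragment rather than the full classification.

```python
from collections import defaultdict

# Toy fragment: the class labels for "run" follow the text; other members are illustrative.
classes = {
    "26.3": {"run", "bake"},
    "47.5.1": {"run"},
    "47.7": {"run"},
    "51.3.2": {"run", "jog"},
}


def invert(class_table: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each verb to the set of classes it belongs to."""
    verb_to_classes = defaultdict(set)
    for name, verbs in class_table.items():
        for verb in verbs:
            verb_to_classes[verb].add(name)
    return dict(verb_to_classes)


verb_index = invert(classes)
n_entries = sum(len(verbs) for verbs in classes.values())  # one entry per (class, verb)
n_verbs = len(verb_index)                                  # distinct verbs

print(n_entries, n_verbs)          # 6 3
print(sorted(verb_index["run"]))   # ['26.3', '47.5.1', '47.7', '51.3.2']
```

An entry is thus a (class, verb) pair, which is why the number of entries exceeds the number of distinct verbs.</Paragraph>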
    <Paragraph position="2">  It is possible that some productive patterns found in VPCs may be mapped onto the classes defined.</Paragraph>
    <Paragraph position="3"> In this case, some classes may be good predictors of productive VPCs, and to test this possibility we analysed the combinations generated by Levin's classes and a subset of four particles (down, in, out, up). To test the validity of a resulting combination, we searched for it first among the VPCs from the combined dictionaries, A+C+E-VPCs, and then among the much more numerous but potentially noisy A+C+E+BNC-VPCs.</Paragraph>
    <Paragraph position="4"> All combinations of verbs in Levin's classes and these four particles were generated and tested for validity. We use the proportion of valid VPCs as a metric to determine the degree of productivity of a given class, so that the higher the proportion, the more productive the class, according to the combined resources. The classes are then ranked according to their productivity degree.</Paragraph>
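    <Paragraph> The productivity metric can be sketched as follows. This is a minimal illustration, not the original implementation: the class members and the attested combinations are toy data invented for the example, class_productivity is a hypothetical helper, and the sketch does not filter out verbs that are absent from the resources.

```python
PARTICLES = ("down", "in", "out", "up")

# Toy data invented for illustration; class labels are used only as names.
classes = {"11.3": {"bring", "take"}, "39.1": {"eat", "drink"}}
attested = {(v, p) for v in ("bring", "take") for p in PARTICLES}
attested.update({("eat", "up"), ("drink", "up"), ("drink", "down")})


def class_productivity(class_table, valid_vpcs):
    """Rank classes by the proportion of generated candidates that are attested."""
    scores = []
    for name, verbs in class_table.items():
        candidates = [(verb, particle) for verb in sorted(verbs) for particle in PARTICLES]
        if not candidates:
            continue  # skip empty classes to avoid dividing by zero
        n_valid = sum(1 for candidate in candidates if candidate in valid_vpcs)
        scores.append((name, n_valid / len(candidates)))
    # Higher proportion first: the more productive the class.
    return sorted(scores, key=lambda item: item[1], reverse=True)


ranking = class_productivity(classes, attested)
print(ranking)  # [('11.3', 1.0), ('39.1', 0.375)]
```

Swapping the attested set (dictionaries only, or dictionaries plus corpus) changes the scores without changing the procedure.</Paragraph>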
    <Paragraph position="5"> There are 16,668 possible combinations that can be generated from the 4,167 entries in Levin's classes and four particles. However, of the 4,167 entries only 3,914 have verbs that are in A+C+E, so we will consider only 15,656 possible VPCs when evaluating these results against the combined dictionaries. When we compare the 15,656 possible VPCs with those in A+C+E, 2,456 are considered valid (15.69%). In Figure 3, we can see the degree of productivity of a class, for the 10 highest ranked classes, according to A+C+E-VPCs. From these classes, we can see two basic patterns:
• verbs that can form aspectual combinations, with the particle giving a sense of completion and/or increase/improvement to the action denoted by the verb, e.g. verbs of Eating (39.1) and Splitting (23.2),
• verbs that imply some motion or take a location, e.g. verbs of Bring and Take (11.3), Push and Pull (12) and Putting in spatial configuration (9.2), and can form resultative combinations.
However, apart from class 11.3, where all verbs form good combinations with all four particles according to the dictionaries, the other classes have a lower proportion of valid combinations. As these results may be due to the coverage of the dictionaries, we compared them with those obtained by also using BNC-VPCs to test the validity of a combination. In this case, of the 4,167 entries in Levin's classification, 3,925 have verbs that are in A+C+E+BNC-VPCs, generating 15,700 candidate VPCs, against which we perform the evaluation. Using this larger set of VPCs, further combinations are considered valid: 4,733 VPCs out of 15,700 candidates (30.15%). This time a considerable improvement in the results was verified, with a larger number of classes having the majority of their VPCs considered valid. Figure 4 shows the ten top ranked classes found with A+C+E+BNC-VPCs.
Confirming the trends suggested with the dictionaries, most of the top ranked classes have verbs implying some kind of motion or taking a location (e.g. 11.3, Bring and Take, and 53.2, Rushing) forming resultative VPCs, or forming aspectual VPCs (e.g. 23.2, Split).</Paragraph>
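    <Paragraph> The counts in this paragraph follow from simple arithmetic, which the short check below reproduces. It only uses figures quoted in the text; the helper names candidates and pct are introduced here for illustration.

```python
N_PARTICLES = 4  # down, in, out, up


def candidates(n_entries: int) -> int:
    """Number of candidate VPCs generated from n_entries class entries."""
    return n_entries * N_PARTICLES


def pct(valid: int, total: int) -> float:
    """Percentage of valid candidates, rounded to two decimals."""
    return round(100 * valid / total, 2)


print(candidates(4167))  # 16668: all entries in Levin's classes
print(candidates(3914))  # 15656: entries whose verbs are in A+C+E
print(candidates(3925))  # 15700: entries whose verbs are in A+C+E+BNC-VPCs
print(pct(2456, 15656))  # 15.69
print(pct(4733, 15700))  # 30.15
```

Adding the corpus-derived VPCs therefore roughly doubles the proportion of generated candidates that count as valid.</Paragraph>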
    <Paragraph position="6"> All of the classes in Figure 4 have 70% or more of their verbs forming valid combinations, according to A+C+E+BNC-VPCs. For these classes a manual analysis of the VPCs generated was performed to test the predicted productivity of the class. All those combinations that were not attested were subject to human judgement. Examples of these are:
• catapult down/up - e.g. More victories followed including a hard-fought points win over Lizo Matayi which should have catapulted him up for a national title challenge,
• split/tear in - e.g. The end of the square stick was then split in for a few inches.</Paragraph>
    <Paragraph position="7"> All examples are from Google. This analysis revealed that all of the candidate VPCs in these classes are valid, which confirms the degree of productivity of these high ranked classes.</Paragraph>
    <Paragraph position="8"> The classes that have a degree of productivity of 40% or more form 4,344 candidate VPCs, which, when joined with the combined resources, yield a total of 9,919 VPCs. This represents an increase of 20.74% in the coverage of A+C+E+BNC-VPCs, by making use of productive patterns found in VPCs.</Paragraph>
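    <Paragraph> The extension step can be sketched as a thresholded union. This is an illustration under stated assumptions: the resources, class members and productivity scores are toy values invented for the example, and extend_with_classes is a hypothetical helper rather than the procedure actually used.

```python
PARTICLES = ("down", "in", "out", "up")

# Toy inputs invented for illustration.
classes = {"11.3": {"bring", "take"}, "39.1": {"eat", "drink"}}
productivity = {"11.3": 0.875, "39.1": 0.125}
resources = {
    ("bring", "down"), ("bring", "in"), ("bring", "out"), ("bring", "up"),
    ("take", "in"), ("take", "out"), ("take", "up"),
    ("eat", "up"),
}


def extend_with_classes(known, class_table, scores, threshold=0.4):
    """Add every verb-particle candidate from classes at or above the threshold."""
    generated = set()
    for name, verbs in class_table.items():
        if scores.get(name, 0.0) >= threshold:
            generated.update((verb, particle) for verb in verbs for particle in PARTICLES)
    merged = known | generated
    gain = 100 * (len(merged) - len(known)) / len(known)
    return merged, gain


merged, gain = extend_with_classes(resources, classes, productivity)
print(len(merged), gain)  # 9 12.5
```

Only candidates that are not already attested contribute to the gain, which is why the increase in coverage is much smaller than the number of candidates generated.</Paragraph>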
    <Paragraph position="9"> As each of these particles occurs with a certain proportion of the verbs in a class, and this proportion varies considerably from class to class and from particle to particle, further investigation was conducted to see the degree of productivity of individual class-particle pairs. The degree of productivity of each class-particle pair is determined by the proportion of verbs in that class that form valid combinations with that particle. Moreover, the larger the number of classes where the majority of verbs form valid VPCs with that particle, the more productive the particle is. Table 5 shows, for each particle, the 5 classes that had the highest proportion of valid VPCs with that particle, according to A+C+E+BNC-VPCs. Of these particles, the one that is involved in the largest number of combinations across the most classes is up, which occurs with 40% or more of the verbs in a class for 54.7% of the classes, and it is followed closely by out, as shown in Table 6. Thus up is the best predictor of valid verb-particle combinations for most classes. On the other hand, the weakest predictor of valid combinations is in, which occurs with 40% or more of the verbs in only a few classes. Class 11.3 is the best class predictor, allowing all verbs to combine with all particles.</Paragraph>
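    <Paragraph> The class-particle measure and the resulting particle ranking can be sketched in the same style. The data are toy values invented for the example, and pair_productivity and particle_reach are hypothetical helper names; the 40% threshold is the one used in the text.

```python
PARTICLES = ("down", "in", "out", "up")

# Toy data invented for illustration.
classes = {"11.3": {"bring", "take"}, "39.1": {"eat", "drink"}}
attested = {(v, p) for v in ("bring", "take") for p in PARTICLES}
attested.update({("eat", "up"), ("drink", "up"), ("drink", "down")})


def pair_productivity(class_table, valid_vpcs):
    """Proportion of verbs in each class that combine validly with each particle."""
    pairs = {}
    for name, verbs in class_table.items():
        if not verbs:
            continue  # skip empty classes to avoid dividing by zero
        for particle in PARTICLES:
            n_valid = sum((verb, particle) in valid_vpcs for verb in verbs)
            pairs[(name, particle)] = n_valid / len(verbs)
    return pairs


def particle_reach(pairs, threshold=0.4):
    """Percentage of classes in which a particle reaches the threshold."""
    class_names = {name for name, _particle in pairs}
    hits = {particle: 0 for _name, particle in pairs}
    for (_name, particle), score in pairs.items():
        if score >= threshold:
            hits[particle] += 1
    return {particle: 100 * n / len(class_names) for particle, n in hits.items()}


pairs = pair_productivity(classes, attested)
reach = particle_reach(pairs)
print(pairs[("39.1", "down")])  # 0.5
print(reach)
```

The second function corresponds to the kind of figure reported for up (the share of classes in which it combines with 40% or more of the verbs).</Paragraph>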
    <Paragraph position="10"> The classes that have a degree of productivity of 40% or more with a given particle, using this more specific measure, generate 4,719 VPCs, and these were used to extend the coverage of these resources, obtaining a total of 9,896 VPCs. This represents an increase of 20.46% in the coverage of A+C+E+BNC-VPCs, by making use of productive patterns found in VPCs and a very restricted set of particles.</Paragraph>
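    <Paragraph> The two coverage figures can be checked against each other. The check below rests on an assumption stated loudly: the baseline size of A+C+E+BNC-VPCs, 8,215, is not quoted in this section but inferred here from the reported totals and percentages.

```python
# Assumption: baseline size of A+C+E+BNC-VPCs, inferred from the quoted
# totals and percentages rather than stated in this section.
BASE_VPCS = 8215


def pct_gain(total: int, base: int) -> float:
    """Percentage increase of total over base, rounded to two decimals."""
    return round(100 * (total - base) / base, 2)


print(pct_gain(9919, BASE_VPCS))  # 20.74 (class-level selection)
print(pct_gain(9896, BASE_VPCS))  # 20.46 (class-particle selection)
```

Both reported increases are consistent with the same inferred baseline, so the two selection strategies add 1,704 and 1,681 previously unlisted VPCs respectively under this assumption.</Paragraph>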
    <Paragraph position="11"> These results suggest that patterns of productivity of VPCs can be mapped onto Levin's classes. Whether choosing the more productive classes overall or the more productive class-particle pairs, the result is a significant increase in coverage of the lexical resources when VPCs are generated from these classes. More investigation is needed to verify whether the unattested combinations, especially in the lower ranked classes, are invalid or simply did not occur in the dictionaries or in the corpus, because the problem of data sparseness is especially acute for VPCs. Moreover, it is also necessary to determine the precise semantics of these VPCs, even though we expect that the more productive classes generate VPCs compositionally, combining the semantics of the verb and particle together. Possible alternatives for dealing with this issue are discussed by both Bannard et al. (2003) and McCarthy et al. (2003). Furthermore, although there are some cases where it appears reasonable to treat VPCs as fully productive, there are also cases of semi-productivity (e.g. verbs denoting cooking processes and aspectual up: boil up and heat up, but not ?sauté up), as discussed by Villavicencio and Copestake (2002), so it is important to determine which classes are fully productive and which are not.</Paragraph>
  </Section>
</Paper>