<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-2001">
  <Title>Extracting the Unextractable: A Case Study on Verb-particles</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Distinguishing Features of VPCs
</SectionTitle>
    <Paragraph position="0"> Here, we review a number of features of VPCs pertinent to the extraction task. First, we describe linguistic qualities that characterise VPCs, and second we analyse the actual occurrence of VPCs in the WSJ.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Linguistic features
</SectionTitle>
      <Paragraph position="0"> Given an arbitrary verb-preposition pair, where the preposition is governed by the verb, a number of analyses are possible. If the preposition is intransitive, a VPC (either intransitive or transitive) results. If the preposition is transitive, it must select for an NP, producing either a prepositional verb (e.g. refer to) or a free verb-preposition combination (e.g. put it on the table, climb up the ladder).</Paragraph>
      <Paragraph position="1"> A number of diagnostics can be used to distinguish VPCs from both prepositional verbs and free verb-preposition combinations (Huddleston and Pullum, 2002): (1) transitive VPCs undergo the particle alternation; (2) with transitive VPCs, pronominal objects must be expressed in the "split" configuration; (3) manner adverbs cannot occur between the verb and particle. The first two diagnostics are restricted to transitive VPCs, while the third applies to both intransitive and transitive VPCs.</Paragraph>
      <Paragraph position="2"> The first diagnostic is the canonical test for particlehood, and states that transitive VPCs take two word orders: the joined configuration, whereby the verb and particle are adjacent and the NP complement follows the particle (e.g. hand in the paper), and the split configuration, whereby the NP complement occurs between the verb and particle (e.g. hand the paper in). Note that prepositional verbs and free verb-preposition combinations can occur only in the joined configuration (e.g. refer to the book vs. *refer the book to). Therefore, the existence of a verb-preposition pair in the split configuration is sufficient evidence for a VPC analysis. It is important to realise that compatibility with the particle alternation is a sufficient but not necessary condition on verb-particlehood. That is, a small number of VPCs do not readily occur in the split configuration, including carry out (a threat) (cf. ?carry a threat out).</Paragraph>
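As an illustration only, the surface pattern behind the first diagnostic can be sketched in Python. The function name surface_configuration and the max_gap parameter are hypothetical; the default of 3 echoes the maximum verb-particle distance reported for the WSJ sample in Section 2.2, but it is an assumption rather than a fixed property of VPCs. The sketch matches word forms only, so a free combination such as put it on the table would also be labelled "split"; a real test must additionally confirm that the preposition is not followed by its own NP complement.

```python
def surface_configuration(tokens, verb, particle, max_gap=3):
    """Return 'joined', 'split' or None for the first occurrence of verb.

    Purely surface-based: no part-of-speech tags or NP chunks are consulted,
    so the result is a candidate label, not a linguistic analysis.
    """
    for i, tok in enumerate(tokens):
        if tok != verb:
            continue
        # Joined: the particle is the word immediately after the verb.
        if tokens[i + 1:i + 2] == [particle]:
            return "joined"
        # Split: between 1 and max_gap words intervene before the particle.
        if particle in tokens[i + 2:i + 2 + max_gap]:
            return "split"
    return None


joined = surface_configuration("hand in the paper".split(), "hand", "in")
split = surface_configuration("hand the paper in".split(), "hand", "in")
```

Here joined and split hold the labels for the two word orders used as examples above.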
      <Paragraph position="3"> The second diagnostic stipulates that pronominal NPs can occur only in the split configuration (hand it in vs. *hand in it). Note also that heavy NPs tend to occur in the joined configuration, and that various other factors interact to determine which configuration a given VPC in context will occur in (see, e.g., Gries (2000)).</Paragraph>
      <Paragraph position="4"> The third diagnostic states that manner adverbs cannot intervene between the verb and particle (e.g. *hand quickly the paper in). Note that this constraint is restricted to manner adverbs, and that there is a small set of adverbs which can pre-modify particles and hence occur between the verb and particle (e.g. well in jump well up).</Paragraph>
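One possible way to combine the three diagnostics is sketched below. This is a hypothetical decision rule, not a procedure described in this paper: the function name and the three boolean inputs are assumptions, and each input presupposes that the corresponding pattern has already been reliably identified in the corpus. A genuine split occurrence is treated as decisive, since it is sufficient evidence for a VPC; the other two observations count against a VPC analysis; a lack of evidence leaves the pair undetermined, since the particle alternation is not a necessary condition.

```python
def diagnose(seen_split, seen_joined_pronoun, seen_manner_adverb_between):
    """Combine observations for one verb-preposition pair.

    seen_split: the pair occurs with an NP between verb and preposition,
        and the preposition takes no NP complement of its own.
    seen_joined_pronoun: a pronominal object follows the adjacent
        verb and preposition (as in 'refer to it').
    seen_manner_adverb_between: a manner adverb separates the two words.
    """
    if seen_split:
        # Diagnostic 1: the split configuration is sufficient for a VPC.
        return "vpc"
    if seen_joined_pronoun or seen_manner_adverb_between:
        # Diagnostics 2 and 3: patterns that VPCs do not allow.
        return "not_vpc"
    # Joined-only evidence cannot settle the question.
    return "undetermined"


hand_in = diagnose(True, False, False)
refer_to = diagnose(False, True, False)
carry_out = diagnose(False, False, False)
```

Giving the split observation priority is a design choice; with noisy corpus data the observations may conflict, and a real system would need to weigh them rather than apply them in a fixed order.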
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Corpus occurrence
</SectionTitle>
      <Paragraph position="0"> In order to get a feel for the relative frequency of VPCs in the corpus targeted for extraction, namely the WSJ section of the Penn Treebank, we took a random sample of 200 VPCs from the Alvey Natural Language Tools grammar (Grover et al., 1993) and did a manual corpus search for each. In the case that a VPC was found attested in the WSJ, we made a note of the frequency of occurrence as: (a) an intransitive VPC, (b) a transitive VPC in the joined configuration, and (c) a transitive VPC in the split configuration. Of the 200 VPCs, only 62 were attested in the Wall Street Journal corpus (WSJ), at a mean token frequency of 5.1 and median token frequency of 2 (frequencies totalled over all three usages). Figure 1 indicates the relative proportion of the 62 attested VPC types which occur with the indicated frequencies. From this, it is apparent that two-thirds of the attested VPCs occur at most three times in the overall corpus, meaning that any extraction method must be able to handle extremely sparse data.</Paragraph>
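The summary statistics above can be reproduced from per-usage counts along the following lines. This is a minimal sketch: the function name summarise, the dictionary layout and the counts in toy_counts are invented for illustration and are not the WSJ figures.

```python
from statistics import mean, median


def summarise(counts_by_vpc, threshold=3):
    """Summarise token frequencies, totalled over the three usages."""
    totals = [sum(usages.values()) for usages in counts_by_vpc.values()]
    # Number of VPC types occurring at most `threshold` times.
    rare = sum(1 for total in totals if threshold >= total)
    return {
        "types": len(totals),
        "mean": mean(totals),
        "median": median(totals),
        "share_at_most_threshold": rare / len(totals),
    }


# Hypothetical counts, for illustration only.
toy_counts = {
    "hand in": {"intransitive": 0, "joined": 1, "split": 0},
    "carry out": {"intransitive": 0, "joined": 12, "split": 0},
    "pick up": {"intransitive": 1, "joined": 1, "split": 1},
    "set off": {"intransitive": 2, "joined": 0, "split": 0},
}
summary = summarise(toy_counts)
```

A single frequent type pulls the mean well above the median in the toy data, the same kind of skew as in the reported mean of 5.1 against a median of 2.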
      <Paragraph position="1"> Of the 62 attested VPCs, 29 have intransitive usages and 45 have transitive usages. Of the 45 attested transitive VPCs, 12 occur in both the joined and split configurations and can hence be unambiguously identified as VPCs based on the first diagnostic from above. For the remaining 33 transitive VPCs, we have only the joined usage, and must find some alternate means of ruling out a prepositional verb or free verb-preposition combination analysis. Note that for the split VPCs, the mean number of words occurring between the verb and particle was 1.6 and the maximum 3.</Paragraph>
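The breakdown above amounts to sorting the attested VPC types into overlapping sets. A minimal sketch follows, assuming the same three usage counts per VPC; the function name partition_by_usage and the toy counts are hypothetical and do not reproduce the WSJ data.

```python
def partition_by_usage(counts_by_vpc):
    """Group VPC types by the usages in which they are attested."""
    intransitive = {v for v, c in counts_by_vpc.items() if c["intransitive"]}
    joined = {v for v, c in counts_by_vpc.items() if c["joined"]}
    split = {v for v, c in counts_by_vpc.items() if c["split"]}
    return {
        "intransitive": intransitive,
        # Transitive: attested in at least one of the two configurations.
        "transitive": joined.union(split),
        # Both configurations: identifiable by the first diagnostic.
        "alternating": joined.intersection(split),
        # Joined only: some other means of disambiguation is required.
        "joined_only": joined.difference(split),
    }


# Hypothetical counts, for illustration only.
toy_counts = {
    "hand in": {"intransitive": 0, "joined": 2, "split": 1},
    "carry out": {"intransitive": 0, "joined": 5, "split": 0},
    "set off": {"intransitive": 2, "joined": 0, "split": 0},
    "pick up": {"intransitive": 1, "joined": 3, "split": 2},
}
groups = partition_by_usage(toy_counts)
```

A type such as pick up in the toy data falls into both the intransitive and the transitive group, which is why the 29 intransitive and 45 transitive types reported above can sum to more than 62.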
      <Paragraph position="2"> In the evaluation of the various extraction techniques below, recall is determined relative to this limited set of 62 VPCs attested in the WSJ. That is, recall is an indication of the proportion of the 62 VPCs contained within the set of extracted VPCs.</Paragraph>
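Recall as defined here can be written out as a short function. The sketch below assumes that VPCs are represented as plain strings; the function name recall and the example sets are illustrative and are not taken from the evaluation.

```python
def recall(extracted, attested):
    """Proportion of the attested VPCs that appear among the extracted ones."""
    attested = set(attested)
    if not attested:
        raise ValueError("the attested set must not be empty")
    return len(attested.intersection(extracted)) / len(attested)


# Hypothetical sets, for illustration only.
attested_vpcs = {"hand in", "carry out", "pick up", "set off"}
extracted_vpcs = {"hand in", "pick up", "refer to"}
score = recall(extracted_vpcs, attested_vpcs)
```

An extracted item outside the attested set, such as refer to in the example, does not affect recall; it would only lower precision.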
    </Section>
  </Section>
</Paper>