<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2106">
  <Title>Coverage and Inheritance in The Preposition Project</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The Preposition Project (TPP, Litkowski &amp; Hargraves, 2005)1 provides a large amount of data for a small number of prepositions. To date, 13 out of 373 prepositions (among the most frequent in English) have been analyzed. We examined the data for these prepositions to determine (1) their coverage of the space of semantic relations, (2) the extent to which these data could be extrapolated to prepositions not yet covered, and (3) what types of analyses might be useful to fill shortcomings in the data. Examining these issues is important for determining the extent to which the data in the project can be used in NLP applications.</Paragraph>
    <Paragraph position="1"> TPP is designed to provide a comprehensive database of preposition senses, so it is useful to provide a mechanism for assessing the extent of coverage, not only in comparison with the range of meanings described in traditional grammar, but also in comparison with analyses within the computational linguistics community. Similarly, it seems important to determine how, if at all, the data developed thus far can be leveraged for use with other preposition meanings not yet analyzed, e.g., through mechanisms of inheritance. Finally, through these analyses, it is useful to identify any shortcomings in the data being developed in TPP and what further work should be undertaken.</Paragraph>
    <Paragraph position="2"> In the following sections, we first provide an overview of TPP and extensions to its available data that have occurred since its inception. Next, we examine issues of coverage in relation to the range of preposition meaning contained in Quirk et al.</Paragraph>
    <Paragraph position="3"> (1985), alongside the ranges in other resources such as the Penn Treebank, FrameNet, and Lexical Conceptual Structures. This analysis also considers accounts of semantic relations that have been presented in literature that has used these other resources. Next, we critically examine claims of the inheritance of preposition meaning as described in Litkowski (2002), including consideration of inheritance mechanisms in FrameNet. This analysis suggests some mechanisms for a data-driven or corpus-based approach to the identification of a semantic relation inventory. Finally, based on these analyses of coverage and inheritance, we identify some next steps TPP needs to take.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="38" type="metho">
    <SectionTitle>
2 The Preposition Project
</SectionTitle>
    <Paragraph position="0"> The primary objective of TPP is to characterize each of 847 preposition senses for 373 prepositions (including 220 phrasal prepositions with 309 senses) with a semantic role name and the syntactic and semantic properties of its complement and attachment point. The preposition sense inventory is taken from the Oxford Dictionary of English (2004).2 Starting from the senses for a particular preposition, a set of instances of that preposition is extracted from the FrameNet database. A lexicographer then assigns a sense from the inventory to each instance. While engaged in this sense assignment, the lexicographer accumulates an understanding of the behavior of the preposition, assigns a name to each sense (characterizing its semantic type), and characterizes the syntactic and semantic properties of the preposition complement and its point of attachment or head. Each sense is also characterized by its syntactic function and its meaning, identifying the relevant paragraph(s) where it is discussed in Quirk et al.</Paragraph>
    <Paragraph position="1"> TPP then makes available the sense analysis (including the lexicographer's overview) and the set of instances for each preposition that is analyzed. In addition, the disambiguated instances are analyzed to provide the set of FrameNet frames and frame elements associated with each sense. The set of sentences for each preposition (ranging from 300 to over 4000 sentences for 'of') is provided in Senseval format, along with an answer key, for use in developing preposition disambiguation routines. Finally, using the FrameNet frame and frame element of the tagged instances, syntactic alternation patterns (other syntactic forms in which the semantic role may be realized) are provided for each FrameNet target word; these data constitute a suitable corpus for use in studying, for example, English verb classes (see Levin, 1993).</Paragraph>
    <Paragraph position="2"> An important next step for TPP is the use of these disambiguated instances to refine the characterization of the syntactic and semantic properties of the complement and the point of attachment. As the lexicographer has analyzed the sense inventory for a preposition, the question of its use in relation to other words is continually raised. In particular, the question is whether a sense stands alone or is selected for by a verb or other word (most frequently, an adjective).3 The lexicographer has observed that selection might be occurring. The extent to which this occurs will be examined when an attempt is made, for example, to develop decision lists for disambiguating among a preposition's senses.4 We hope, as a result, that the number of instances available for disambiguation will permit a more definitive characterization of selection.</Paragraph>
    <Paragraph position="3"> Since Litkowski &amp; Hargraves (2005), several additions have been made to the data and analyses available under TPP. First, Oxford University Press has granted permission to provide the definitions and examples of the senses for each preposition from the Oxford Dictionary of English (ODE, 2003) (and its predecessor, the New Oxford Dictionary of English (NODE, 1997)). Second, a summary file of all senses has been prepared from the individual preposition sense analyses, facilitating overview analysis of the full sense inventory (e.g., sorting the table on different columns). Third, the lexicographer has disambiguated the ending preposition of definitions as those prepositions are analyzed (e.g., in sense 1 of about, on the subject of, identifying the applicable sense of of); 451 prepositions have been so tagged.</Paragraph>
    <Paragraph position="4"> At present, the following 13 prepositions have been analyzed (with the initial number of senses in parentheses): about (6), against (10), at (12), by (22), for (14), from (14), in (11), of (18), on (23), over (16), through (13), to (17), and with (16).</Paragraph>
    <Paragraph position="5"> The number of senses has changed based on changes from NODE to ODE and based on evidence developed in the project (adding 19 senses that are attested with the FrameNet data). These prepositions include the most frequent in English (see Boonthum et al., 2006 for the top 10 based on the Brown corpus). In summary, the 13 prepositions (out of 373 identified in Litkowski, 2002) have 210 senses (19 have been added during the course of TPP) out of the original 847 senses.</Paragraph>
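    <Paragraph> The counts reported above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python and is not part of TPP; the names INITIAL_SENSES and share are hypothetical, and all figures are those given in the text. Because the parenthesized figures are initial counts, their sum need not equal the current total of 210, which also reflects the changes from NODE to ODE and the senses added during the project.

```python
# Initial sense counts for the 13 analyzed prepositions, as listed in the text.
INITIAL_SENSES = {
    "about": 6, "against": 10, "at": 12, "by": 22, "for": 14, "from": 14,
    "in": 11, "of": 18, "on": 23, "over": 16, "through": 13, "to": 17,
    "with": 16,
}
TOTAL_PREPOSITIONS = 373   # prepositions identified in Litkowski (2002)
TOTAL_SENSES = 847         # original sense inventory
ANALYZED_SENSES = 210      # current sense count reported for the 13 prepositions


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


initial_total = sum(INITIAL_SENSES.values())
print(len(INITIAL_SENSES), "prepositions,", initial_total, "initial senses")
print("prepositions analyzed:", share(len(INITIAL_SENSES), TOTAL_PREPOSITIONS), "%")
print("senses analyzed:", share(ANALYZED_SENSES, TOTAL_SENSES), "%")
```

The last figure is the basis for the statement that roughly a quarter of the sense inventory has been analyzed, even though only a small share of the prepositions has been covered.
    </Paragraph>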
    <Paragraph position="6"> It is noteworthy also that in moving from NODE to ODE, 60 prepositions have been removed. Some of these prepositions are variant spellings (e.g., abaht for about). Most are phrasal prepositions, e.g., to the accompaniment of. In NODE, the definitions constitute a lexicographic statement that the meaning of the phrase has an idiomatic status, i.e., is not solely recoverable based on an understanding of the meanings of its constituents. In ODE, such phrases are identified as having collocative status and thereby rendered in example usages with italics, but not given a definition. Such phrases will be retained in TPP.</Paragraph>
    <Paragraph position="7"> Footnote 2: TPP does not include particle senses of such words as in or over (or any other particles) used with verbs to make phrasal verbs. In this context, phrasal verbs are to be distinguished from verbs that select a preposition (such as on in rely on), which may be characterized as a collocation. We are grateful to an anonymous reviewer for raising this issue.</Paragraph>
    <Paragraph position="8"> Footnote 3: [...] excludes senses that are selected for. This prompted an examination of whether this might be the case.</Paragraph>
    <Paragraph position="9"> Although it is the intent that such senses be included, an examination of how FrameNet instances are generated raises the possibility that such instances may have been excluded. Procedures are currently being developed to ensure that such instances are not excluded.</Paragraph>
    <Paragraph position="10"> Litkowski &amp; Hargraves (2005) provides more details on the methodology used in TPP and the databases that are available.</Paragraph>
  </Section>
  <Section position="6" start_page="38" end_page="40" type="metho">
    <SectionTitle>
3 Semantic Coverage of TPP
</SectionTitle>
    <Paragraph position="0"> Although only a small percentage of the prepositions have as yet been analyzed, approximately 25 percent of the total number of senses are included in the 13 prepositions. This percentage is sufficient to assess their coverage of the semantic space of prepositional meaning.</Paragraph>
    <Section position="1" start_page="38" end_page="38" type="sub_section">
      <SectionTitle>
3.1 Assessing the Broad Spectrum of Semantic
Space
</SectionTitle>
      <Paragraph position="0"> To assess the coverage, the first question is what inventory should be used. The linguistics and computational linguistics literatures are replete with introspective lists of semantic roles. Gildea &amp; Jurafsky (2002) present a list of 18 that may be viewed as reasonably well-accepted. O'Hara (2005) provides several compilations based on Penn Treebank annotations, FrameNet, OpenCyc, and Factotum. Boonthum et al. (2006) includes an assessment of semantic roles in Jackendoff, Dorr's Lexical Conceptual Structures preposition database, and Barker's analysis of preposition meaning; she posits a list of 7 overarching semantic roles (although specifically intended for use in paraphrase analysis).</Paragraph>
    </Section>
    <Section position="2" start_page="38" end_page="39" type="sub_section">
      <SectionTitle>
</SectionTitle>
      <Paragraph position="0"> Without going into a detailed analysis of each of these lists, all of which are relatively small in number, the semantic relations included in TPP clearly cover each of the lists.</Paragraph>
      <Paragraph position="1"> However, since the semantic relations in these lists are relatively coarse-grained, this assessment is not sufficient.</Paragraph>
      <Paragraph position="2"> Quirk et al. (1985) is arguably the most comprehensive introspective compilation of the range of preposition meaning. As indicated above, in analyzing the senses for a preposition, the lexicographer includes a reference to a section in Quirk et al. (specifically in Chapter 9). Quirk et al. describe the meanings of prepositions in 50 sections, with the majority of discussion devoted to spatial and temporal prepositions. By comparing the references in the spreadsheets for each preposition (i.e., a data-driven approach), we find that only 4 sections are not yet mentioned. These are 9.21 (between), 9.56 (concession), 9.58 (exception and addition), and 9.59 (negative condition). In general, then, TPP broadly covers the full range of meanings expressed by prepositions as described in Quirk et al.</Paragraph>
      <Paragraph position="3"> However, for almost half of the senses analyzed in TPP (100 of 210), the lexicographer was unable to assign a Quirk paragraph in Chapter 9 or elsewhere. This raises the question of whether Quirk et al. can be viewed as comprehensive. A preliminary examination of the semantic relations assigned by the lexicographer and not assigned a Quirk paragraph indicates that the range of prepositional meaning is more extensive than what is provided in Quirk et al.</Paragraph>
      <Paragraph position="4"> Two major categories of missing semantic relations emerge from this analysis. Of the 100 senses without a Quirk paragraph, 28 involve prepositional usages pertaining to quantities. These include semantic relations like Age ("at six he contracted measles"), ScaleValue ("an increase of 5%"), RatioDenominator ("ten miles to the gallon"), Exponent ("10 to the fourth power"), ValueBasis ("a tax on tea"), Price ("copies are available for $5"), and UnitSize ("billing is by the minute"). Another 32 involve prepositions used to establish a point of reference, similar to the Standard in Quirk (section 9.62), except indicating a much broader set. These include semantic relations like FormerState ("wakened from a dream"), KnowledgeSource ("information from books"), NameUsed ("call him by his last name"), ParentName ("a child by her first husband"), Experiencer ("a terrible time for us"), and Comparator ("that's nothing compared to this"). The remaining 40 semantic relations, such as MusicalKey ("in F minor"), Drug ("on dope"), and ProfessionAspect ("a job in publishing"), appear to represent finer-grained points of prepositional meaning.</Paragraph>
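      <Paragraph> The coverage figures in this section reduce to simple proportions. The sketch below, in Python, recomputes them from the numbers stated in the text; it is illustrative only, the variable and function names are hypothetical, and the group labels merely abbreviate the three categories described above.

```python
# Figures taken from the text; the variable and function names are illustrative.
QUIRK_SECTIONS = 50                                   # sections on preposition meaning
UNMENTIONED_SECTIONS = ["9.21", "9.56", "9.58", "9.59"]
ANALYZED_SENSES = 210                                 # senses analyzed so far in TPP
UNASSIGNED_BY_GROUP = {                               # senses with no Quirk paragraph
    "quantity": 28,
    "point of reference": 32,
    "finer-grained": 40,
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


covered_sections = QUIRK_SECTIONS - len(UNMENTIONED_SECTIONS)
unassigned_total = sum(UNASSIGNED_BY_GROUP.values())

print("Quirk sections referenced:", covered_sections, "of", QUIRK_SECTIONS,
      "=", percent(covered_sections, QUIRK_SECTIONS), "%")
print("Senses without a Quirk paragraph:", unassigned_total, "of", ANALYZED_SENSES,
      "=", percent(unassigned_total, ANALYZED_SENSES), "%")
for group, count in UNASSIGNED_BY_GROUP.items():
    print(" ", group, count, percent(count, unassigned_total), "%")
```

The two headline proportions point in different directions: nearly all Quirk sections are referenced by some TPP sense, yet close to half of the TPP senses have no Quirk paragraph at all.
      </Paragraph>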
      <Paragraph position="5"> This assessment of coverage suggests that TPP currently not only covers the broad range of semantic space, but also identifies gaps that have not received adequate treatment in the linguistic literature. Perhaps such gaps may be viewed as "beneath the radar" and not warranting elaborate treatment. However, it is highly likely that these senses occur with considerable frequency and should be treated.</Paragraph>
    </Section>
    <Section position="3" start_page="39" end_page="39" type="sub_section">
      <SectionTitle>
Table 1
</SectionTitle>
      <Paragraph position="0"> Semantic Relation | Frequency | Definition | Example
Location | 0.404 | expressing location or arrival in a particular place or position | crouched at the edge of the track
Temporal | 0.072 | expressing the time when an event takes place | avoid confusion at this late stage
Level | 0.039 | denoting a particular point or segment on a scale | charged at two percent
Skill | 0.038 | expressing a particular state or condition, or a relationship between an individual and a skill | brilliant at the job
ActionObject | 0.276 | expressing the object of a look, gesture, thought, action, or plan | moaned at him
Stimulus | 0.171 | expressing the means by which something is done or the cause of an action or reaction | boiled at his lack of thought</Paragraph>
      <Paragraph position="1"> It is somewhat premature to perform a comprehensive analysis of coverage that provides a full characterization of the semantic space of preposition meaning based on the 25 percent of senses that have been analyzed thus far. However, the available data are sufficient to begin such an effort; this issue is further discussed below.</Paragraph>
    </Section>
    <Section position="4" start_page="39" end_page="40" type="sub_section">
      <SectionTitle>
3.2 Assessing Finer-Grained Spectra of
Prepositional Meaning
</SectionTitle>
      <Paragraph position="0"> While examining the broad coverage of preposition meaning, several issues affecting the treatment of individual prepositions in the computational linguistics literature emerged. These issues also provide a perspective on the potential value of the analyses being performed in TPP.</Paragraph>
      <Paragraph position="1"> O'Hara (2005), in attempting to create a framework for analysis and identification of semantic relations, examined the utility of Penn Treebank II annotations and FrameNet frame elements. He examined sentences containing at in both corpora. In Treebank, he noted that there were four senses: locative (0.732), temporal (0.239), manner (0.020), and direction (0.006). In FrameNet, with some combination of frame elements, he identified five major senses: addressee (0.315), other (0.092), phenomenon (0.086), goal (0.079), and content (0.051).</Paragraph>
      <Paragraph position="2"> Table 1 provides a coarse-grained analysis of at developed in TPP (6 additional subsenses are not shown). Although frequencies are shown in the table, they should not be taken at face value, since the FrameNet instances on which they are based make no claim to be representative. In particular, FrameNet seldom annotates temporal references since they are usually viewed as peripheral frame elements that may occur with virtually all frames. Nonetheless, the frequencies in the FrameNet instances do indicate that each of the at senses is likely to occur at levels that should not be ignored or glossed over.</Paragraph>
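      <Paragraph> To make the contrast between the two sense distributions for at concrete, the following Python sketch tabulates the relative frequencies reported in Table 1 and in O'Hara's Treebank analysis. It is a rough illustration rather than part of either study; the names TPP_AT, TREEBANK_AT, and mass are hypothetical, and the TPP frequencies carry the caveat just stated, namely that the underlying FrameNet instances are not a representative sample.

```python
# Relative frequencies as reported in the text; names are illustrative only.
TPP_AT = {            # Table 1: coarse-grained senses of "at" in TPP
    "Location": 0.404, "Temporal": 0.072, "Level": 0.039,
    "Skill": 0.038, "ActionObject": 0.276, "Stimulus": 0.171,
}
TREEBANK_AT = {       # O'Hara (2005): senses of "at" in Penn Treebank II
    "locative": 0.732, "temporal": 0.239, "manner": 0.020, "direction": 0.006,
}


def mass(freqs, labels):
    """Sum the relative frequencies of the given labels, rounded to 3 places."""
    return round(sum(freqs[label] for label in labels), 3)


tpp_total = mass(TPP_AT, TPP_AT)
tpp_other = mass(TPP_AT, ["Level", "Skill", "ActionObject", "Stimulus"])
treebank_other = mass(TREEBANK_AT, ["manner", "direction"])

# List the TPP senses from most to least frequent.
for sense, freq in sorted(TPP_AT.items(), key=lambda item: item[1], reverse=True):
    print(f"{sense:12s} {freq:.3f}")
print("TPP total:", tpp_total)
print("TPP senses other than Location/Temporal:", tpp_other)
print("Treebank senses other than locative/temporal:", treebank_other)
```

Under these figures, senses other than location and time account for about half of the TPP instances but only a few percent of the Treebank annotations, which is the kind of disparity the comparison in this section draws on.
      </Paragraph>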
      <Paragraph position="3"> In comparing TPP results with Penn Treebank characterizations, it seems not only that the corpus may be unrepresentative, but also that linguistic introspection does not capture the more natural array of senses. Thus, by combining corpus evidence (from FrameNet) with a lexicographic perspective for carving out sense distinctions, an improved balance results. It should also be noted that in Table 1, the final sense for Stimulus emerged from the FrameNet data and from Quirk and was not identified in the ODE sense inventory. Comparing TPP results with O'Hara's aggregation of FrameNet frame elements indicates the difficulty of working directly with the large number of frame elements (currently over 700). As Gildea &amp; Jurafsky noted, it is difficult to map these frame elements into higher level semantic roles.</Paragraph>
      <Paragraph position="4"> Some assistance is available from the FrameNet inheritance hierarchy, but this is still not well-developed. This issue is taken up further below in describing how TPP's data-driven approach may facilitate this kind of mapping.</Paragraph>
      <Paragraph position="5"> In summary, the methodology being followed in TPP arguably provides a more natural and a more assuredly complete coverage of the fine-grained senses associated with an individual preposition.</Paragraph>
    </Section>
  </Section>
</Paper>