<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2234">
  <Title>Some Properties of Preposition and Subordinate Conjunction Attachments*</Title>
  <Section position="3" start_page="1436" end_page="1436" type="metho">
    <SectionTitle>
2 Syntactic Considerations
</SectionTitle>
    <Paragraph position="0"> Our outlook on the attachment problem is influenced by our approach to syntax, which simplifies the traditional parsing problem in several ways. As with many approaches to processing unrestricted text, we do not attempt as a primary goal to derive spanning sentential parses. Instead, we approximate spanning parses through successive stages of partial parsing. For the purpose of the present paper, we are mostly concerned with the level of analysis of core noun phrases and verb phrases.</Paragraph>
    <Paragraph position="1"> By core phrases, we mean the kind of non-recursive simplifications of the NP and VP that in the literature go by names such as noun/verb groups (Appelt et al., 1993) or chunks, and base NPs (Ramshaw and Marcus, 1995).</Paragraph>
    <Paragraph position="2"> The common thread between these approaches and ours is to approximate full noun phrases or verb phrases by only parsing their non-recursive core, and thus not attaching modifiers or arguments. For English noun phrases, this amounts to roughly the span between the determiner and the head noun; for English verb phrases, the span runs roughly from the auxiliary to the head verb. We call such simplified syntactic categories groups, and consider in particular noun, verb, adverb and adjective groups.</Paragraph>
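As a rough illustration of the determiner-to-head-noun span described above, the following sketch groups runs of determiners, adjectives and nouns in a single left-to-right pass. The Penn-style tags (DT, JJ, NN*) and the function name are assumptions made for this example only; the sketch ignores pronouns, partitives, possessives and named entities, and it is not the finite-state cascade used by the system.

```python
def core_noun_groups(tagged_words):
    """Collect determiner/adjective/noun runs that contain at least one noun.

    tagged_words: list of (word, tag) pairs with hypothetical Penn-style tags.
    """
    groups, run, has_noun = [], [], False
    # A sentinel entry flushes a group that ends the sentence.
    for word, tag in list(tagged_words) + [("", "END")]:
        is_noun = tag.startswith("NN")
        is_premodifier = tag in ("DT", "JJ")
        # A non-noun after a noun closes the current group.
        if has_noun and not is_noun:
            groups.append(" ".join(run))
            run, has_noun = [], False
        if is_noun:
            run.append(word)
            has_noun = True
        elif is_premodifier:
            run.append(word)
        else:
            run = []  # any other tag discards a pending premodifier run
    return groups


sentence = [("I", "PRP"), ("had", "VBD"), ("sent", "VBN"),
            ("a", "DT"), ("cup", "NN"), ("to", "TO"), ("her", "PRP")]
found = core_noun_groups(sentence)
```

Because the pass never looks inside a finished group, prepositional phrases and other modifiers are left unattached, which is the point of the group-level analysis.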
    <Paragraph position="3"> For noun groups in particular, the definition we have adopted also includes a limited number of constructs that encompass some depth-bounded recursion. For example, we also include in the scope of the noun group such complex determiners as partitives ("five of the suspects") and possessives ("John's book"). These constructs fall under the scope of our noun group model because they are easy to parse with simple finite-state cascades, and because they more intuitively match the notion of a core phrase than do their individual components.</Paragraph>
    <Paragraph position="4"> Our model of noun groups also includes an extension of the so-called named entities familiar to the information extraction community (Def, 1995). These consist of names of persons and organizations, location names, titles, dates, times, and various numeric expressions (such as money terms). Note in particular that titles and organization names often include embedded prepositional phrases (e.g., "Chief of Staff"). For such cases, as well as for partitives, we consider these embedded prepositional phrases to be within the noun group's scope, and as such they are excluded from consideration as attachment problems. Also excluded are the auxiliary to's in verb groups for infinitives.</Paragraph>
    <Paragraph position="5"> Once again, distinguishing syntax groups from traditional syntactic phrases (such as NPs) is of interest because it singles out what is usually thought of as easy to parse, and allows that piece of the parsing problem to be addressed by such comparatively simple means as finite-state machines or transformation sequences. What is then left of the parsing problem is the difficult stuff: namely the attachment of prepositional phrases, relative clauses, and other constructs that serve in modificational, adjunctive, or argument-passing roles. This part of the problem is harder both because of the ambiguous attachment location, and because the right combination of knowledge required to reduce this ambiguity is elusive.</Paragraph>
  </Section>
  <Section position="4" start_page="1436" end_page="1437" type="metho">
    <SectionTitle>
3 The Attachment Problem
</SectionTitle>
    <Paragraph position="0"> Given these syntactic preliminaries, we can now define attachment problems in terms of syntax groups. In addition to noun, verb, adjective and adverb groups, we also have I-groups.</Paragraph>
    <Paragraph position="1"> An I-group is a preposition (including multiple word prepositions) or subordinate conjunction (including wh-words and "that"). Once again, prepositions that are embedded in such constructs as titles and names are not considered I-groups for our purposes. Each I-group in a sentence is viewed as attaching to one other group within that sentence (footnote 1). For example, the sentence "I had sent a cup to her." is viewed as [I]ng [had sent]vg [a cup]ng [to]ig [her]ng, where the I-group [to]ig is the attaching group and the verb group [had sent]vg is the group attached to.</Paragraph>
    <Paragraph position="2"> Generally, coordinations of groups (e.g., dogs and cats) are left as separate groups. However, prenominal coordination (e.g., dog and cat food) is treated as one large noun group.</Paragraph>
    <Paragraph position="3"> Attachments not to try: Our system is designed to attach each I-group in a sentence to one other group in the sentence on that I-group's left. In our sample data, about 11% of the I-groups have no left ambiguity (either no group on the left to attach to or only 1 group).</Paragraph>
    <Paragraph position="4"> A few (less than 0.5%) of the I-groups have no group to their right. All of these I-groups count as attachments not handled by our system, and our system does not attempt to resolve them.</Paragraph>
    <Paragraph position="5"> Attachments to try: The rest of the I-groups each have at least 2 groups on their left and 1 group on their right from the I-group's sentence, and these are the I-groups that our system tries to handle (89% of all the problems in the data).</Paragraph>
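The split between attachments to try and attachments not to try can be sketched as a simple count of neighboring groups. The Group class, its kind labels and the function name below are hypothetical names chosen for this illustration; only the "at least 2 groups on the left and 1 on the right" test comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Group:
    kind: str  # hypothetical labels: "ng", "vg", "adjg", "advg" or "ig"
    text: str


def is_attachment_to_try(groups, i):
    """True if the I-group at index i has at least 2 groups to its left
    and at least 1 group to its right within the same sentence."""
    if groups[i].kind != "ig":
        raise ValueError("index does not point at an I-group")
    left_count = i
    right_count = len(groups) - i - 1
    return left_count >= 2 and right_count >= 1


sentence = [Group("ng", "I"), Group("vg", "had sent"), Group("ng", "a cup"),
            Group("ig", "to"), Group("ng", "her")]
```

Here is_attachment_to_try(sentence, 3) holds for the I-group "to", whereas a sentence-initial I-group has no group on its left and is therefore skipped.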
  </Section>
  <Section position="5" start_page="1437" end_page="1438" type="metho">
    <SectionTitle>
4 Properties of Attachments to Try
</SectionTitle>
    <Paragraph position="0"> In order to understand how our technique handles the attachments that follow this pattern, it is helpful to consider the properties of this class of attachments. What we detail here is a specific analysis of our test data (called 7x9x). Our training sample is similar.</Paragraph>
    <Paragraph position="1"> In 7x9x, 2.4% of the attachments turn out to be of a form that guarantees our system will fail to resolve them. 83% of these unresolvable "attachments" are about evenly divided between right attachments and left attachments to a coordination of groups (which in our framework is split into 2 or more groups). A right attachment example is that "at" attaches to "lost" in "that at home, they lost a key." A coordination attachment example is "with" attaching to the coordination "cats and dogs" in "cats and dogs with tags". The other 17% were either lexemes erroneously tagged as prepositions/subordinate conjunctions or past participles, or were wh-words that are actually part of a question (and not acting as a subordinate conjunction).</Paragraph>
    <Paragraph position="2"> Footnote 1: Sentential-level attachments are deemed to be to the main verb in the sentence attached to.</Paragraph>
    <Paragraph position="3"> In 7x9x, 67.7% of attachments are to the adjacent group on the I-group's immediate left.</Paragraph>
    <Paragraph position="4"> Our system uses as a starting point the guess that all attachments are to the adjacent group.</Paragraph>
    <Paragraph position="5"> The second most likely attachment point is the nearest verb group to the I-group's left. A surprising 90.3% of the attachments are to either this verb group or to the adjacent group (footnote 2). In our experiments, limiting the choice of possible attachment points to these two tended to improve the results and also increased the training speed, the latter often by a factor of 3 to 4. Neither of these percentages includes attachments to coordinations of groups on the left, which our system cannot handle. Including these attachments would add about 1% to each figure.</Paragraph>
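A minimal sketch of the two candidate attachment points discussed above, assuming a sentence is represented as a list of group-type labels. The labels and the function name are hypothetical; the sketch returns the adjacent group first, followed by the nearest verb group on the left when it is a different group.

```python
def va_candidates(kinds, i):
    """Indices of the left-adjacent group and of the nearest verb group
    to the left of the I-group at index i (adjacent group listed first)."""
    if i == 0:
        return []  # no group on the left at all
    candidates = [i - 1]
    for j in range(i - 1, -1, -1):
        if kinds[j] == "vg":
            if j not in candidates:
                candidates.append(j)
            break  # only the nearest verb group counts
    return candidates


kinds = ["ng", "vg", "ng", "ig", "ng"]  # "I had sent a cup to her."
points = va_candidates(kinds, 3)
```

For the example sentence the candidates are the noun group "a cup" (index 2) and the verb group "had sent" (index 1).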
    <Paragraph position="6"> The attachments can be divided into six categories, based on the contents of the I-group being attached and the types of groups surrounding that I-group. The categories are: vnpn: The I-group contains a preposition. Next to the preposition on both the left and the right are noun groups. Next to the left noun group is a verb group. A member of this category is the [to]ig in the sentence "[I]ng [had sent]vg [a cup]ng [to]ig [her]ng." vnpn̄: Like vnpn, but next to the preposition on the right is not a noun group.</Paragraph>
    <Paragraph position="7"> v̄npn: Like vnpn, but the left neighbor of the left noun group is not a verb group.</Paragraph>
    <Paragraph position="8"> v̄npn̄: Another variation on vnpn.</Paragraph>
    <Paragraph position="9"> xn̄px: The I-group contains a preposition, but its left neighbor is not a noun group. The x's stand for groups that need to exist, but can be of any type.</Paragraph>
    <Paragraph position="10"> xxsx: The I-group has a subordinate conjunction (e.g. which) instead of a preposition (footnote 3). Table 1 shows how likely the attachments in  data set used in (Merlo et al., 1997).</Paragraph>
    <Paragraph position="11"> Footnote 3: A word is deemed a preposition if it is among the 66 prepositions listed in Section 6.2's It data set. Unlisted words are deemed subordinate conjunctions.</Paragraph>
    <Paragraph position="12">  * to attach to either the left adjacent group or the nearest verb group on the left (V-A) * to have an attachment that our system actually cannot correctly handle (Err).</Paragraph>
    <Paragraph position="13"> The table also gives the percentage of the attachments in 7x9x that belong in each category (Prevalence). The A and V-A columns do not include attachments to coordinations of groups.</Paragraph>
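The six categories can be read as a decision on the I-group's type and its three nearest neighbors. The sketch below is one way to encode that reading, not the authors' code. It assumes a list of group-type labels, and it spells a slot that must not be a noun or verb group with a leading "!" (for example "vnp!n" for the category whose right neighbor is not a noun group); these ASCII spellings and the function name are hypothetical.

```python
def categorize(kinds, i, is_preposition):
    """Assign the I-group at index i to one of the six categories.

    kinds: group-type labels for the sentence ("ng", "vg", ...).
    is_preposition: False when the I-group is a subordinate conjunction.
    Requires at least 2 groups on the left and 1 on the right.
    """
    if not (i >= 2 and len(kinds) > i + 1):
        raise ValueError("not an attachment the system tries")
    left2, left1, right1 = kinds[i - 2], kinds[i - 1], kinds[i + 1]
    if not is_preposition:
        return "xxsx"
    if left1 != "ng":
        return "x!npx"
    v = "v" if left2 == "vg" else "!v"
    n = "n" if right1 == "ng" else "!n"
    return v + "np" + n


category = categorize(["ng", "vg", "ng", "ig", "ng"], 3, True)
```

The order of the checks matters: the subordinate-conjunction and non-noun-left-neighbor cases are decided before the verb and noun slots are inspected, so every tried attachment falls into exactly one category.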
    <Paragraph position="14">  Much of the corpus-based work on attaching prepositions (Ratnaparkhi et al., 1994; Brill and Resnik, 1994; Collins and Brooks, 1995) has dealt with the subset of category vnpn problems where the preposition actually attaches to either the nearest verb or noun group on the left. Some earlier work (Hindle and Rooth, 1993) also handled the subset of vnpn̄ category problems where the attachment is either to the nearest verb or noun group on the left.</Paragraph>
    <Paragraph position="15"> Some later work (Merlo et al., 1997) dealt with handling from 1 to 3 prepositional phrases in a sentence. The work dealt with prepositions in "group" sequences of VNP, VNPNP and VNPNPNP, where the prepositions attach to one of the mentioned noun or verb groups (as opposed to an earlier group on the left). So this work handles attachments that can be found in the vnpn, vnpn̄, v̄npn and v̄npn̄ categories.</Paragraph>
    <Paragraph position="16"> Still, this work handles less than an estimated 33% of our sample text's attachments (footnote 4). Footnote 4: (Merlo et al., 1997) searches the Penn Treebank for data samples that they can handle. They find phrases where 78% of the items to attach belong to either the vnpn or vnpn̄ categories. So in the Penn Treebank, they handle 1.28 times as many attachments as the other work mentioned in this paper. This other work handles less than 25% of the attachments in our sample data.</Paragraph>
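The figures in footnote 4 appear to combine as follows; this short calculation is a reading of the footnote rather than something the paper spells out, and the variable names are chosen for this example.

```python
# Share of Merlo et al.'s items that fall in the two categories
# covered by the other work (from footnote 4).
share_in_covered_categories = 0.78

# If 78% of their items are what the other work covers, they handle
# 1 / 0.78 times as many attachments as the other work.
ratio = 1 / share_in_covered_categories

# The other work handles less than 25% of the sample data, so scaling
# that bound by the ratio gives the bound for Merlo et al.
other_work_upper_bound = 0.25
merlo_upper_bound = other_work_upper_bound * ratio

print(round(ratio, 2), round(merlo_upper_bound, 3))
```

The scaled bound is about 32%, consistent with the "less than an estimated 33%" stated in the text, under the assumption that the Penn Treebank proportions carry over to the sample data.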
  </Section>
  <Section position="6" start_page="1438" end_page="1439" type="metho">
    <SectionTitle>
5 Processing Model
</SectionTitle>
    <Paragraph position="0"> Our attachment system is an extension of the rule-based system for VNPN binary prepositional phrase attachment described in (Brill and Resnik, 1994). The system uses transformation-based error-driven learning to automatically learn rules from training examples.</Paragraph>
    <Paragraph position="1"> One first runs the system on a training set, which starts by guessing that each I-group attaches to its left adjacent group. This training run moves in iterations, with each iteration producing the next rule that repairs the most remaining attachment errors in the training set.</Paragraph>
    <Paragraph position="2"> The training run ends when the next rule found repairs fewer than a threshold number of errors.</Paragraph>
    <Paragraph position="3"> The rules are then run in the same order on the test set (which also starts at an all adjacent attachment state) to see how well they do.</Paragraph>
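The training and test procedure described above follows the general shape of transformation-based error-driven learning. The sketch below is a simplified illustration: the rule representation (a function from features and the current guess to a new guess), the feature dictionary keys, and the scoring of a rule by its net number of repaired errors are all assumptions of this example, not details confirmed by the text.

```python
def train(examples, candidate_rules, threshold):
    """Greedily pick rules that repair the most remaining errors.

    examples: list of (features, gold_attachment) pairs.
    candidate_rules: functions (features, current_guess) -> new_guess.
    threshold: stop when the best rule's net repairs fall below this.
    """
    # Initial state: every I-group attaches to its left-adjacent group.
    guesses = [features["adjacent"] for features, _ in examples]
    learned = []
    while True:
        best_rule, best_gain = None, 0
        for rule in candidate_rules:
            gain = 0
            for (features, gold), guess in zip(examples, guesses):
                new_guess = rule(features, guess)
                # +1 for a repaired error, -1 for a newly introduced one.
                gain += (new_guess == gold) - (guess == gold)
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None or threshold > best_gain:
            break
        learned.append(best_rule)
        guesses = [best_rule(f, g) for (f, _), g in zip(examples, guesses)]
    return learned


def apply_rules(rules, features_list):
    """Run the learned rules in order, starting from adjacent attachment."""
    guesses = [f["adjacent"] for f in features_list]
    for rule in rules:
        guesses = [rule(f, g) for f, g in zip(features_list, guesses)]
    return guesses


# Two toy rules and three toy examples (attachment points are group indices).
def to_attaches_to_verb(features, guess):
    return features["verb"] if features["prep"] == "to" else guess


def of_attaches_to_verb(features, guess):
    return features["verb"] if features["prep"] == "of" else guess


examples = [
    ({"prep": "to", "adjacent": 2, "verb": 1}, 1),
    ({"prep": "to", "adjacent": 2, "verb": 1}, 1),
    ({"prep": "of", "adjacent": 2, "verb": 1}, 2),
]
learned = train(examples, [to_attaches_to_verb, of_attaches_to_verb], threshold=1)
predictions = apply_rules(learned, [features for features, _ in examples])
```

In this toy run only the first rule is learned, because the second rule would introduce an error rather than repair one.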
    <Paragraph position="4"> The system makes its decisions based on the head (main) word of each of the groups examined. Like the original system, our system can look at the head-word itself and also all the semantic classes the head-word can belong to. The classes come from WordNet (Miller, 1990) and consist of about 25 noun classes (e.g., person, process) and 15 verb classes (e.g., change, communication, status). As an extension, our system also looks at the word's part-of-speech, possible stem(s) and possible subcategorization/complement categories. The latter consist of over 100 categories for nouns, adjectives and verbs (mainly the latter) from Comlex (Wolff et al., 1995). Example categories include intransitive verbs and verbs that take 2 prepositional phrases as a complement (e.g., fly in "I fly from here to there.").</Paragraph>
    <Paragraph position="5"> In addition, Comlex gives our system the possible prepositions (e.g., from and to for the verb fly) and particles used in the possible subcategorizations.</Paragraph>
    <Paragraph position="6"> The original system chose between two possible attachment points, a verb and a noun. Each rule either attempted to move left (attach to the verb) or move right (attach to the noun).</Paragraph>
    <Paragraph position="7"> Our extensions include as possible attachment points every group that precedes the attaching I-group and is in the I-group's sentence. The rules now can move the attachment either left or right from the current guess to the nearest group that matches the rule's constraints.</Paragraph>
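One way to picture a rule that moves an attachment "to the nearest group that matches the rule's constraints" is a scan from the current guess in the rule's direction. The sketch below is a simplification under stated assumptions: the constraint is reduced to a predicate on the group-type label, whereas the system's rules inspect head words, semantic classes and other features, and the function name is hypothetical.

```python
def move_attachment(kinds, current, i, direction, matches):
    """Move the attachment of the I-group at index i from group `current`
    to the nearest group in `direction` ("left" or "right") that satisfies
    `matches`, staying among the groups that precede the I-group.
    Returns `current` unchanged when no group matches."""
    step = -1 if direction == "left" else 1
    j = current + step
    while j >= 0 and i > j:
        if matches(kinds[j]):
            return j
        j += step
    return current


kinds = ["ng", "vg", "ng", "ig", "ng"]  # "I had sent a cup to her."
moved = move_attachment(kinds, 2, 3, "left", lambda kind: kind == "vg")
```

Starting from the adjacent noun group "a cup" (index 2), a leftward move constrained to verb groups lands on "had sent" (index 1). A rightward move can never pass the I-group itself, since only preceding groups are possible attachment points.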
    <Paragraph position="8"> In addition to running the training and test with ALL possible attachment points (every  preceding group) available, one can also restrict the possible attachment points to only the group Adjacent to the I-group and the nearest Verb group on the left, if any (V-A). One uses the same attachment choice (ALL versus V-A) in the training run and corresponding test run.</Paragraph>
  </Section>
</Paper>