<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1652">
  <Title>Feature Subsumption for Opinion Analysis</Title>
  <Section position="4" start_page="440" end_page="443" type="intro">
    <SectionTitle>
2 The Subsumption Hierarchy
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="440" end_page="441" type="sub_section">
      <SectionTitle>
2.1 Text Representations
</SectionTitle>
      <Paragraph position="0"> We analyze two feature representations that have been used for opinion analysis: Ngrams and Extraction Patterns. Information extraction (IE) patterns are lexico-syntactic patterns that represent expressions which identify role relationships. For example, the pattern &lt;subj&gt; ActVP(recommended) extracts the subject of active-voice instances of the verb recommended as the recommender. The pattern &lt;subj&gt; PassVP(recommended) extracts the subject of passive-voice instances of recommended as the object being recommended.</Paragraph>
      <Paragraph position="1"> (Riloff and Wiebe, 2003) explored the idea of using extraction patterns to represent more complex subjective expressions that have non-compositional meanings. For example, the expression drive (someone) up the wall expresses the feeling of being annoyed, but the meanings of the words drive , up , and wall have no emotional connotations individually. Furthermore, this expression is not a xed word sequence that can be adequately modeled by Ngrams. Any noun phrase can appear between the words drive' and up , so a exible representation is needed to capture the general pattern drives &lt;NP&gt; up the wall .</Paragraph>
      <Paragraph position="2"> This example represents a general phenomenon: many expressions allow intervening noun phrases and/or modifying terms. For example: stepped on &lt;mods&gt; toes Ex: stepped on the boss' toes dealt &lt;np&gt; &lt;mods&gt; blow Ex: dealt the company a decisive blow brought &lt;np&gt; to &lt;mods&gt; knees Ex: brought the man to his knees (Riloff and Wiebe, 2003) also showed that syntactic variations of the same verb phrase can behave very differently. For example, they found that passive-voice constructions of the verb ask had a 100% correlation with opinion sentences, but active-voice constructions had only a 63% correlation with opinions.</Paragraph>
      <Paragraph position="3">  Our goal is to use the subsumption hierarchy to identify Ngram and extraction pattern features that are more strongly associated with opinions than simpler features. We used three types of features in our research: unigrams, bigrams, and IE patterns. The Ngram features were generated using the Ngram Statistics Package (NSP) (Banerjee and Pedersen, 2003).1 The extraction patterns (EPs) were automatically generated using the Sundance/AutoSlog software package (Riloff and Phillips, 2004). AutoSlog relies on the Sundance shallow parser and can be applied exhaustively to a text corpus to generate IE patterns that can extract every noun phrase in the corpus. AutoSlog has been used to learn IE patterns for the domains of terrorism, joint ventures, and microelectronics (Riloff, 1996), as well as for opinion analysis (Riloff and Wiebe, 2003). Figure 1 shows the 17 types of extraction patterns that AutoSlog generates. PassVP refers to passive-voice verb phrases (VPs), ActVP refers to active-voice VPs, InfVP refers to in nitive VPs, and AuxVP refers  that consisted entirely of stopwords. We used a list of 281 stopwords.</Paragraph>
      <Paragraph position="4">  to VPs where the main verb is a form of to be or to have . Subjects (subj), direct objects (dobj), PP objects (np), and possessives can be extracted by the patterns.2</Paragraph>
    </Section>
    <Section position="2" start_page="441" end_page="443" type="sub_section">
      <SectionTitle>
2.2 The Subsumption Hierarchy
</SectionTitle>
      <Paragraph position="0"> We created a subsumption hierarchy that de nes the representational scope of different types of features. We will say that feature A representationally subsumes feature B if the set of text spans that match feature A is a superset of the set of text spans that match feature B. For example, the uni-gram happy subsumes the bigram very happy because the set of text spans that match happy includes the text spans that match very happy .</Paragraph>
      <Paragraph position="1"> First, we de ne a hierarchy of valid subsumption relationships, shown in Figure 2. The 2Gram node, for example, is a child of the 1Gram node because a 1Gram can subsume a 2Gram. Ngrams may subsume extraction patterns as well. Every extraction pattern has at least one corresponding 1Gram that will subsume it.3. For example, the 1Gram recommended subsumes the pattern &lt;subj&gt; ActVP(recommended) because the pattern only matches active-voice instances of recommended . An extraction pattern may also subsume another extraction pattern. For example, &lt;subj&gt; ActVP(recommended) subsumes</Paragraph>
      <Paragraph position="3"> To compare speci c features we need to formally de ne the representation of each type of feature in the hierarchy. For example, the hierarchy dictates that a 2Gram can subsume the pattern ActInfVP &lt;dobj&gt; , but this should hold only if the words in the bigram correspond to adjacent words in the pattern. For example, the 2Gram to sh subsumes the pattern ActInfVP(like to sh) &lt;dobj&gt; . But the 2Gram like sh should not subsume it. Similarly, consider the pattern InfVP(plan) &lt;dobj&gt; , which represents the in nitive to plan . This pattern subsumes the pattern ActInfVP(want to plan) &lt;dobj&gt; , but it should not subsume the pattern ActInfVP(plan to start) .</Paragraph>
      <Paragraph position="4"> To ensure that different features truly subsume each other representationally, we formally de ne each type of feature based on words, sequential 2However, the items extracted by the patterns are not actually used by our opinion classi ers; only the patterns themselves are matched against the text.</Paragraph>
      <Paragraph position="5"> 3Because every type of extraction pattern shown in Figure 1 contains at least one word (not including the extracted phrases, which are not used as part of our feature representation). null dependencies, and syntactic dependencies. A sequential dependency between words wi and wi+1 means that wi and wi+1 must be adjacent, and that wi must precede wi+1. Figure 3 shows the formal de nition of a bigram (2Gram) node. The bigram is de ned as two words with a sequential dependency indicating that they must be adjacent.</Paragraph>
      <Paragraph position="7"> A syntactic dependency between words wi and wi+1 means that wi has a speci c syntactic relationship to wi+1, and wi must precede wi+1. For example, consider the extraction pattern NP Prep &lt;np&gt; , in which the object of the preposition attaches to the NP. Figure 4 shows the de nition of this extraction pattern in the hierarchy. The pattern itself contains three components: the NP, the attaching preposition, and the object of the preposition (which is the NP that the pattern extracts). The de nition also includes two syntactic dependencies: the rst dependency is between the NP and the preposition (meaning that the preposition syntactically attaches to the NP), while the second dependency is between the preposition and the extraction (meaning that the extracted NP is the syntactic object of the preposition).</Paragraph>
      <Paragraph position="9"> Consequently, the bigram affair with will not subsume the extraction pattern affair with &lt;np&gt; because the bigram requires the noun and preposition to be adjacent but the pattern does not. For example, the extraction pattern matches the text an affair in his mind with Countess Olenska but the bigram does not. Conversely, the extraction pattern does not subsume the bigram either because the pattern requires syntactic attachment but the bigram does not. For example, the bigram matches  the sentence He ended the affair with a sense of relief , but the extraction pattern does not.</Paragraph>
      <Paragraph position="10"> Figure 5 shows the de nition of another extraction pattern, InfVP &lt;dobj&gt; , which includes both syntactic and sequential dependencies. This pattern would match the text to protest high taxes . The pattern de nition has three components: the in nitive to , a verb, and the direct object of the verb (which is the NP that the pattern extracts). The de nition also shows two syntactic dependencies. The rst dependency indicates that the verb syntactically attaches to the in nitive to . The second dependency indicates that the extracted NP syntactically attaches to the verb (i.e., it is the direct object of that particular verb).</Paragraph>
      <Paragraph position="11"> The pattern de nition also includes a sequential dependency, which speci es that to must be adjacent to the verb. Strictly speaking, our parser does not require them to be adjacent. For example, the parser allows intervening adverbs to split in nitives (e.g., to strongly protest high taxes ), and this does happen occasionally. But split innitives are relatively rare, so in the vast majority of cases the in nitive to will be adjacent to the verb. Consequently, we decided that a bigram (e.g., to protest ) should representationally subsume this extraction pattern because the syntactic exibility afforded by the pattern is negligible. The sequential dependency link represents this judgment call that the in nitive to and the verb are adjacent in most cases.</Paragraph>
      <Paragraph position="12"> For all of the node de nitions, we used our best judgment to make decisions of this kind. We tried to represent major distinctions between features, without getting caught up in minor differences that were likely to be negligible in practice.</Paragraph>
      <Paragraph position="14"> To use the subsumption hierarchy, we assign each feature to its appropriate node in the hierarchy based on its type. Then we perform a top-down breadth- rst traversal. Each feature is compared with the features at its ancestor nodes. If a feature's words and dependencies are a superset of an ancestor's words and dependencies, then it is subsumed by the (more general) ancestor and discarded.4 When the subsumption process is nished, a feature remains in the hierarchy only if 4The words that they have in common must also be in the same relative order.</Paragraph>
      <Paragraph position="15">  there are no features above it that subsume it.</Paragraph>
    </Section>
    <Section position="3" start_page="443" end_page="443" type="sub_section">
      <SectionTitle>
2.3 Performance-based Subsumption
</SectionTitle>
      <Paragraph position="0"> Representational subsumption is concerned with whether one feature is more general than another.</Paragraph>
      <Paragraph position="1"> But the purpose of using the subsumption hierarchy is to identify more complex features that outperform simpler ones. Applying the subsumption hierarchy to features without regard to performance would simply eliminate all features that have a more general counterpart in the feature set.</Paragraph>
      <Paragraph position="2"> For example, all bigrams would be discarded if their component unigrams were also present in the hierarchy.</Paragraph>
      <Paragraph position="3"> To estimate the quality of a feature, we use Information Gain (IG) because that has been shown to work well as a metric for feature selection (Forman, 2003). We will say that feature A behaviorally subsumes feature B if two criteria are met: (1) A representationally subsumes B, and (2) IG(A) IG(B) - d, where d is a parameter representing an acceptable margin of performance difference. For example, if d=0 then condition (2) means that feature A is just as valuable as feature B because its information gain is the same or higher. If d&gt;0 then feature A is allowed to be a little worse than feature B, but within an acceptable margin. For example, d=.0001 means that A's information gain may be up to .0001 lower than B's information gain, and that is considered to be an acceptable performance difference (i.e., A is good enough that we are comfortable discarding B in favor of the more general feature A).</Paragraph>
      <Paragraph position="4"> Note that based on the subsumption hierarchy shown in Figure 2, all 1Grams will always survive the subsumption process because they cannot be subsumed by any other types of features. Our goal is to identify complex features that are worth adding to a set of unigram features.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML