<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-1039">
  <Title>Multi-Document Summarization of Evaluative Text</Title>
  <Section position="3" start_page="305" end_page="306" type="metho">
    <SectionTitle>
2 Information Extraction from Evaluative Text
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="305" end_page="306" type="sub_section">
      <SectionTitle>
2.1 Feature Extraction
</SectionTitle>
      <Paragraph position="0"> Knowledge extraction from evaluative text about a single entity is typically decomposed into three distinct phases: the determination of features of the entity evaluated in the text, the strength of each evaluation, and the polarity of each evaluation. For instance, the information extracted from the sentence &amp;quot;The menus are very easy to navigate but the user preference dialog is somewhat difficult to locate.&amp;quot; should be that the &amp;quot;menus&amp;quot; and the &amp;quot;user preference dialog&amp;quot; features are evaluated, and that the &amp;quot;menus&amp;quot; receive a very positive evaluation while the &amp;quot;user preference dialog&amp;quot; is evaluated rather negatively.</Paragraph>
      <Paragraph position="1"> For these tasks, we adopt the approach described indetail in (Carenini et al., 2005). This approach relies on the work of (Hu and Liu, 2004a) for the tasks of strength and polarity determination. For the task of feature extraction, it enhances earlier work (Hu and Liu, 2004c) by mapping the extracted features into a hierarchy of features which describes the entity of interest. Theresulting mapping reduces redundancy and provides conceptual organization of the extracted features.</Paragraph>
      <Paragraph position="2">  Before continuing, we shall describe the terminology we use when discussing the extracted knowledge. The features evaluated in a corpus of reviews and extracted by following Hu and Liu's approach are called Crude Features.</Paragraph>
      <Paragraph position="4"> For example, crude features for a digital camera might include &amp;quot;picture quality&amp;quot;, &amp;quot;viewfinder&amp;quot;, and &amp;quot;lens&amp;quot;. Each sentence sk in the corpus contains a set of evaluations (of crude features) called</Paragraph>
      <Paragraph position="6"> . Each evaluation contains both a polarity and a strength represented as an integer in the range a7a9a8 3a10a11a8 2a10a11a8 1a10a13a12 1a10a13a12 2a10a13a12 3a14 where a12 3 is the most positive possible evaluation and a8 3 is the most negative possible evaluation.</Paragraph>
      <Paragraph position="7"> There is also a hierarchical set of possibly more abstract user-defined features 1</Paragraph>
      <Paragraph position="9"> SeeFigure1for asampleUDF. Theprocess of hierarchically organizing the extracted features produces a mapping from CF to UDF features (see (Carenini et al., 2005) for details). We call the set of crude features mapped to the user-defined feature ud fi mapa5 ud fi  . For example, the crude features &amp;quot;unresponsiveness&amp;quot;, &amp;quot;delay&amp;quot;, and &amp;quot;lag time&amp;quot; would all be mapped to the ud f &amp;quot;delay between shots&amp;quot;.</Paragraph>
      <Paragraph position="10"> For each cfj, there is a set of polarity and strength evaluations psa5 cfj  corresponding to each evaluation of cfj in the corpus. We call the set of polarity/strength evaluations directly associ-</Paragraph>
      <Paragraph position="12"> The total set of polarity/strength evaluations associated with ud fi, including its descendants, is 1We call them here user-defined features for consistency with (Carenini et al., 2005). In this paper, they are not assumed to be and are not in practice defined by the user.</Paragraph>
      <Paragraph position="14"> refers to all descendants of ud fi.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="306" end_page="306" type="metho">
    <SectionTitle>
3 MEAD*: Sentence Extraction
</SectionTitle>
    <Paragraph position="0"> Most modern summarization systems use sentences extracted from the source text as the basis for summarization (see (Nat, 2005b) for a representative sample). Extraction-based approaches have the advantage of avoiding the difficult task of natural language generation, thus maintaining domain-independence because the system need not be aware of specialized vocabulary for its target domain. The main disadvantage of extraction-based approaches is the poor linguistic coherence of the extracted summaries.</Paragraph>
    <Paragraph position="1"> Because of the widespread and well-developed use of sentence extractors in summarization, we chose to develop our own sentence extractor as a first attempt at summarizing evaluative arguments. To do this, we adapted MEAD (Radev et al., 2003), an open-source framework for multi-document summarization, to suit our purposes.</Paragraph>
    <Paragraph position="2"> We refer to our adapted version of MEAD as MEAD*. The MEAD framework decomposes sentence extraction into three steps: (i) Feature Calculation: Some numerical feature(s) are calculated for each sentence, for example, a score based on document position and a score based on the TF*IDF of a sentence. (ii) Classification: The features calculated during step (i) are combined into a single numerical score for each sentence.</Paragraph>
    <Paragraph position="3"> (iii) Reranking: The numerical score for each sentence is adjusted relative to other sentences. This allows the system to avoid redundancy in the final set of sentences by lowering the score of sentences which are similar to already selected sentences.</Paragraph>
    <Paragraph position="4"> We found from early experimentation that the most informative sentences could be accurately determined byexamining theextractedCFs.</Paragraph>
    <Paragraph position="5"> Thus, we created our own sentence-level feature based onthe number, strength, and polarity ofCFs extracted for each sentence.</Paragraph>
    <Paragraph position="7"> During system development, we found this measure to be effective because it was sensitive to the number of CFs mentioned in a given sentence aswellastothestrength oftheevaluation for each CF. However, many sentences may have the same CF sum score (especially sentences which contain an evaluation for only one CF). In such cases, we used the MEAD 3.072 centroid feature as a 'tie-breaker'. The centroid is a common feature in multidocument summarization (cf. (Radev et al., 2003), (Saggion and Gaizauskas, 2004)).</Paragraph>
    <Paragraph position="8"> At the reranking stage, we adopted a different algorithm than the default in MEAD. We placed each sentence which contained an evaluation of a given CF into a 'bucket' for that CF. Because a sentence could contain more than one CF, a sentence could be placed in multiple buckets. We then selected the top-ranked sentence from each bucket, starting with the bucket containing the most sentences (largest</Paragraph>
    <Paragraph position="10"> ), never selecting the same sentence twice. Once one sentence had been selected from each bucket, the process was repeated3. This selection algorithm accomplishes two important tasks: firstly, it avoids redundancy by only selecting one sentence to represent each CF (unless all other CFs have already been represented), and secondly, it gives priority to CFs which are mentioned more frequently in the text.</Paragraph>
    <Paragraph position="11"> The sentence selection algorithm permits us to select an arbitrary number of sentences to fit a desired word length. We then ordered the sentences according to a primitive discourse planning strategy in which the most general CF (i.e. the CF mapped to the topmost node in the UDF) is discussed first. The remaining sentences were then ordered according to a depth-first traversal of the UDF hierarchy. In this way, general features are followed immediately by their more specific children in the hierarchy.</Paragraph>
  </Section>
  <Section position="5" start_page="306" end_page="308" type="metho">
    <SectionTitle>
4 SEA: Natural Language Generation
</SectionTitle>
    <Paragraph position="0"> The extraction-based approach described in the previous section has several disadvantages. We already discussed problems with the linguistic coherence of the summary, but more specific problems arise in our particular task of summarizing a corpus of evaluative text. Firstly, sentence extraction does not give the reader any explicit information about of the distribution of evaluations, for example, how many users mentioned a given fea- null ture and whether user opinions were uniform or varied. It also does not give an aggregate view of user evaluations because typically it only presents one evaluation for each CF. It may be that a very positive evaluation for oneCF wasselected for extraction, even though most evaluations were only somewhat positive and some were even negative.</Paragraph>
    <Paragraph position="1"> We thus also developed a system, SEA, that presents suchinformation ingenerated natural language. This system calculates several important characteristics of the source corpus by aggregating the extracted information including the CF to UDF mapping. We first describe these characteristics and then discuss their presentation in natural language.</Paragraph>
    <Section position="1" start_page="307" end_page="308" type="sub_section">
      <SectionTitle>
4.1 Aggregation of Extracted Information
</SectionTitle>
      <Paragraph position="0"> In order to provide an aggregate view of the evaluation expressed in a corpus of evaluative text a summarizer should at least determine: (i) which features of the evaluated entity were most 'important' to the users (ii) some aggregate of the user opinions for the important features (iii) the distribution of those opinions and (iv) the reasons behind each user opinion. We now discuss each of these aspects in detail.</Paragraph>
      <Paragraph position="1">  We approach the task of selecting the most 'important' features by defining a 'measure of importance' for each feature of the evaluated entity. We define the 'direct importance' of a feature in the</Paragraph>
      <Paragraph position="3"> where by 'direct' we mean the importance derived only from that feature and not from its children. This metric produces high scores for features which either occur frequently in the corpus or have strong evaluations (or both). This 'direct' measure of importance, however, is incomplete, as each non-leaf node in the UDF effectively serves a dual purpose. It is both a feature upon which a user might comment and a category for grouping its sub-features. Thus, a non-leaf node should be important if either its children are important or the node itself is important (or both). To this end, we have defined the total measure of importance</Paragraph>
      <Paragraph position="5"> refers to the children of ud fi in the hierarchy and a is some real parameter in the range a70a35a10 1a14 . In this measure, the importance of a node is a combination of its direct importance and of the importance of its children. The parameter a may be adjusted to vary the relative weight of the parent and children. We used a a0 0a3 9 for our experiments. This setting resulted in more informative summaries during system development.</Paragraph>
      <Paragraph position="6"> In order to perform feature selection using this metric, we must also define a selection procedure.</Paragraph>
      <Paragraph position="7"> The most obvious is a simple greedy selection sort the nodes in the UDF by the measure of importance and select the most important node until a desired number of features is included. However, because a node derives part of its 'importance' from its children, it is possible for a node's importance to be dominated by one or more of its children. Including both the child and parent node would be redundant because most of the information is contained in the child. We thus choose a dynamic greedy selection algorithm in which we recalculate the importance of each node after each round of selection, with all previously selected nodes removed from the tree. In this way, if a node that dominates its parent's importance is selected, its parent's importance will bereduced during later rounds of selection. This approach mimics the behaviour of several sentence extraction-based summarizers (e.g. (Schiffman et al., 2002; Saggion and Gaizauskas, 2004)) which define a metric for sentence importance and then greedily select the sentence which minimizes similarity with already selected sentences and maximizes informativeness. null  We approach the task of aggregating opinions from the source text in a similar fashion to determining the measure of importance. We calculate an 'orientation' for each UDF by aggregating the polarity/strength evaluations of all related CFs into a single value. We define the 'direct orientation' of a UDF as the average of the strength/polarity evaluations of all related CFs</Paragraph>
      <Paragraph position="9"> As with our measure of importance, we must also include the orientation of a feature's children in its orientation. Because a feature in the UDF conceptually groups its children, the orientation of a feature should include some information about the orientation of its children. We thus define the  Thismetric produces areal number between a8 3 and a12 3 which serves as an aggregate of user opinions for a feature. We use the same value of a as</Paragraph>
      <Paragraph position="11"> Communicating user opinions to the reader is not simply a matter of classifying each feature as being evaluated negatively or positively - the reader may also want to know if all users evaluated a feature in a similar way or if evaluations were varied. We thus also need a method of determining the modality of the distribution of user opinions. We calculate the sum of positive polarity/strength evaluations (or negative if oria5 ud fi</Paragraph>
      <Paragraph position="13"> negative) for a node and its children as a fraction of all polarity/strength evaluations</Paragraph>
      <Paragraph position="15"> a7If this fraction is very close to 0.5, this indicates an almost perfect split of user opinions on that features. So we classify the feature as 'bimodal' and we report this fact to the user. Otherwise, the feature is classified as 'unimodal', i.e. we need only to communicate one aggregate opinion to the reader.</Paragraph>
    </Section>
    <Section position="2" start_page="308" end_page="308" type="sub_section">
      <SectionTitle>
4.2 Generating Language: Adapting the
Generator of Evaluative Arguments (GEA)
</SectionTitle>
      <Paragraph position="0"> (GEA) The first task in generating a natural language summary from the information extracted from the corpus is content selection. This task is accomplished in SEA by the feature selection strategy described inSection 4.1.1. After content selection, the automatic generation of a natural language summary involves the following additional tasks (Reiter and Dale, 2000): (i) structuring the content by ordering and grouping the selected content elements as well as by specifying discourse relations (e.g., supporting vs. opposing evidence) between the resulting groups; (ii) microplanning, which involves lexical selection and sentence planning; and (iii) sentence realization, which produces English text from the output of the microplanner. For most of these tasks, we have adapted the Generator of Evaluative Arguments (GEA) (Carenini and Moore, expected 2006), a framework for generating user tailored evaluative arguments. For lack of space we cannot discuss the details here. These are provided on the online version of this paper, which is available at the first author's Web page.</Paragraph>
      <Paragraph position="1"> That version also includes a detailed discussion of related and future work.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>