<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1002"> <Title>Extending Document Summarization to Information Graphics</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 The Role of Intention in Graphics Summarization </SectionTitle> <Paragraph position="0"> Text summarization has generally relied on statistical techniques and identification and extraction of key sentences from documents. However, it is widely acknowledged that to truly understand a text and produce the best summary, one must understand the document and recognize the intentions of the author. Recent work in text summarization has begun to address this issue. For example, (Marcu, 2000) presents algorithms for automatically identifying the rhetorical structure of a text and argues that the hypothesized rhetorical structure can be successfully used in text summarization.</Paragraph> <Paragraph position="1"> Information graphics are an important component of many documents. In some cases, information graphics are stand-alone and constitute the entire document. This is the case for many graphics appearing in newspapers, such as the graphic shown in Figure 1. On the other hand, when an article is comprised of text and graphics, the graphic generally expands on the text and contributes to the discourse purpose (Grosz and Sidner, 1986) of the article. For example, Figure 2 illustrates a graphic from Newsweek showing that the income of black women has risen dramatically over the last decade and has reached the level of white women. 
Although this information is not conveyed elsewhere in the article, it contributes to the overall communicative intention of this portion of the article -- namely, that there has been a &quot;monumental shifting of the sands&quot; with regard to the achievements of black women.</Paragraph> <Paragraph position="2"> Our project is concerned with the understanding and summarization of information graphics: bar charts, line graphs, pie charts, etc. We contend that analyzing the data points underlying an information graphic is insufficient. One must instead identify the message that the graphic designer intended to convey via the design choices that were made in constructing the graphic. (Although one might suggest relying on captions to provide the intended message of a graphic, Corio and Lapalme found in a large corpus study (Corio and Lapalme, 1999) that captions are often missing or are very general and uninformative; our collected corpus of information graphics supports their observations.) Design choices include selection of chart type (bar chart, pie chart, line graph, etc.), organization of information in the chart (for example, aggregation of bars in a bar chart), and attention-getting devices that highlight certain aspects of a chart (such as coloring one bar of a bar chart different from the others). Not only should the graphic designer's intended message comprise the primary component of any summary, but this intended message has a strong influence on the salience of additional propositions that might be included in the summary.</Paragraph> <Paragraph position="3"> To see the importance of recognizing the graphic designer's intended message, consider the two graphics in Figure 3. The one on the left, Figure 3a, appeared in an NSF publication. Both graphics were constructed from the same data set. 
The intended message of the graphic in Figure 3a is that the salary of females is consistently less than that of males for each of the science and engineering disciplines. (This graphic was constructed by a colleague who served on the NSF panel that prepared the report; thus we know the intentions underlying the graphic.) Notice that the graphic designer selected an organization for the graphic in Figure 3a that facilitated the comparison between male and female salaries in each field. A different display of the same data would facilitate different analyses. For example, the graph in Figure 3b depicts the same data as the graph in Figure 3a, yet the organization tends to draw attention to comparisons within male and female groups rather than between them, and perhaps an integration/comparison of the messages conveyed by the two subgraphs.</Paragraph> <Paragraph position="4"> Thus the intended message of the graphic in Figure 3b appears to be that the ranking of the disciplines by salary is about the same for both men and women. The distinctions between presentation formats illustrate the extent to which the format can itself convey information relevant to the graphic designer's intended message.</Paragraph> <Paragraph position="5"> Now let us consider how the intended message influences additional information that might be included in a summary. Suppose that 1) the salary differential between females and males was significantly larger in the life sciences than in other disciplines and 2) the average salary for both females and males was much larger in engineering than in any of the other disciplines. Feature 1) would be particularly interesting and relevant to the intended message of Figure 3a, and thus should be included as part of the graphic's summary. On the other hand, this aspect would be less relevant to the intended message of Figure 3b and thus not as important to include. 
Similarly, Feature 2) would be particularly relevant to the intended message of Figure 3b and thus should be given high priority for inclusion in its summary. Although an interactive system that could analyze a graphic to any desired level of detail might extract from the graphic the information in both 1) and 2) above, we contend that a summary of the graphic should prioritize content according to its relevance to the designer's intended message.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Graphic Summarization </SectionTitle> <Paragraph position="0"> Our architecture for graphic summarization consists of modules for identifying the components of the graphic, hypothesizing the graphic designer's intended message, planning the content of the summary, organizing a coherent summary, and interactive followup. The following sections discuss four of these modules.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Analyzing and Classifying a Graphic </SectionTitle> <Paragraph position="0"> The visual extraction module takes a screen image of an information graphic. It is responsible for recognizing the individual components comprising the graphic, identifying the relationship of the different components to one another and to the graphic as a whole, and classifying the graphic as to type. This includes using heuristics (such as relative position of a string of characters) to identify the axis labels -- for example, that the y-axis label is Delaware bankruptcy personal filings in Figure 1. Our current implementation deals only with gray scale images (in pgm format) of bar charts, pie charts, and line graphs, though eventually it will be extended to handle color and other kinds of information graphics. 
The output of the visual extraction component is an XML file that describes the chart and all of its components.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Identifying the Intended Message </SectionTitle> <Paragraph position="0"> The second module of our architecture is responsible for inferring the graphic designer's intended message. In their work on multimedia generation, the AutoBrief group proposed that speech act theory can be extended to the generation of graphical presentations (Kerpedjiev and Roth, 2000; Green et al., 2004). They contended that the graphic design was intended to convey its message by facilitating requisite perceptual and cognitive tasks. By perceptual tasks we mean tasks that can be performed by simply viewing the graphic, such as finding the top of a bar in a bar chart; by cognitive tasks we mean tasks that are done via mental computations, such as computing the difference between two numbers.</Paragraph> <Paragraph position="1"> The goal of our intention recognizer is the inverse of the design process: namely, to use the displayed graphic as evidence to hypothesize the communicative intentions of its author. This is done by analyzing the graphic to identify evidence about the designer's intended message and then using plan recognition (Carberry, 1990) to hypothesize the author's communicative intent.</Paragraph> <Paragraph position="2"> Following AutoBrief (Kerpedjiev and Roth, 2000), we hypothesize that the graphic designer chooses a design that makes important tasks (the ones that the viewer is intended to perform in recognizing the graphic's message) as salient or as easy as possible. Thus salience and ease of performance should be taken into account in reasoning about the graphic designer's intentions.</Paragraph> <Paragraph position="3"> There are several ways that a task can be made salient. 
The graphic designer can draw attention to a component of a graphic (make it salient) by an attention-getting or highlighting device, such as by coloring a bar in a bar chart differently from the other bars as in Figure 1 or by exploding a wedge in a pie chart (Mittal, 1997). Attributes of the highlighted graphic component are treated as focused entities. Nouns in captions also serve to establish focused entities. For example, a caption such as &quot;Studying not top priority&quot; would establish the noun studying as a focused entity. Focused entities that appear as instantiations of parameters in perceptual or cognitive tasks serve as evidence that those tasks might be particularly salient. Similarly, verbs that appear in captions serve as evidence for the salience of particular tasks. For example, the verb beats in a caption such as &quot;Canada Beats Europe&quot; serves as evidence for the salience of a Recognize relative difference task. In the future, we plan to capture the influence of surrounding text by identifying the important concepts from the text using lexical chains. Lexical chains have been used in text summarization (Barzilay et al., 1999), and our linear time algorithm (Silber and McCoy, 2002) makes their computation feasible even for large texts. Whether a task is salient and the method by which it was made salient are used as evidence in our plan inference system.</Paragraph> <Paragraph position="4"> The graphic design makes some tasks easier than others. We use a set of rules, based on research by cognitive psychologists, to estimate the relative effort of performing different perceptual and cognitive tasks. These rules, described in (Elzer et al., 2004), have been validated by eye-tracking experiments. Since the viewer is intended to recognize the message that the graphic designer wants to convey, we contend that the designer will choose a graphic design that makes the requisite tasks easy to perform. 
This was illustrated by the two graphics discussed in Section 2. Our plan inference framework takes the form of a Bayesian belief network. Bayesian belief networks have been applied to a variety of problems, including reasoning about utterances (Charniak and Goldman, 1993) and observed actions (Albrecht et al., 1997). The belief network uses plan operators, along with evidence that is gleaned from the information graphic itself (as discussed in the preceding section), to reason about the likelihood that various hypothesized candidate plans represent the intentions of the graphic designer.</Paragraph> <Paragraph position="5"> Plan Operators for Information Graphics. Our system uses plan operators that capture knowledge about how the graphic designer's goal of conveying a message can be achieved via the viewer performing certain perceptual and cognitive tasks, as well as knowledge about how information-access tasks, such as finding the value of an entity in a graphic, can be decomposed into simpler subgoals. Our plan operators consist of four components: Goal, the goal that the operator achieves; Data-requirements, requirements that the data must satisfy in order for the operator to be applicable in a graphic planning paradigm; Display-constraints, features that constrain how the graphic is eventually constructed if this operator is part of the final plan; and Body, lower-level subgoals that must be accomplished in order to achieve the overall goal of the operator.</Paragraph> <Paragraph position="6"> Figures 4 and 5 present two plan operators for the goal of finding the value &lt;v&gt; of an attribute &lt;att&gt; for a graphical element &lt;e&gt; (for example, the value associated with the top of a bar in a bar chart). The body of the operator in Figure 4 specifies that the goal can be achieved by a primitive perceptual task in which the viewer just perceives the value; this could be done, for example, if the element in the graphic is annotated with its value. 
On the other hand, the body of the operator in Figure 5 captures a different way of finding the value, one that presumably requires more effort. It specifies the perceptual task of finding the values &lt;l1&gt; and &lt;l2&gt; surrounding the desired value on the axis along with the fraction &lt;f&gt; of the distance that the desired value lies between &lt;l1&gt; and &lt;l2&gt;, followed by the cognitive task of interpolating between the retrieved values &lt;l1&gt; and &lt;l2&gt;.</Paragraph> <Paragraph position="7"> Plan inference uses the plan operators to reason backwards from the XML representation of the observed graphic (constructed by the visual extraction module briefly described in Section 3.1). The display constraints are used to eliminate operators from consideration -- if the graphic does not capture the operator's constraints on the display, then the operator could not have been part of a plan that produced the graphic. The data requirements are used to instantiate parameters in the operator -- the data must have had certain characteristics for the operator to have been included in the graphic designer's plan, and these often limit how the operator's arguments can be instantiated.</Paragraph> <Paragraph position="8"> The Bayesian Belief Network. The plan operators are used to dynamically construct a Bayesian network for each new information graphic. The network includes the possible top-level communicative intentions (with uninstantiated parameters), such as the intention to convey a trend, and the alternative ways of achieving them via different plan operators. The perceptual tasks of lowest effort and the tasks that are hypothesized as potentially salient are added to the network. Other tasks are entered into the network as they are inferred during chaining on the plan operators; unification serves to instantiate parameters in higher-level nodes. 
Evidence nodes are added for each of the tasks entered into the network, and they provide evidence (such as the degree of perceptual effort required for a task or whether a parameter of the task is a focused entity in the graphic as discussed in Section 3.2.1) for or against the instantiated tasks to which they are linked. After propagation of evidence, the top-level intention with the highest probability is hypothesized as the graphic designer's primary intention for the graphic.</Paragraph> <Paragraph position="9"> Of course, a Bayesian network requires a set of conditional probabilities, such as 1) the probability that perceptual Task-A will be of low, medium, or high effort given that the graphic designer's plan includes the viewer performing Task-A, 2) the probability that parameter &lt;x&gt; of Task-A will be a focused entity in the caption given that the graphic designer's plan includes the viewer performing Task-A, or 3) the probability that the viewer performing Task-B will be part of the designer's intended plan given that Task-A is part of his plan. (Note that there may be several alternative ways of performing a particular task, as illustrated by the two plan operators displayed in Figures 4 and 5.) [Figures 4 and 5: plan operators. Both have Goal: Find-value(&lt;viewer&gt;, &lt;g&gt;, &lt;e&gt;, &lt;ds&gt;, &lt;att&gt;, &lt;v&gt;) and Gloss: Given graphical element &lt;e&gt; in graphic &lt;g&gt;, &lt;viewer&gt; can find the value &lt;v&gt; in dataset &lt;ds&gt; of attribute &lt;att&gt; for &lt;e&gt;. The first also lists Data-req: Dependent-variable(&lt;att&gt;, &lt;ds&gt;).] We have collected a rapidly expanding corpus of information graphics, and have analyzed a small part of this corpus to construct an initial set of probabilities. 
The results suggest that our approach is very promising.</Paragraph> <Paragraph position="10"> We will increase the number of analyzed graphics to improve the probability estimates.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Planning the Content of the Summary </SectionTitle> <Paragraph position="0"> The recognized intention of the graphic designer, such as to convey an overall increasing trend or to compare salaries of females and males in different disciplines as in Figure 3a, will provide one set of highly salient propositions that should be included in the graphic's summary. Once the intentions have been recognized, other visual features of the graphic will influence the identification of additional salient propositions.</Paragraph> <Paragraph position="1"> We conducted a set of experiments in which subjects were asked to write a brief summary of a set of line graphs, each of which arguably could be said to have the same high-level intention. Although each summary included the high-level intention, the summaries often differed significantly for different graphs. By comparing these with summaries of the same graph by different subjects, we have hypothesized that certain features, such as the variance of the data, can influence the generated summary, and that the importance of including a specific feature in a summary is related to the high-level intention of the graphic. For example, variation in the data will be relevant for an intention of conveying a trend, but it will be less important than the overall slope of the data points. This impact of the intended message on the priority of including a specific feature in a graphic was illustrated in Section 2, where we showed how a significantly larger differential between female and male salaries for one particular discipline would be more relevant to the summary of the graphic in Figure 3a than for the graphic in Figure 3b. 
In addition, our experiments indicate that the strength of a feature in the graphic also influences its inclusion in a summary. For example, the more ragged a sequence of line segments, the more salient variance becomes for inclusion in a summary.</Paragraph> <Paragraph position="2"> Once the content planning module has identified and ranked interesting features that might augment the intended message of the graphic, the most important propositions will be organized into a coherent summary that can be stored for access in a digital library or presented to a user. In the future, we will also investigate integrating the summary of an information graphic with the summary of its surrounding text.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.4 Interactive Followup </SectionTitle> <Paragraph position="0"> One of the primary goals of our work is an interactive natural language system that can convey the content of an information graphic to a user with sight impairments. For this application, the summary will be rendered in natural language and conveyed as an initial summary to the user via speech synthesis. The system will then provide the user with the opportunity to seek additional information.</Paragraph> <Paragraph position="1"> We will utilize the propositions that were not included in the initial message as indicative of additional information about the graphic that might be useful. Several kinds of followup will be provided. For example, if the user requests focused followup, the system will categorize the remaining propositions (for example, extreme values, trend detail, etc.) and ask the user to select one of the categories of further information. The system will then construct a followup message summarizing the most important (often all) of the remaining propositions in the selected category. 
This interactive followup will continue until either all the propositions have been conveyed or the user terminates the followup cycle.</Paragraph> </Section> </Section> </Paper>