<?xml version="1.0" standalone="yes"?>
<Paper uid="W96-0406">
  <Title>PostGraphe: a system for the generation of statistical graphics and text</Title>
  <Section position="4" start_page="1985" end_page="1985" type="metho">
    <SectionTitle>
3 A report generator: the PostGraphe system
</SectionTitle>
    <Paragraph position="0"> Our prototype, the PostGraphe system, is a compromise between keeping the implementation simple and obtaining satisfactory results. After examining a number of reports, we noticed that text and graphics were often used together to transmit the same message. Since one of our goals was to study the integration of text and graphics, we decided to always generate a text/graphics pair for every message.</Paragraph>
    <Paragraph position="1"> Unfortunately, we could not simplify the realization level. We would have preferred to use a readily available graphical tool for realization and spend more time on higher-level aspects such as medium selection. A few attempts were made using tools such as X-Lisp-Stat for point and line graphs and LaTeX for tables. However, too many high-level choices depend on simple low-level details such as the number of available colors or the positioning of textual labels in a graph. By designing our own graphical realizer in Prolog, the same language as the rest of the system, we were able to integrate it precisely into the decision process, thus allowing more accurate heuristics and a backtracking approach for more complex cases.</Paragraph>
    <Paragraph position="2"> As for the text realization tool, we chose to adapt a systemic-based text generator called PréTexte [11]. This system was well suited to our needs for two reasons: first, it was developed in Prolog, making it easy to integrate into PostGraphe; second, it specializes in the generation of temporal expressions. Since evolution is one of the most frequent goals in a statistical report, the temporal knowledge built into PréTexte proved very useful.</Paragraph>
    <Paragraph position="3"> We will now describe the major steps followed by the system in the generation of a report.</Paragraph>
    <Paragraph position="4"> The input of PostGraphe consists of three special annotations followed by the raw data. These annotations indicate the types of the variables, how to determine the relational keys for the data, and a series of predicates describing the writer's intentions. The justification for these annotations and their Prolog syntax are presented in detail in [10]. See figure 6 for an example of their use.</Paragraph>
    <Section position="1" start_page="1985" end_page="1985" type="sub_section">
      <SectionTitle>
3.1 Types
</SectionTitle>
      <Paragraph position="0"> The role of the type system is to associate a set of properties and a unit with every variable of the input.</Paragraph>
      <Paragraph position="1"> The properties are organised as a multiple inheritance graph divided into a number of sub-graphs, each corresponding to a specific feature [10]. The most important sub-graphs describe the following features: organization (nominal, ordinal, quantitative, ...) [2, 12, 13], domain (enumeration, range, ...), temporal (month, year, ...), format (integer, real, ...), measurements (distance, duration, ...), and specific objects (countries, ...).</Paragraph>
      <Paragraph position="2"> The properties have a variable number of parameters which can be used to further specify their function. For example, for an enumerated type (domain sub-graph), a single parameter specifies the list of values for the enumeration.</Paragraph>
      <Paragraph position="3"> In the input, the main type (or class) of each variable is specified, as well as a list of auxiliary types. The auxiliary properties override the ones that are inherited from the class, thus allowing built-in types to be tailored in the input. In addition, a number of automatic type definitions are added according to the nature of the data (integers, labels, ...).</Paragraph>
      <Paragraph position="4"> Units are organized in a parallel inheritance graph. The inheritance mechanism is much simpler than the one used for types. A unit can be associated with every type (e.g. percentage %). If a unit cannot be found using single inheritance, the name of the type is used as a unit. This process is described in more detail in [10].</Paragraph>
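      <Paragraph> As an illustration only, the unit lookup described above can be sketched in Python (PostGraphe itself is written in Prolog). The tables PARENT and UNIT, their contents, and the function name unit_of are hypothetical and are not taken from the system; the sketch only shows the fallback to the type name when no unit is found along the single-inheritance chain.

```python
# Hypothetical single-inheritance links between types (child -> parent).
PARENT = {
    "dollar": "currency",
    "currency": "quantitative",
    "percentage": "quantitative",
}

# Hypothetical units attached directly to some types.
UNIT = {"currency": "$", "percentage": "%"}


def unit_of(type_name):
    """Walk up the inheritance chain; fall back to the type's own name."""
    current = type_name
    seen = set()  # guards against accidental cycles in the table
    while current is not None and current not in seen:
        if current in UNIT:
            return UNIT[current]
        seen.add(current)
        current = PARENT.get(current)
    # No unit found anywhere on the chain: use the type name as the unit.
    return type_name
```

Under these assumed tables, a variable of type dollar inherits the unit of currency, while a type with no unit on its chain (for example annee) is labelled with its own name.</Paragraph>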
    </Section>
    <Section position="2" start_page="1985" end_page="1985" type="sub_section">
      <SectionTitle>
3.2 Relational keys
</SectionTitle>
      <Paragraph position="0"> Relational keys are similar to the notion of the same name in relational databases [7] and help determine which variables depend on which others. They are also used for ordering variables in some graphics so that the more important ones (usually the keys) are given the more visible positions. One of the design goals of PostGraphe was to be able to function as a front-end to a spreadsheet. It was thus important to keep the data as close as possible to a format compatible with that type of software. Although a representation at the level of an entity relationship diagram would have been quite useful, especially for long reports and global relationships between sets of data, we chose to limit the input to a table-like structure which is easily obtainable from a spreadsheet.</Paragraph>
      <Paragraph position="1"> Consequently, PostGraphe must be able to automatically compute the relational keys it needs from the data.</Paragraph>
      <Paragraph position="2"> Sometimes, automatic calculation of keys can give strange results which do not fit the semantics of the variables. For example, a variable such as profits can wind up as a key if its values are all different, but it is rarely desirable to express a set of variables such as years and company names as a function of profits. It is usually the other way around.</Paragraph>
      <Paragraph position="3"> To solve this problem, two optional pieces of information can be specified in the input: a list of variables that can be used as keys and a list of variables that cannot be used as keys. This method is easy to implement in a spreadsheet, and some control is maintained without having to abandon automatic calculation of keys (useful for large, partially unknown data sets).</Paragraph>
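      <Paragraph> The following Python sketch illustrates one way such keys could be computed from a table; it is not the PostGraphe algorithm, whose details are not given here. It assumes that a key is a smallest set of variables whose combined values are unique over all rows, and that the optional lists simply restrict which variables may be tried. The function name candidate_keys and its parameters are hypothetical.

```python
from itertools import combinations


def candidate_keys(variables, rows, allowed=None, forbidden=()):
    """Return the smallest variable sets whose value combinations are unique.

    rows is a list of dicts mapping variable names to values.
    allowed (optional) lists the only variables that may appear in a key;
    forbidden lists variables that may never appear in a key.
    """
    usable = [
        v for v in variables
        if v not in forbidden and (allowed is None or v in allowed)
    ]
    for size in range(1, len(usable) + 1):
        found = []
        for combo in combinations(usable, size):
            projected = {tuple(row[v] for v in combo) for row in rows}
            # A combination is a key when no two rows share the same values.
            if len(projected) == len(rows):
                found.append(combo)
        if found:
            return found  # stop at the smallest size that yields a key
    return []
```

With a small table in which every profit value is different, the unrestricted call returns profits as a key, which is the undesirable case described above; declaring profits as forbidden makes the pair (annee, compagnie) the key instead.</Paragraph>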
    </Section>
    <Section position="3" start_page="1985" end_page="1985" type="sub_section">
      <SectionTitle>
3.3 Writer's intentions and planning
</SectionTitle>
      <Paragraph position="0"> The writer's intentions describe what to say and, up to a certain point, how to say it. Intentions are constraints on the expressivity of the chosen text and graphics. PostGraphe tries to find the smallest set of schemas that covers the writer's intentions.</Paragraph>
      <Paragraph position="1"> The following basic intentions are covered in our model: the presentation of a variable, the comparison of variables or sets of variables, the evolution of a variable along another one, the correlation of variables and the distribution of a variable over another one. Some of these intentions are further divided into more specific subtypes.</Paragraph>
      <Paragraph position="2"> The study of intentions is a major topic of our research. More details about the organization of our goal system can be found in [10].</Paragraph>
      <Paragraph position="3"> PostGraphe uses the same planning mechanism to generate text and graphics. The planner uses the types and values of the data as well as the relational keys, but it is mainly goal-driven. It builds on the ideas of Mackinlay [12, 13] but extends them in important ways.</Paragraph>
      <Paragraph position="4"> Mackinlay's algorithm, as used in APT, takes as input a set of typed variables and determines the most efficient graphical encoding (position, length, color, ...) for each of them. There are many ways of expressing each variable, and the system tries to find a way of expressing them all graphically in the same figure, if possible, or in a set of related figures. APT works by allocating the best possible graphical encoding to each variable and then checking if the result is feasible. If it is not, it backtracks on the last allocation and tries the next best encoding for it. The feasibility of a set of choices depends on the output medium (2D vs 3D, color vs greyscale). Since the variables are allocated sequentially, their ordering is important and determines which variables will get the best encodings in problem situations. The algorithm does not try to maximize the overall efficiency of a result but assumes that important variables are listed first and gives them the best encodings.</Paragraph>
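      <Paragraph> The sequential allocation with backtracking described above can be sketched as follows. This is a simplified Python illustration written from the description in this section, not APT's actual implementation: the function name allocate, the ranked encoding lists and the feasibility test passed as a parameter are all assumptions made for the example.

```python
def allocate(variables, ranked_encodings, feasible):
    """Give each variable its best encoding, backtracking on failure.

    variables: list of variable names, most important first.
    ranked_encodings: dict mapping each variable to encodings, best first.
    feasible: function taking a complete {variable: encoding} dict and
              returning True when the output medium can render it.
    Returns the first feasible assignment found, or None.
    """
    def extend(index, chosen):
        if index == len(variables):
            # All variables are allocated: check the whole result.
            return dict(chosen) if feasible(chosen) else None
        var = variables[index]
        for encoding in ranked_encodings[var]:
            chosen[var] = encoding
            result = extend(index + 1, chosen)
            if result is not None:
                return result
            del chosen[var]  # undo and try the next best encoding
        return None

    return extend(0, {})
```

Because later variables are revised before earlier ones, the variables listed first keep their best encodings whenever a feasible result exists, which reflects the ordering assumption discussed above. Note that the sketch returns the first feasible assignment and makes no attempt to maximize overall quality.</Paragraph>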
      <Paragraph position="5"> This method has a few shortcomings: it is based on a very limited set of types (quantitative, ordinal, nominal), it works on individual variables instead of global relations, and it is not easily applicable to text. Working with individual variables is an interesting approach to the problem of graphics generation, as it allows the system to reason about the low-level components of graphics and makes it more efficient.</Paragraph>
      <Paragraph position="6"> On the other hand, it creates two major problems: it is ambiguous at the realization phase and it ignores inter-variable phenomena. The ambiguity stems from the fact that a number of structurally different graphs can express the same variables using the same encodings. For example, line, bar, column and point graphs can all be used to present two variables using positional encoding. However, there are important differences between these four graphs. The lines in a line graph, and the rectangles and their orientation in bar and column graphs, all play an important role in the perception of the data. These differences play a major role in the expression of inter-variable phenomena such as comparison and correlation. For example, correlation is better perceived on a point graph than on a line graph.</Paragraph>
      <Paragraph position="7"> PostGraphe does not use a list of variables as its main input. Instead, it uses a set of inter-variable or intra-variable goals. The result of our planning algorithm is a schema for each group of compatible goals. These schemas are used for text as well as graphics. No ordering of goals or variables is assumed because all choices are weighted and a global quality function allows the system to maximize the overall efficiency of each graph. By default, the system assumes that all user goals are equivalent, but the user can change their relative weights in the input to ensure that some of them are better expressed by the system. This maximization complicates the exploration of the solutions, as it is no longer possible to simply return the first feasible solution. Theoretically, one should look at all possible groups of goals to see if they can coexist in the same graph and evaluate how efficient each group is, both globally and with regard to constraints placed on individual goals by the user. This is impractical, as the number of possible groups grows exponentially with the number of goals. Heuristics are used by PostGraphe to trim the number of solutions down to a usable level.</Paragraph>
      <Paragraph position="8"> The user has the option of manually limiting the scope of the grouping process by building sets of related goals. The system will respect these boundaries and never try to group goals from different sets. The normal algorithm is applied to goals inside each set. If only a single set of goals is specified, the system does all the work of grouping and ordering the information. This manual partitioning of goals is useful to organize goals according to themes (e.g. a set of goals to present the data, a set of goals to illustrate a trend, ...). Inside a set of goals, the planning process is divided into four steps. We first find the intentions that are "compatible" so that each schema takes into account as many intentions as possible while keeping each one "readable". The compatibility of intentions is determined using simple heuristics. Then we check if each group is feasible and determine the best schema to express it. This step is based on a lookup table, much like Mackinlay's algorithm [12, 13], which uses an association between the type of a variable and the most efficient graphical methods to express it. Our table is goal-oriented instead of type-oriented: it associates each possible user goal with the schemas that can express it. The table entries are weighted, and the result of this phase is a list of candidates sorted from the most to the least efficient for the current goals.</Paragraph>
      <Paragraph position="9"> The next step is the low-level generation of graphic primitives and text. At this stage, it may turn out that a figure cannot be generated for physical reasons: it is too big to fit, not enough grey levels are available, ... This low-level work is quite involved because it has to take into account the 2-D constraints and the limitations of the media. For this we had to develop a PostScript generation system in Prolog in order to determine the exact position of each element (character, line, axis, etc.) of a generated graph. If a candidate is rejected, the next one on the sorted list is tried. The surface text generation is handled by a modified version of PréTexte [11].</Paragraph>
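      <Paragraph> The table lookup and the fallback to the next candidate can be illustrated with the following Python sketch. It is only an approximation written for this explanation: the table entries and their weights are invented, the scoring rule (a sum of table weights multiplied by optional user goal weights) is an assumption, and the names TABLE, ranked_candidates and first_realizable are hypothetical. The realizer is represented by a function that accepts or rejects a schema.

```python
# Hypothetical goal -> {schema: weight} table; all entries are invented.
TABLE = {
    "evolution": {"line_graph": 90, "column_graph": 70, "table": 40},
    "comparison": {"column_graph": 85, "bar_graph": 80, "table": 50},
    "correlation": {"point_graph": 90, "line_graph": 60},
}


def ranked_candidates(goals, goal_weights=None):
    """Schemas able to express every goal of a non-empty group, best first."""
    goal_weights = goal_weights or {}
    shared = set(TABLE[goals[0]])
    for goal in goals[1:]:
        shared.intersection_update(TABLE[goal])

    def score(schema):
        # Assumed quality function: weighted sum over the goals of the group.
        return sum(goal_weights.get(g, 1) * TABLE[g][schema] for g in goals)

    # Sort by decreasing score; ties are broken alphabetically.
    return sorted(shared, key=lambda s: (-score(s), s))


def first_realizable(goals, can_realize, goal_weights=None):
    """Try candidates in order; return the first one the realizer accepts."""
    for schema in ranked_candidates(goals, goal_weights):
        if can_realize(schema):
            return schema
    return None
```

With these invented weights, a group containing an evolution goal and a comparison goal yields the column graph first and the table second; if the realizer rejects the column graph, for instance because it does not fit on the page, the table is tried next. A group with no common schema produces an empty candidate list.</Paragraph>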
      <Paragraph position="10"> Finally, a post-optimization phase eliminates redundancies which can occur because the heuristics sometimes miss a compatible grouping of intentions. An important aspect of PostGraphe is that it uses no high-level reasoning on intentions. Instead, all of its knowledge is encoded in the links and weights of the table, which was first created using a set of graphical rules and conventions.</Paragraph>
      <Paragraph position="11"> This approach is closer to neural nets than to Mackinlay's graphical language. The advantage of such an approach is that the table could be automatically modified by the system in response to user satisfaction or dissatisfaction with a result.</Paragraph>
      <Paragraph position="12"> The obvious problem, as with neural nets, is that the system's knowledge is not easily expressible as a set of human-readable rules.</Paragraph>
    </Section>
    <Section position="4" start_page="1985" end_page="1985" type="sub_section">
      <SectionTitle>
3.4 An automatically generated report
</SectionTitle>
      <Paragraph position="0"> In this section, we present a simple example of input and output from the PostGraphe system. The Prolog input can be seen in figure 6; lines starting with % are comments. The output was generated by the system, but the information was manually re-ordered and formatted in order to better satisfy the space requirements of this article. In particular, the graphs are presented at roughly 60% of their actual size and the structure of the report was flattened by removing section titles.</Paragraph>
      <Paragraph position="1"> The captions of the figures were translated from the French output of PostGraphe, but the internal labels and the text produced by PréTexte (figure 11) were left in French. The captions show the name of the schema and the intentions used to generate each figure, with a quality factor (0-100) for each intention.</Paragraph>
      <Paragraph position="2"> data(% names of the variables
 [annee,compagnie,profits,depenses],
 % types of the variables (/ with aux. properties)
 [annee, etiquette, dollar/[pluriel(profit)], dollar/[pluriel(depense)]],
 % variables that can be part of a relational key
 [annee,compagnie],
 % variables that can't be part of a relational key</Paragraph>
    </Section>
  </Section>
</Paper>