<?xml version="1.0" standalone="yes"?>
<Paper uid="A92-1007">
  <Title>Automatic Generation of Multimodal Weather Reports from Datasets</Title>
  <Section position="4" start_page="48" end_page="50" type="metho">
    <SectionTitle>
6. The formatter assembles the product.
3 Weather report analysis
</SectionTitle>
    <Paragraph position="0"> In this section we describe the initial dataset, the assertions extracted from the dataset, and the form of the final product as well as its specification.</Paragraph>
    <Section position="1" start_page="48" end_page="48" type="sub_section">
      <SectionTitle>
3.1 Structure of the initial dataset
</SectionTitle>
      <Paragraph position="0"> The initial dataset, obtained through observations or numerical forecasting techniques, is compiled in tabular form: the rows correspond to the locations, the columns to the weather elements considered in the report, and the subcolumns to the time instants to which the data refer. The locations are either the stations where data is collected or the nodes of a regular grid in which the numerical forecast is computed.</Paragraph>
      <Paragraph position="1"> In our experiments we used weather data collected through observations made at the main synoptic hours (00, 06, 12, 18 GMT) in 50 weather stations dispersed over the territory of Bulgaria. Ten weather elements have been considered: cloud amount, precipitation type and amount, wind speed and direction, min and max temperatures, and the phenomena fog, frost, thunderstorm.</Paragraph>
    </Section>
    <Section position="2" start_page="48" end_page="49" type="sub_section">
      <SectionTitle>
3.2 Intermediary representation
</SectionTitle>
      <Paragraph position="0"> An intermediary representation is necessary because the initial dataset describes the weather in terms of a scientifically based model which may not match the user's conceptions. It is intended to accommodate, in a language-independent form, those facts that will be conveyed to the user.</Paragraph>
      <Paragraph position="1"> What are the major differences between the initial data and the intermediary representation? Firstly, they pertain to different territory and time models. While the locations in a dataset are weather stations or grid nodes, in the intermediary representation they are administrative and geographic areas known to the audience. The dataset contains data referring to time instants, whereas the facts of the intermediary representation refer to parts of the day (such as morning) and whole days. Hence, the facts in the intermediary representation summarize the initial data with respect to time and space. The second difference concerns the weather models. In addition to the basic weather elements employed in the initial dataset, the intermediary representation makes use of some derived attributes. For example, the basic numerical quantities wind speed and precipitation amount are converted into the qualitative characteristics wind strength and precipitation intensity, respectively. Particular examples of other derived attributes are given in section 4. We call the facts from the intermediary representation assertions and denote them as quintuples: (w_attribute, w_value, region, period, precision).</Paragraph>
      <Paragraph position="2"> The weather attribute and the weather value represent the goal of the assertion; the region and the time period form its context; the last component denotes the precision of the summarization both over time and space and in the case of facts with derived weather attributes.</Paragraph>
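      <Paragraph><![CDATA[The quintuple can be pictured as a small record type. The following is only a sketch: the field names mirror the notation above, while the numeric precision in [0, 1] and the sample values are assumptions made for illustration (the paper also writes the precision qualitatively, e.g. high).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """One fact of the intermediary representation (a quintuple)."""
    w_attribute: str   # goal: weather attribute, e.g. "cloud_amount"
    w_value: str       # goal: weather value, e.g. "overcast"
    region: str        # context: region identifier, e.g. "Nor_Bul"
    period: str        # context: time period, e.g. "morn"
    precision: float   # assumed here to be a rate between 0 and 1

    def goal(self):
        """The weather attribute and value form the goal."""
        return (self.w_attribute, self.w_value)

    def context(self):
        """The region and the time period form the context."""
        return (self.region, self.period)


# Hypothetical sample assertion.
sample = Assertion("cloud_amount", "overcast", "Nor_Bul", "morn", 0.9)
```

Freezing the record keeps assertions hashable, which is convenient if they are later collected in sets.]]></Paragraph>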
    </Section>
    <Section position="3" start_page="49" end_page="49" type="sub_section">
      <SectionTitle>
3.3 Structure of the final product
</SectionTitle>
      <Paragraph position="0"> The final product is a natural or specialized language text, a table 1 and/or graphics. The basic constructs of those modes are oriented towards the expression of assertions -- the atomic content portions extracted from the dataset. A NL sentence or clause, an icon placed in a certain position on the map, and a lexical or numerical weather value put in a cell of the weather table are all constructs of this type.</Paragraph>
      <Paragraph position="1"> Figure 2 illustrates some modes. For example, the assertion (cloud_amount, overcast, Nor_Bul, morn, high), expressible through the NL sentence &quot;In the morning it will be cloudy over North Bulgaria&quot;, is represented as a weather map (Figure 2d) and in the upper left cell of the weather table in Figure 2c. Weather reports can be structured in different ways. The text in Figure 2b is an enumeration type of text with four independent segments labeled by the regions they pertain to, and the text in Figure 2a is a sample of a narrative text.</Paragraph>
    </Section>
    <Section position="4" start_page="49" end_page="50" type="sub_section">
      <SectionTitle>
3.4 Specification of the final product
</SectionTitle>
      <Paragraph position="0"> The user's requirements on the final product are specified by means of a template. It defines the mode, goal and context of the product, as well as various parameters concerning the precision of the information, the length of the message, and the style of text or map. The template consists of two types of statements: statements defining the modal structure of the document and content production statements.</Paragraph>
      <Paragraph position="1"> There are four statements defining the modal structure of the final product: narration, enumeration, table and picture. The general format of a modal structure statement is given below: &lt;modal_struct_statement&gt;{&lt;external_context&gt;} &lt;sequence_of_content_production_statements&gt;.</Paragraph>
      <Paragraph position="2"> The following examples of statements are intended to  generate a product with the modal and content structure of the forecast in Figure 2.</Paragraph>
      <Paragraph position="3"> 1 We should distinguish the standard tabular report representing the initial dataset from the user tables.</Paragraph>
      <Paragraph position="4"> narration{} text{clouds, precip, wind, phen, temp; Bul, whole_day; precision=0.6};</Paragraph>
      <Paragraph position="6"> The content production statements are value, text and map.</Paragraph>
      <Paragraph position="7"> The first type of statement produces the lexical presentation of a single weather value (e.g. overcast, or 15°C); the text production statement makes complete sentences linked in a coherent text; and the map production statement generates a cartographical presentation of the assertions by placing icons of the particular weather values in certain positions of the map. The format of a content production statement is as follows: &lt;content_production_statement&gt; {&lt;goal&gt;; &lt;context&gt;; &lt;parameters&gt;} The goal is the set of weather attributes in which the user is interested. The context contains the region and the time period for which weather information should be extracted. The part of the context given as an external context in the modal structure statement makes a heading of the corresponding section, and therefore this context may not be explicitly mentioned in the text. The parameters specifying the produced content portion are divided into three groups: precision rate, length and style of the message.</Paragraph>
      <Paragraph position="8"> The precision parameter defines the minimum precision rate that must be guaranteed by the generated message. By specifying a high precision value, we rule out vague sentences like &quot;it will be cloudy in some portions of North Bulgaria&quot; and force the system to retrieve more precise assertions from the dataset.</Paragraph>
      <Paragraph position="9"> The parameters restricting the length of the message are of three types: * maxasrt - determines the maximum number of assertions generated for each attribute from the goal; * length - restricts the length of the final text by specifying the minimum and maximum number of characters in it (applies to text production only); * detail - specifies the level of detail of the produced message on a three-element qualitative scale {concise, normal, full}; concise detail implies that only summary information for each goal should be extracted; full detail makes the system extract complete information; and normal detail produces a text with a level of detail in between the two extremes. The style parameter defines the message language. In the case of text production, the language could be a subset of a NL, a telegraphic type of language, or a special-purpose language conforming to specific users' needs. In the case of map production, the style determines what types of icons should be used and how the time will be presented (through several maps, by explicitly indicating the time periods on the map, etc.). For reference purposes, each style is given a unique identifier, e.g. english, telegr-bul-report, avionic. [Figure 2 text fragment: &quot;South and East Bulgaria will be mostly sunny. Clouds with showers are expected in North Bulgaria and in the afternoon. In East Bulgaria the wind will increase. High&quot;]</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="50" end_page="51" type="metho">
    <SectionTitle>
4 Terminological knowledge
</SectionTitle>
    <Paragraph position="0"> The terminological knowledge-base (TKB) represents the weather, territory and time models.</Paragraph>
    <Paragraph position="1"> The weather model consists of the set of weather attributes, their domains, relations between some domains, and rules for calculation of derived attributes. So the qualitative weather element wind strength with a five-element ordered domain is calculated from the numerical basic attribute wind speed by means of the rule:</Paragraph>
    <Paragraph position="3"> The derived weather attribute cloud change with a four-element nominal scale is calculated by means of a rule based on the properties monotonicity and amplitude of the basic attribute cloud amount with a three-element ordered scale {clear, partly_cloudy, overcast}. Similar rules allow the system to calculate summary weather attributes. For example, the clouds attribute unifies the domains of the attributes cloud amount and cloud change into the domain {clear, partly_cloudy, overcast, clouds_increase, clouds_decrease, variable}.</Paragraph>
    <Paragraph position="4"> Two weather values v1 and v2 are considered related if they represent co-occurring weather characteristics (e.g.</Paragraph>
    <Paragraph position="5"> overcast and rain) and opposite if the characteristics are associated as contrary (e.g. clear and overcast). The two relations are defined in the TKB by means of the predicates related(v1, v2) and opposite(v1, v2).</Paragraph>
    <Paragraph position="6"> The territory model represents the set of regions, their carriers and certain logical links between them. The function carrier(r) returns the set of stations that belong to r, thereby allowing us to treat the regions as sets. The predicate path(r1, r2, ..., rn) indicates that there is a path starting from region r1, passing through r2, ..., r(n-1), and reaching rn.</Paragraph>
    <Paragraph position="7"> The time model defines the time periods as intervals of time instants through the functions begin(t) and end(t). Two relations between time periods supported by the TKB are partial order ($t_1 &lt; t_2$ iff $\mathrm{end}(t_1) \le \mathrm{begin}(t_2)$) and inclusion ($t_1 \subseteq t_2$ iff $[\mathrm{begin}(t_1), \mathrm{end}(t_1)] \subseteq [\mathrm{begin}(t_2), \mathrm{end}(t_2)]$). The relations between weather values, regions and time periods are employed in the selection of rhetorical schemas (cf. section 6).</Paragraph>
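    <Paragraph><![CDATA[The two relations of the time model are easy to state in code. This is a minimal sketch, assuming that time instants are hours of the day and that a period is a closed interval; the class name Period and the sample periods are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    begin: int  # first time instant (assumed: hour of the day)
    end: int    # last time instant


def precedes(t1, t2):
    """Partial order: t1 precedes t2 iff end(t1) <= begin(t2)."""
    return t1.end <= t2.begin


def included(t1, t2):
    """Inclusion: the interval of t1 lies within the interval of t2."""
    return t2.begin <= t1.begin and t1.end <= t2.end


# Hypothetical periods used only for illustration.
morning = Period(6, 12)
afternoon = Period(12, 18)
whole_day = Period(0, 24)
```

With these definitions the morning precedes the afternoon, and both are included in the whole day.]]></Paragraph>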
  </Section>
  <Section position="6" start_page="51" end_page="51" type="metho">
    <SectionTitle>
5 Scanning the dataset
</SectionTitle>
    <Paragraph position="0"> The scanner determines the content portions of the message by computing relevant assertions from the dataset.</Paragraph>
    <Paragraph position="1"> The monitor calls it with two types of queries specifying the goal (a single weather attribute), the context and a parameter concerning either the precision rate or the maximum number of assertions to be produced: scanp(clouds, Bul, whole_day, 0.8) and scana(clouds, Bul, whole_day, 3). The first query makes the scanner extract assertions about the clouds attribute applied to Bulgaria and the whole day, and with a precision rate greater than or equal to 0.8. The second query restricts the maximum number of assertions that should be extracted to three.</Paragraph>
    <Paragraph position="2"> The scanning is carried out in three steps: generation of a full set of assertions, pruning the full set of assertions and selection of the final set of assertions.</Paragraph>
    <Paragraph position="3"> In the first step, the scanner applies weather verification techniques (Kerpedjiev and Ivanov, 1991) to generate an assertion for each context that belongs to the query context. Such an assertion contains the weather value that approximates the data subset corresponding to that context with the highest precision rate.</Paragraph>
    <Paragraph position="4"> In order to avoid a combinatorial explosion during the selection, the set of assertions is pruned by removing all assertions that can be inferred from other assertions. (An assertion a1 can be inferred from a2 if both assertions convey the same weather value, but a1 relates to a subcontext of a2 and its precision rate does not exceed that of a2.) The average reduction rate of the pruning is 70%.</Paragraph>
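    <Paragraph><![CDATA[The pruning step can be sketched as follows. The representation is an assumption made for illustration: a region is the set of its stations, a period is a (begin, end) pair, and the input is taken to contain no two assertions with identical value, context and precision. None of the names below come from the original system.

```python
from typing import NamedTuple


class Assertion(NamedTuple):
    attribute: str
    value: str
    region: frozenset   # carrier set of station ids (assumption)
    period: tuple       # (begin, end) time instants (assumption)
    precision: float


def is_subcontext(a, b):
    """True if the region and the period of a lie within those of b."""
    return (a.region <= b.region
            and b.period[0] <= a.period[0]
            and a.period[1] <= b.period[1])


def inferable(a1, a2):
    """a1 follows from a2: same value, subcontext, no higher precision."""
    return (a1 is not a2
            and (a1.attribute, a1.value) == (a2.attribute, a2.value)
            and is_subcontext(a1, a2)
            and a1.precision <= a2.precision)


def prune(assertions):
    """Keep only the assertions that cannot be inferred from another one."""
    return [a for a in assertions
            if not any(inferable(a, b) for b in assertions)]
```

The quadratic scan is adequate for a sketch; the paper does not say how the scanner organizes this search.]]></Paragraph>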
    <Paragraph position="5"> The selection of a combination of assertions is first made independently for each weather value of the goal attribute. A combination of two assertions $(w, v, r_1, t_1, p_1)$ and $(w, v, r_2, t_2, p_2)$ is evaluated by means of the formula $\min(p_1, p_2, 1 - p', 1 - p'')$, where $p'$ and $p''$ are the precision rates of the assertions $(w, v, r - r_1 - r_2, t, p')$ and $(w, v, r_1 + r_2, t - t_1 - t_2, p'')$, respectively, $r$ and $t$ being the query context. Then the scanner selects the most precise combinations for the different weather values and returns them as a response to the query.</Paragraph>
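    <Paragraph><![CDATA[As a small illustration of the evaluation formula, the sketch below scores a pair of assertions from four precision rates and picks the best of several candidate pairs. The function names and the candidate tuples are hypothetical; how the rates p' and p'' are obtained from the dataset is outside this sketch.

```python
def combination_score(p1, p2, p_rest_region, p_rest_time):
    """min(p1, p2, 1 - p', 1 - p'') for a pair of assertions.

    p_rest_region is p': how well the value fits the rest of the region.
    p_rest_time is p'': how well it fits the rest of the period.
    """
    return min(p1, p2, 1 - p_rest_region, 1 - p_rest_time)


def best_combination(candidates):
    """Return the candidate (label, p1, p2, p', p'') with the top score."""
    return max(candidates, key=lambda c: combination_score(*c[1:]))
```

A pair scores well only if both assertions are precise and the same value does not also fit the remaining region or the remaining time, which is what the two complement terms express.]]></Paragraph>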
  </Section>
  <Section position="7" start_page="51" end_page="53" type="metho">
    <SectionTitle>
6 Planning the report
</SectionTitle>
    <Paragraph position="0"> The planner assimilates a set of assertions into a surface text structure or a map plan. Since planning is essentially a process of arranging the information in a coherent way, we first consider coherence in weather reports and then elaborate the planning techniques.</Paragraph>
    <Section position="1" start_page="51" end_page="52" type="sub_section">
      <SectionTitle>
6.1 Coherence
</SectionTitle>
      <Paragraph position="0"> Coherence can be ensured for the portions created only by the text production and map production statements since the modal structure statements combine the constituent parts mechanically without caring for the consistency between them.</Paragraph>
      <Paragraph position="1"> Coherence of a text portion is achieved by selecting a rhetorical schema that suits best the current set of assertions. The main vehicle for ensuring proper organization of the text content is the employment of existing relations in the TKB. Indeed, those links represent common associations and orderings of the objects, and following any of them while reading or hearing the text will enable the user to assimilate the information easily with minimum cognitive effort.</Paragraph>
      <Paragraph position="2"> Based on the analysis of a number of textual weather forecasts and reports, we have extracted and collected seven types of rhetorical schemas: Presentation by weather attributes. An assertion about a given attribute cannot interpose a sequence of assertions concerning another attribute.</Paragraph>
      <Paragraph position="3"> From a summary to details. An assertion with a context which includes the context of another assertion is conveyed before the second assertion.</Paragraph>
      <Paragraph position="4"> Temporal progression. The assertions are ordered by the successive time intervals they pertain to.</Paragraph>
      <Paragraph position="5"> Spatial progression. The assertions are arranged in such a way that their regions form a conceptually existing path.</Paragraph>
      <Paragraph position="6"> Coupling related values. Assertions with related values and intersecting contexts are rendered in a group.</Paragraph>
      <Paragraph position="7"> Contrast. Two assertions with opposite values are conveyed together to contrast with each other.</Paragraph>
      <Paragraph position="8"> Presentation by weather values. The assertions about a given attribute with an ordered domain are conveyed in successive groups relating to the particular weather values.</Paragraph>
      <Paragraph position="9"> The problem of supplementing a graphical portion (created by the map production statement) with a verbal comment may arise when the situation presented on the map is dynamic, imprecise or uncertain. Due to the lack of proper graphical means of expression for such properties, a text has to be created that specifies the information available on the map. The following example illustrates the problem.</Paragraph>
      <Paragraph position="10"> Suppose that the assertion (phen, fog, Nor_Bul, morn, high) has to be shown on a map created for the whole day. A presentation with the pictograph for fog placed in one or more positions dispersed uniformly over the region specified may prove misleading because the important information about the time period is absent. To restore the correctness of the map the following concise text message should be created: &quot;The fog in North Bulgaria will clear by noon.&quot; It consists of the reference part &quot;the fog in North Bulgaria&quot; and the specification part &quot;will clear by noon&quot;. The reference part identifies the phenomenon through elements expressed on the map while the specification part conveys the missing or distorted elements.</Paragraph>
    </Section>
    <Section position="2" start_page="52" end_page="52" type="sub_section">
      <SectionTitle>
6.2 Text planning
</SectionTitle>
      <Paragraph position="0"> The conversion of a set of assertions into a surface structure poses two main problems: * How to find the most suitable rhetorical structure of the text? * How to realize this structure into the surface structure of cohesive sentences? We employed rhetorical and grammatical knowledge embedded in rules to cope with those problems. For each rhetorical schema, a rule is formulated whose condition part evaluates how well the set of assertions is stratified by the corresponding schema. For example, we regard a set of assertions as well stratified by a path of regions if all assertions pertain to the same attribute and time period and there exists a one-to-one correspondence between the regions of the path and the regions of the assertions, or a set of assertions is well stratified chronologically if all assertions pertain to the same region and there is no overlap between their time periods.</Paragraph>
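      <Paragraph><![CDATA[The condition for chronological stratification can be written as a short predicate. This is a sketch under assumptions: each assertion is a dictionary with a "region" name and a "period" given as a (begin, end) pair, and periods that merely touch at an endpoint are not counted as overlapping.

```python
def well_stratified_chronologically(assertions):
    """All assertions share one region and their periods do not overlap."""
    if len({a["region"] for a in assertions}) > 1:
        return False
    # Sort the (begin, end) pairs and compare each period with its successor.
    periods = sorted(a["period"] for a in assertions)
    return all(prev[1] <= nxt[0] for prev, nxt in zip(periods, periods[1:]))
```

A rule for the temporal progression schema could use such a predicate as its condition part; the actual rule format of the system is not given in the text.]]></Paragraph>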
      <Paragraph position="1"> Since the conditions of the &quot;temporal progression&quot; and &quot;spatial progression&quot; rhetorical schemas as described above are too rigid and are therefore rarely satisfied by the assertions produced by the scanner, we loosened them by allowing partial instead of full coincidence between the regions. The grade of similarity between two regions $r_1$ and $r_2$ is defined by the formula $d(r_1, r_2) = \frac{|\mathrm{carrier}(r_1) \cap \mathrm{carrier}(r_2)|}{|\mathrm{carrier}(r_1) \cup \mathrm{carrier}(r_2)|}$, and they are considered coincident if $d(r_1, r_2) &gt; 0.7$.</Paragraph>
      <Paragraph position="2"> Thus the set of three assertions concerning the regions &quot;the lowlands of West Bulgaria&quot;, &quot;Central Bulgaria&quot; and &quot;the Black Sea coast&quot; can be successfully mapped out along the path &quot;West Bulgaria&quot;, &quot;Central Bulgaria&quot;, &quot;East Bulgaria&quot;.</Paragraph>
      <Paragraph position="3"> There are certain priorities among the rhetorical schemas. The schemas &quot;presentation by attributes&quot; and &quot;coupling related values&quot; have priority over the others; the schema &quot;from a summary to details&quot; has priority over the temporal and spatial progressions, etc. A rule with a higher priority than another is applied first, and the second rule is tried only if the first fails.</Paragraph>
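      <Paragraph><![CDATA[The loosened coincidence test can be sketched in a few lines. The sketch assumes that the grade of similarity is the share of common stations among all stations of the two carriers (intersection over union); the printed formula is damaged in this copy, so that reading is an interpretation. The function names are hypothetical.

```python
def similarity(carrier1, carrier2):
    """Stations common to both regions divided by all their stations."""
    union = carrier1 | carrier2
    if not union:
        return 0.0  # two empty carriers: treated as dissimilar (assumption)
    return len(carrier1 & carrier2) / len(union)


def coincident(carrier1, carrier2, threshold=0.7):
    """Regions coincide if their similarity exceeds the threshold."""
    return similarity(carrier1, carrier2) > threshold
```

Under this reading, a region that differs from a path region by a few stations still counts as coincident, which is what lets slightly different regions be mapped onto a path.]]></Paragraph>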
      <Paragraph position="4"> The action part of the chosen rule breaks the set of assertions into a chain of chunks. The link between two chunks represents the conversational move that takes place when the discourse passes from the source to the target chunk. Then each chunk is broken down into a subchain, and so on until a hierarchical discourse structure is obtained, at the terminal nodes of which are the assertions of the initial set (cf. Figure 3b).</Paragraph>
      <Paragraph position="5"> The conversion into a surface structure proceeds by applying the rules embedding the grammatical knowledge. They analyze the discourse structure by means of patterns. The matching of a pattern with a discourse substructure leads to a transformation of the latter into the surface structure of a sentence, clause or phrase and its binding to the surface structure of the text. In addition to indicators of the elements of the discourse structure, the patterns may contain conditions on the contents of the assertions and on the types of the preceding sentences. Figure 3c shows a portion of the surface structure realizing the discourse structure in Figure 3b.</Paragraph>
      <Paragraph position="6"> The following features characterize the creation of the surface structure of a text: * Many sentences are constructed on the basis of impersonal verb phrases typical for weather description.</Paragraph>
      <Paragraph position="7"> * The tense of the verbs is determined by the type of the report. If it is a forecast, then the future tense is adopted; otherwise, the past tense.</Paragraph>
      <Paragraph position="8"> * Where appropriate, function words are inserted that indicate the type of conversational move (e.g. &quot;but&quot; for contrast, &quot;also&quot; for addition, etc.).</Paragraph>
      <Paragraph position="9"> * Certain elements of the context (the region and/or the time period) are omitted, if implied from the preceding text or the external context, or are replaced by adverbial or relative adverbial phrases (&quot;there, then, where, when&quot;), if the corresponding element is implied but the grammatical structure requires such a phrase.</Paragraph>
      <Paragraph position="10"> * The precision rate of the assertions, if lower than high, is indicated by inserting proper modifiers, such as &quot;at many places of ...&quot;, &quot;mostly&quot; etc., which warn the reader to accept the information with some reservations.</Paragraph>
      <Paragraph position="11"> * The word order of the sentences is selected in such a way that the elements constituting the topics and the focuses of the consecutive sentences alternate (Hajičová, 1987). For example, if the region is the focus of one sentence, it is good to generate the next sentence with the region being its topic. Thus the text will flow rhythmically and at a proper pace.</Paragraph>
    </Section>
    <Section position="3" start_page="52" end_page="53" type="sub_section">
      <SectionTitle>
6.3 Supplementing a map with a text
</SectionTitle>
      <Paragraph position="0"> A technique of converting a set of assertions into a weather map has been described in (Kerpedjiev, 1990).</Paragraph>
      <Paragraph position="1"> Here we concisely recall the technique and extend it to allow the generation of text supplements.</Paragraph>
      <Paragraph position="2"> The conversion of a set of assertions into a map is based on the existence of a set of visual objects (pictographs) and two functions, f and g: f assigns a pictograph to each weather value, and g, for each region, determines the positions where the icons related to a given attribute should be placed. The conversion algorithm scans the selected set of assertions and generates the map plan by replacing each assertion $(w, v, r, t, p)$ with a list of statements $\{(q, x_i, y_i)\}_{i=1..n}$, where $q = f(v)$ and $\{(x_i, y_i)\}_{i=1..n} = g(w, r)$.</Paragraph>
      <Paragraph position="3"> A statement (q, x, y) of the map plan drives the formatter to place icon q in the position with coordinates (x, y).</Paragraph>
      <Paragraph position="4"> Some problems arise with this technique. Firstly, two pictographs may overlap and distort each other.</Paragraph>
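      <Paragraph><![CDATA[The conversion into a map plan amounts to one loop over the assertions. The sketch below passes f and g in as ordinary functions; the lookup tables, icon names and coordinates are hypothetical stand-ins, not data from the original system.

```python
def map_plan(assertions, f, g):
    """Replace each assertion (w, v, r, t, p) by statements (q, x, y)."""
    plan = []
    for w, v, r, t, p in assertions:
        q = f(v)                  # pictograph assigned to the weather value
        for x, y in g(w, r):      # icon positions for this attribute and region
            plan.append((q, x, y))
    return plan


# Hypothetical lookup tables standing in for f and g.
ICONS = {"fog": "fog_icon", "overcast": "cloud_icon"}
POSITIONS = {("phen", "Nor_Bul"): [(10, 40), (30, 42)]}

plan = map_plan(
    [("phen", "fog", "Nor_Bul", "morn", "high")],
    ICONS.get,
    lambda w, r: POSITIONS.get((w, r), []),
)
```

Note that t and p are read but never used, which is exactly the loss of time and precision information that the text goes on to discuss.]]></Paragraph>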
      <Paragraph position="6"> Secondly, certain geometrical relations between the icons of related assertions should be ensured. Thirdly, the information concerning the time period and the precision rate is completely ignored by the conversion technique.</Paragraph>
      <Paragraph position="7"> The first two problems are resolved by carefully designing the function g. Information about a time period and/or precision rate, when necessary, is provided by a verbal comment as described below.</Paragraph>
      <Paragraph position="8"> Suppose that the assertion (w, v, r, t, p) is visualized on a map representing the weather situation in context (r', t'). The corresponding map plan will represent the assertion correctly only if $t' \subseteq t$ and p = high. If any of these relations is violated then the system has to impart this information to the user. We call it residual information or a residue. It consists of a reference part determined by the weather value (the user associates it with the corresponding icon) and possibly by the region (if $r \cap r' \neq r'$; the user should identify it with the locations where the icons are situated) and a specification part determined by the elements $t' \cap t$ (if different from t') and p (if not high). The grammar for rendering a residue as a sentence is available in the grammatical knowledge base. Furthermore, the residual information can be formulated either as a characteristic of the reference part (e.g. &quot;the rain in East Bulgaria will be scattered&quot;) or as a process (e.g. &quot;the rain in East Bulgaria will stop by noon&quot;).</Paragraph>
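      <Paragraph><![CDATA[The residue computation can be sketched as below. Several points are assumptions: regions are sets of stations, periods are sets of hours, the precision is the qualitative label used in the examples, and the region enters the reference part when the assertion does not cover the whole map region (the condition is damaged in this copy, so that reading is an interpretation).

```python
def residue(assertion, map_region, map_period):
    """Return (reference, specification), or None if the map is correct."""
    w, v, r, t, p = assertion
    spec = {}
    shared_time = t & map_period
    if shared_time != map_period:      # value holds for only part of t'
        spec["period"] = shared_time
    if p != "high":
        spec["precision"] = p
    if not spec:
        return None                    # nothing is missing on the map
    reference = {"value": v}
    shared_region = r & map_region
    if shared_region != map_region:    # icons cover only part of the map
        reference["region"] = shared_region
    return reference, spec
```

For the fog example of section 6.1, the reference part would hold the value fog and the region North Bulgaria, and the specification part the morning period.]]></Paragraph>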
      <Paragraph position="9"> The planning of a text supplement may face the following problem. Consider the example in section 6.1. If another assertion occurs about fog in North Bulgaria at noon, then the residual information must be adjusted to the following message: &quot;The fog in North Bulgaria will clear by the afternoon.&quot;</Paragraph>
      <Paragraph position="10"> In order to avoid any inconsistency in the generated message, the system collects the residues relating to the same weather attribute and region, unifies their specification parts, whereby some content portions may partially neutralize each other, and then generates the message.</Paragraph>
    </Section>
  </Section>
</Paper>