<?xml version="1.0" standalone="yes"?> <Paper uid="W96-0508"> <Title>On Lexical Aggregation and Ordering</Title> <Section position="2" start_page="0" end_page="29" type="metho"> <SectionTitle> 2. Corpora Studies </SectionTitle> <Paragraph position="0"> Different subsets of an information collection may give rise to many and varied opportunities for aggregation. In fact, human-authored text contains aggregations throughout, as our corpus study shows.</Paragraph> <Paragraph position="1"> [Dalianis96b].</Paragraph> <Paragraph position="2"> In the study we manually investigated 11 texts in total. The first nine texts contained 6,452 words in total, and the ratio (syntactic aggregation cases)/(total words) was 1.8%. Including the last two texts, the ratio (syntactic aggregation cases)/(total sentences) was approximately 33%; i.e., one third of the sentences included syntactic aggregation.</Paragraph> <Paragraph position="3"> If each aggregation saves approximately six words, this makes the text 1.8% aggregations x 6 words = approximately 11% shorter, in some cases up to 20% shorter, than it would have been without aggregation. In addition, the text becomes easier to read.</Paragraph> <Paragraph position="4"> Aggregated texts sometimes need cue words, e.g., each, together, separately, both, to clarify the aggregation (see Example 1, next section). In the study we calculated the ratio (cue words)/(sentences) to be 2.0%, and the ratio (cue words)/(syntactic aggregation) to be 15%; i.e., roughly every seventh syntactic aggregation contains a cue word.</Paragraph> <Paragraph position="5"> Some types of aggregation, such as Bounded lexical aggregation, refer to bounded sets, and are sometimes signalled by certain cue words, e.g., except, all...except, exception(s) is/are, besides, excluding, exclusion, most...but, all...not, all...but. 
An example of Bounded lexical aggregation with a cue word is: Retail sales excluding auto dealers have remained practically unchanged since last June, Statistics Canada said.</Paragraph> <Paragraph position="6"> Example taken from the Wall Street Journal, March 24, 1992 (60,862 words), which together with Asiatisk Dagbok 1984 (23,860 words) contains 84,722 words and 5,807 sentences in English and Swedish. The texts were scanned automatically for cue words, and we found the ratio (Bounded Lexical aggregation cue words)/(total sentences) to be 0.5%; i.e., we have at least 0.5% BL-aggregations, because those without a BL-aggregation cue word are not visible or easy to find when scanning a text automatically.</Paragraph> </Section> <Section position="3" start_page="29" end_page="30" type="metho"> <SectionTitle> 3. The Problem of Ordering </SectionTitle> <Paragraph position="0"> The following problem is described in [Dalianis&Hovy93]: Since aggregation rules operate only over adjacent clauses, a reordering of the input clauses is essential for effective aggregation to occur. Certain combinations of input clauses give rise to less redundant text (and hence more readable text, by the basic assumption underlying aggregation) than others. But which ordering(s) are optimal? And do other criteria apply when measuring optimality? We call issues relating to the ordering of input clauses the clause ordering problem of aggregation.</Paragraph> <Paragraph position="1"> A second ordering problem rears its head.</Paragraph> <Paragraph position="2"> We call this the rule ordering problem of aggregation. Given various kinds of aggregation rules -- lexical (bounded and unbounded), syntactic (various rules), referential, etc. -- does it matter in which order the rules are applied? Depending on how the lexical aggregation rules are written, it might indeed:
a. Mariette bought the Christmas tree
b. Mariette carried it inside
c. Mariette mounted it
d. 
Ann fetched the decorations
e. Ann hung the decorations on the tree
a, b, c: Syntactic-SP (Subject and Predicate) aggregation ⇒ f
f. Mariette bought, carried inside, and mounted the Christmas tree
d, e: Syntactic-SP (Subject and Predicate) aggregation ⇒ g
g. Ann fetched and hung the decorations on the tree
f, g: Syntactic-PDO (Predicate and Direct Object) aggregation ⇒ h
h. Mariette and Ann bought, carried inside, and mounted, and fetched and hung the decorations on the Christmas tree respectively
h: UL-aggregation ⇒ i
i. Mariette and Ann put up the Christmas tree
or, with an alternative rule ordering:
a, b, c: UL-aggregation ⇒ j
j. Mariette installed the Christmas tree
d, e: UL-aggregation ⇒ k
k. Ann decorated the Christmas tree
j, k: Syntactic-SP-aggregation ⇒ l
l. Mariette and Ann installed and decorated the Christmas tree respectively
l: no more aggregation possible; new BL-aggregation inference required
(Note: the cue word respectively is introduced by aggregation to clarify the aggregated text; for more about cue words see [Dalianis96c]).</Paragraph> <Paragraph position="3"> In the first case, assuming the existence of a BL-aggregation inference rule that defines put up a Christmas tree as the sequence of events (a) to (e), this rule would produce (i). This rule would, however, not be able to produce (i) from (l), since (l) contains different actions altogether; here a new rule that decomposes put up a Christmas tree into the actions (j) installed and (k) decorated would be required. Thus, unless the set of BL-aggregation rules were so crafted as to include all subdecompositions, different orderings of the aggregation rules will produce different results.</Paragraph> <Paragraph position="4"> Furthermore, although lexical aggregation operates over lexis, interactions between syntactic and lexical aggregation necessitate the careful ordering of their respective rules. 
We performed an experiment to determine the optimal ordering(s) by applying several aggregation rules, in all permutations, to the clauses of a text plan. We implemented three aggregation rules (the Subject-Predicate and Predicate-Direct-Object (Syntactic) aggregation rules and the Bounded Lexical aggregation rule); in addition, to control the order of input clauses, we created three ordering rules. An ordering rule orders the clauses in a text plan according to the weights of the ordering rule. The weights correspond to the predicate, subject, and object of the clause. To determine the best order in which to apply the aggregation rules and the ordering rules, we had a computer program cycle through all permutations of rules and generate all possible texts for a given set of input clauses. We then analyzed these texts manually, trying to find a definition of (or, failing that, at least heuristics for) optimality. Three aggregation rules and three ordering rules give 6! = 720 possible permutations (the 720 possible texts were generated automatically and came to 166 pages of A4 size). Some example permutation outputs are listed in [Dalianis96b]. To analyse the results (quite a job!), we had to make qualitative judgements. Our findings are as follows.</Paragraph> <Paragraph position="5"> 1. Somewhat surprisingly, text length (i.e., redundancy of words) is not the best measure of the readability of aggregated texts. Instead, a better measure is internal (structural) coherence, such as is the focus of, for example, Rhetorical Structure Theory [Mann&Thompson88].</Paragraph> <Paragraph position="6"> 2. One method to obtain good aggregation results is to apply one ordering rule and one aggregation rule pairwise, one pair at a time. A known good ordering rule should be applied to the input clauses and immediately followed by its corresponding aggregation rule, which can then be followed by another pair, etc. 
For example, the ordering 213 is best associated with the SP aggregation rule; the ordering 132 is best associated with the PDO aggregation rule; and the ordering 132 with the Bounded Lexical aggregation rule.</Paragraph> </Section> </Paper>