Experimenting with the Interaction between Aggregation and Text Structuring

3 Capturing the Interactions as Preferences

A key requirement of the GA approach is the ability to evaluate the quality of a candidate solution. We claim that it is the relative preferences among factors, rather than each individual factor, that play the crucial role in deciding quality. Therefore, if we can capture these preferences properly in a generation system, we should be able to produce coherent text. In this section, we first discuss the preferences among factors related to text planning; on this basis, the preferences for embedding can then be introduced.

3.1 Preferences for global coherence

Following the assumption of RST, a text is globally coherent if a hierarchical structure such as an RST tree can be constructed from it. In addition to the semantic relations and the Joint relation used in (Mellish et al., 1998), where a Joint relation connects any two text spans that have no normal semantic relation between them, we assume a Conjunct or Disjunct relation between two facts with at least two identical components, so that semantic parataxis can be treated as a combining operation on two subtrees connected by the relation.

The input facts for the examples in this section are:

fact(choker, is, broad, fact_node-1).
fact('Queen Alexandra', wore, choker, fact_node-2).
fact(choker, 'can cover', scar, fact_node-3).
fact(band, 'might be made of', plaques, fact_node-4).
fact(band, 'might be made of', panels, fact_node-5).
fact(scar, is, 'on her neck', fact_node-6).

Embedding a Conjunct relation inside another semantic relation is not preferred, because this can convey wrong information: in Figure 3, for example, 2 cannot be substituted for 1. Also, a semantic relation is preferred whenever one is available. The preferences concerning the use of relations are given below, where "A > B" means that A is preferred over B:

Heuristic 1 Preferences among features for global coherence: a semantic relation > Conjunct > Joint > parataxis in a semantic relation
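The condition licensing Conjunct/Disjunct can be made concrete. The following is a minimal sketch (ours, not the authors' implementation) that finds pairs of facts sharing at least two identical components; the tuple representation and function names are illustrative assumptions.

```python
# Sketch: identify pairs of facts that share at least two of their three
# components and are therefore candidates for a Conjunct/Disjunct relation
# (semantic parataxis). Data layout is an assumption for illustration.

from itertools import combinations

# The example facts from Section 3.1, as (subject, verb, object, node_id).
FACTS = [
    ("choker", "is", "broad", "fact_node-1"),
    ("Queen Alexandra", "wore", "choker", "fact_node-2"),
    ("choker", "can cover", "scar", "fact_node-3"),
    ("band", "might be made of", "plaques", "fact_node-4"),
    ("band", "might be made of", "panels", "fact_node-5"),
    ("scar", "is", "on her neck", "fact_node-6"),
]

def parataxis_candidates(facts):
    """Yield id pairs of facts sharing at least two identical components."""
    for f1, f2 in combinations(facts, 2):
        shared = sum(a == b for a, b in zip(f1[:3], f2[:3]))
        if shared >= 2:
            yield f1[3], f2[3]

# fact_node-4 and fact_node-5 share 'band' and 'might be made of', so they
# can be combined under one relation ("the band might be made of plaques
# or panels").
print(list(parataxis_candidates(FACTS)))  # [('fact_node-4', 'fact_node-5')]
```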
3.2 Preferences for local coherence

In Centering Theory, Rule 2 specifies preferences among center transitions in a locally coherent discourse segment: sequences of continuation are preferred over sequences of retaining, which are in turn preferred over sequences of shifting. Rather than claiming that this is the best model, we use it simply as an example of a linguistic model being used to evaluate factors for text planning.

Another type of center transition that appears frequently in museum descriptions is associate shifting, where the description starts with an object and then moves to a closely associated object or to perspectives of that object. Our observations from museum descriptions show that human writers prefer associate shifting to all other types of movement except center continuation.

Oberlander et al. (1999) define yet another type of transition, called resuming, where an utterance mentions an entity that occurs not in the immediately preceding utterance but earlier in the discourse. Combining Rule 2 with the observations above gives the following preferences among center transitions:

Heuristic 2 Preferences among center transitions: continuation > associate shifting > retaining > shifting

3.3 Preferences for embedding

For a randomly produced embedding, we must be able to judge its quality. We distinguish between good, normal and bad embeddings based on the features each embedding bears. A good embedding is one satisfying all of the following conditions:

1. The referring expression is an indefinite, a demonstrative or a bridging description (as defined in (Poesio et al., 1997)).

2. The embedded part can be realised as an adjective or a prepositional phrase (Scott and de Souza, 1990).

3. The embedded part does not lie between text spans connected by semantic parataxis or hypotaxis (Cheng, 1998).

4. There is an available syntactic slot to hold the embedded part.

A good embedding is highly preferred and should be performed whenever possible. A normal embedding is one that satisfies conditions 1, 3 and 4 and whose embedded part is a relative clause. All remaining embeddings are bad.
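The three-way classification can be sketched directly from the four conditions. The sketch below is our own illustration, not the system's code; the boolean features would in practice be computed from the sentence plan, and the names are assumptions.

```python
# Sketch of the good/normal/bad embedding classification of Section 3.3.
# Feature names and the Embedding record are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Embedding:
    referring_ok: bool      # condition 1: indefinite/demonstrative/bridging
    realisation: str        # "adjective", "pp", or "relative_clause"
    breaks_parataxis: bool  # condition 3 is violated if True
    slot_available: bool    # condition 4

def classify(e: Embedding) -> str:
    base = e.referring_ok and not e.breaks_parataxis and e.slot_available
    if base and e.realisation in ("adjective", "pp"):   # conditions 1-4 hold
        return "good"
    if base and e.realisation == "relative_clause":     # conditions 1, 3, 4
        return "normal"
    return "bad"                                        # everything else

print(classify(Embedding(True, "pp", False, True)))               # good
print(classify(Embedding(True, "relative_clause", False, True)))  # normal
print(classify(Embedding(False, "adjective", False, True)))       # bad
```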
To decide the preferences among embeddings and center transitions, let us look again at the paragraphs in Figure 1. The only difference between them is the position of the sentence "This necklace was designed by Jessie King", which can be represented in terms of features of local coherence and embedding as follows: the last three sentences in 1: Joint + ...

4 Justifying the Evaluation Function

We have illustrated the linguistic theories that can be used to evaluate a text. However, they only give evidence in qualitative terms. For a GA-based planner to work, we have to come up with actual numbers that can be used to evaluate an RS tree.

We extended the existing scoring scheme of (Mellish et al., 1998) to account for the features for local coherence, embedding and semantic parataxis. This resulted in rater 1 in Table 1, which satisfies all the heuristics introduced in Section 3.

We manually broke down four human-written museum descriptions into individual facts and relations, and reconstructed sequences of facts with the same orderings and aggregations as in the original texts. We then used the evaluation function of the GA planner to score the RS trees built from these sequences. We also ran the GA algorithm on the facts and relations for 5000 iterations, 10 times. All human texts were scored among the highest, and the machine-generated texts sometimes obtained scores very close to those of the human texts (see Table 2 for the actual scores of the four texts).

[Table 2: for each of the four texts, the score of the human text, the highest score of the generated texts, and the average score of the generated texts.]

Since the four human texts were written and revised by museum experts, they can be treated as "nearly best" texts. This result shows that an evaluation function based on our heuristics can find good combinations.

To justify our claim that it is the preferences among generation factors that decide the coherence of a text, we fed the heuristics into a constraint-based program, which produced many raters satisfying them. One of these is given in Table 1 as rater 2. We then generated all possible combinations, including embeddings, of seven facts from a human text and used the two raters to score each of them. The two score distributions are shown in Figure 4.

According to both raters, the qualities of the generated texts are normally distributed. The two raters assign different scores to a given text, as the means of the two distributions are quite different. There is also a slight difference in standard deviations: the deviation of rater 2 is bigger, giving it more distinguishing power, but this difference is not significant. Despite these differences, the behaviours of the two raters are very similar: the two histograms have roughly the same shape, including their right halves, which indicate how many good texts there are and whether they can be distinguished from the rest. The distributions of the scores thus show that the two raters behave very similarly in distinguishing the qualities of texts from the same population.

To see to what extent the two raters agree with each other, we drew a scatterplot of the scores, which showed a strong positive linear correlation between the two sets of scores: the higher the score from rater 1 for a given text, the higher the score from rater 2 tends to be. We also calculated the Pearson correlation coefficient between the two raters; the correlation was .9567. So we can claim that, for this data, the scores from rater 1 and rater 2 correlate, and we have fairly good grounds to believe our hypothesis that the two raters, produced randomly in a sense, agree with each other in evaluating texts and measure essentially the same thing.

Since the two raters are derived from the heuristics in Section 3, the above result partially validates our claim that it is the relative preferences among factors that decide the quality of the generated text.
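Table 1 itself is not reproduced here, but a rater has a simple form: a table of per-feature scores constrained to respect the orderings of Section 3. The sketch below is an illustration under that reading; every number in it is an invented assumption, not the published rater.

```python
# Illustrative sketch of a "rater": a mapping from features of an RS tree
# to numeric scores. The values are made up; the only real constraints are
# the preference orderings (Heuristics 1 and 2, and good > normal > bad).

RATER = {
    # Heuristic 1: semantic relation > Conjunct > Joint > parataxis inside
    "semantic_relation": 6, "conjunct": 4, "joint": 1, "parataxis_in_relation": -3,
    # Heuristic 2: continuation > associate shifting > retaining > shifting
    "continuation": 3, "associate_shifting": 2, "retaining": 1, "shifting": 0,
    # Embedding quality: good > normal > bad
    "good_embedding": 3, "normal_embedding": 1, "bad_embedding": -2,
}

def score(features):
    """Score a candidate RS tree given the multiset of features it exhibits."""
    return sum(RATER[f] for f in features)

# A rater is valid only if it respects the heuristics; a constraint-based
# program can enumerate many such raters (this is how rater 2 was obtained).
assert RATER["semantic_relation"] > RATER["conjunct"] > RATER["joint"] > RATER["parataxis_in_relation"]
assert RATER["continuation"] > RATER["associate_shifting"] > RATER["retaining"] > RATER["shifting"]
assert RATER["good_embedding"] > RATER["normal_embedding"] > RATER["bad_embedding"]
```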
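The agreement check between two raters is a standard Pearson correlation over paired scores. A small sketch, with made-up score lists (the paper reports r = .9567 on its actual data):

```python
# Sketch: Pearson correlation between the scores two raters assign to the
# same candidate texts. The score lists are invented for illustration.

from statistics import correlation  # available in Python 3.10+

rater1_scores = [12.0, 8.5, 15.0, 6.0, 11.0, 9.5]
rater2_scores = [20.0, 14.0, 26.0, 10.0, 19.0, 16.5]

r = correlation(rater1_scores, rater2_scores)
print(f"Pearson r = {r:.4f}")  # a value near 1.0 indicates strong agreement
```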