<?xml version="1.0" standalone="yes"?> <Paper uid="E95-1042"> <Title>Aggregation in the NL-generator of the Visual and Natural language Specification Tool</Title> <Section position="3" start_page="286" end_page="286" type="metho"> <SectionTitle> 2. Previous research </SectionTitle> <Paragraph position="0"> Several studies on aggregating text based on text structure appear in the literature. The term aggregation was first used in (Mann & Moore 1980). (Horacek 1992) describes the integration of aggregation (which he calls grouping) with quantification, guided by principles of conversational implicature. (Dale 1990) calls it discourse level optimization, and (Kempen 1991) calls it forward and backward conjunction reduction.</Paragraph> <Paragraph position="1"> (Hovy 1990) uses two structural aggregation rules to eliminate redundant information. In an example in (Scott & de Souza 1990), nine heuristic rules aggregate six sentences, which express a set of facts, into a single sentence. (Dalianis & Hovy 1993) describes eight different aggregation rules.</Paragraph> </Section> <Section position="4" start_page="286" end_page="286" type="metho"> <SectionTitle> 3. The current NL-generator </SectionTitle> <Paragraph position="0"> To address the lack of &quot;naturalness&quot; of the LOXY-formulas and make them more &quot;natural&quot;, two modules have been constructed, the natural module and the compact module, followed finally by the surface grammar.</Paragraph> <Paragraph position="1"> The LOXY-formula to be paraphrased is processed step by step by the different modules into a deep structure. The natural and compact modules can be activated and deactivated separately. Finally, the surface generator generates natural language text from the deep structure.</Paragraph> <Paragraph position="2"> The surface grammar contains its own generation grammar and uses the same dictionary as the NL-parser.
The surface generation grammar is a Definite Clause Grammar, DCG, (Pereira & Warren 1980, Clocksin & Mellish 1984), and is not treated in this paper.</Paragraph> </Section> <Section position="5" start_page="286" end_page="286" type="metho"> <SectionTitle> 4. Natural module </SectionTitle> <Paragraph position="0"> The natural module creates a deep structure from the flat LOXY-formula by looking up its elements in the dictionary. From this information it can decide what the deep structure should look like. The natural module is also called the sentence planner, i.e. it plans the length and the internal order of the different sentences.</Paragraph> <Paragraph position="1"> t1 is a subscriber and t1 is idle and t1 has 100 and 100 is a phonenumber and t1 has 101 and 101 is a phonenumber and t2 is a subscriber and t2 is idle and t2 has 200 and 200 is a phonenumber.</Paragraph> <Paragraph position="2"> Figure 2a) Normal mode, only surface generation. The natural module does what (Dalianis & Hovy 1993) calls ordering and economy.</Paragraph> <Paragraph position="3"> an idle subscriber t1 has a phonenumber 100 and an idle subscriber t1 has a phonenumber 101 and an idle subscriber t2 has a phonenumber 200.</Paragraph> <Paragraph position="4"> Figure 2b) Natural mode</Paragraph> </Section> <Section position="6" start_page="286" end_page="287" type="metho"> <SectionTitle> 5. Compact module </SectionTitle> <Paragraph position="0"> After being processed by the natural module, the natural language expression contains many redundant noun phrases. This is solved by the compact module. Our aggregation rule says: if two or more identical (and hence redundant) noun phrases are repeated consecutively, then remove all the noun phrases except the first one. This operation removes the repeated generation of the noun phrase, and the text becomes more concise.
(Dalianis & Hovy 1993) calls this subject grouping.</Paragraph> <Paragraph position="1"> an idle subscriber t1 has a phonenumber 100 and has a phonenumber 101 and an idle subscriber t2 has a phonenumber 200.</Paragraph> <Paragraph position="2"> Figure 2c) Natural mode + compact mode. What we see is that the text could be aggregated in a different way, and also that the subject grouping has not been fully applied to the phonenumbers.</Paragraph> </Section> <Section position="7" start_page="287" end_page="288" type="metho"> <SectionTitle> 6. Paraphrase fact bases </SectionTitle> <Paragraph position="0"> Fact bases can be paraphrased into natural language either after an event has been executed with the interpreter or as an answer to a question put to the theorem prover. Here we show an example of the latter (see Figure 3).</Paragraph> <Paragraph position="1"> A question expressed in NL (it is difficult to express questions in VL) is translated to a LOXY expression that the theorem prover tries to prove. The generation module takes the proved query and generates an NL-answer.</Paragraph> <Paragraph position="2"> can ask questions and obtain answers via the theorem prover.</Paragraph> <Paragraph position="3"> 7. Improvements on architecture The present natural language generator of VINST is difficult to control because only two control features (natural and compact) are available. It requires great effort to adapt the NL-generator to new domains or to extend it without writing new grammar rules. Furthermore, it is difficult to express the NL-paraphrase in a fashion similar to the way users express themselves; therefore some improvements are suggested.</Paragraph> <Paragraph position="4"> One suggestion is to use the Core Language Engine (CLE) (Alshawi 1992) as the natural language grammar.
CLE is a bidirectional, unification- and feature-based grammar written in Prolog.</Paragraph> <Paragraph position="5"> CLE uses Quasi Logical Form (QLF) as the linguistic representation of the parsed NL-string. QLF can be used to direct the generator, but it needs to be augmented. We have to construct an Intermediate Generation Form (IGF) which will contain the suitable linguistic primitives. The IGF will be acquired both from the user and from the context in which the NL is to be paraphrased, e.g. the simulation or query window. The words used by the user will be reused for generation together with the LOXY formula.</Paragraph> <Paragraph position="6"> When the paraphrasing is carried out from a VL-expression, we have to use preset linguistic primitives and words for the NL-generation, because no linguistic primitives will be available.</Paragraph> <Paragraph position="7"> 8. Intermediate Generation Form The Intermediate Generation Form (IGF) will contain the type of sentence, e.g. a fact or an assertion (dcl), a rule (rule), a yes-no question (ynq), a what-, which- or who-question (whq), a noun phrase (np) and many more.</Paragraph> <Paragraph position="8"> The Quasi Logical Form (QLF) of CLE already uses dcl, ynq and whq and could be extended to treat np as well. The remaining sentence types, e.g. rule, are context dependent. The sentence types above are identical to the ones in the QLF, except for np and some others which are VINST specific.</Paragraph> <Paragraph position="9"> For each type of sentence above there is a set of features, e.g. adjective form (adj), subjective predicative complement (predcomp), subject grouping (sg), predicate grouping (pg) and many more.</Paragraph> <Paragraph position="10"> The features can be unordered and their number is arbitrary. Some of the features are the same as the ones QLF uses; predcomp, sg and pg are not.
The IGF also contains two aggregation features, subject grouping and predicate grouping, which make the text nicer to read.</Paragraph> <Paragraph position="11"> Observe that there is no time feature in the IGF, since LOXY has an embedded time.</Paragraph> <Paragraph position="12"> We also need a list of the words used by the user. The words are obtained from the parser. The IGF needs to be stored together with the LOXY expression until they are used by the NL-generator. The syntax of the IGF is described by showing the Prolog predicate int_gen_form/3 and its content.</Paragraph> <Paragraph position="13"> int_gen_form(REFNR, TYPE(FEATURE_LIST), USED_WORD_LIST).</Paragraph> <Paragraph position="14"> REFNR is a reference number to the LOXY expression to be paraphrased. TYPE is the type of sentence and FEATURE_LIST is a list of feature names describing the sentence.</Paragraph> <Paragraph position="15"> USED_WORD_LIST is a list of previously used words.</Paragraph> </Section> <Section position="8" start_page="288" end_page="288" type="metho"> <SectionTitle> 9. Paraphrase fact bases aggregated </SectionTitle> <Paragraph position="0"> Here follow two examples of how the paraphrasing of a LOXY fact base into NL would look with the new architecture (not yet implemented). The only thing that changes between the two examples is the content of the IGF.</Paragraph> <Paragraph position="1"> Before generation input propositions are ordered based on the characteristics of their subjects, as</Paragraph> <Paragraph position="3"> p(1, phonenumber(200)))).</Paragraph> <Paragraph position="4"> b) int_gen_form(2,dcl([predcomp,sg]), [subscriber,idle,be,have,phonenumber]). t1 is a subscriber and is idle and has the phonenumber 100 and 101 t2 is a subscriber and is idle and has the phonenumber 200 c) int_gen_form(2,dcl([adj,sg,pg]), [subscriber,idle,be,have,phonenumber]).
t1 and t2 are idle subscribers and t1 has the phonenumbers 100 and 101 and t2 has the phonenumber 200.</Paragraph> <Paragraph position="5"> In the second NL-example, figure 4c), we see how the predicate grouping works.</Paragraph> <Paragraph position="6"> 10. Conclusions and future work In this paper we have briefly described the current NL-generator of the VINST system. We found it too inflexible and the generated text too tedious to read; we therefore suggest a new NL-architecture in which the user and the context of the user interaction are used to extract an Intermediate Generation Form (IGF). The IGF will contain a new aggregation rule, the so-called predicate grouping rule, which will make the generated text easier to read. We further propose using a bidirectional grammar for the surface generation.</Paragraph> <Paragraph position="7"> A further suggestion for future work is to use the results of the NL-parsing for the generation.</Paragraph> </Section> </Paper>