<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2242"> <Title>Embedding New Information into Referring Expressions</Title> <Section position="2" start_page="0" end_page="1478" type="metho"> <SectionTitle> 2 System Architecture </SectionTitle> <Paragraph position="0"> We design an algorithm to generate referring expressions consisting of both a referring part and a non-referring part. The referring part is generated by the referring process (Dale, 1992), while the non-referring part is generated by a sub-type of the aggregation process called embedding, which selects suitable facts and realises them as components within the structure of a referring expression. The algorithm fits into the text planner of ILEX (Oberlander et al., 1998).</Paragraph> <Paragraph position="1"> ILEX is an adaptive hypertext system that generates descriptions of museum objects. In ILEX, pieces of domain knowledge that may be worth expressing in a text are represented as nodes and links in a graph called the Content Potential. Two kinds of nodes are useful for referring expression generation: entity nodes and fact nodes. A fact is represented as Predicate(Arg1, Arg2). A revised version of Text Structure (TS) (Meteer, 1992) is used as an intermediate level of representation between the text planner and the sentence realiser; it provides syntactic constraints to the text planner while abstracting away from linguistic details. The Text Structure uses a unified representation for structures both above and below sentence level, so that abstract sentence planning can be done in text planning.</Paragraph> <Paragraph position="2"> The text generation process follows roughly four steps: 1) The text planner selects a set of facts to be expressed and the best rhetorical relations between them. 2) The text planner builds the TS for each fact in the set. For each entity in a chosen fact, the referring process produces a list of possible realisations that will unambiguously refer to it (the referring part). 
Based on the constraints imposed by the referring part, the embedding process finds from the set all the unexpressed facts whose Arg1s are that entity, and makes embedding decisions, including what to embed, what syntactic form the embedded parts should take and which realisation of the entity is preferred, according to the principles in the next section. This step iterates until the TS for every fact is built. 3) The aggregation process goes through the TS for parataxis possibilities. 4) The appropriately simplified TS is sent to the surface realiser, where the natural language text is generated.</Paragraph> <Paragraph position="3"> We distinguish between two types of parataxis: semantic and textual. Semantic parataxis concerns facts that have two identical semantic constituents or a rhetorical relation between them, while textual parataxis deals with any adjacent facts from text planning that have no rhetorical connection between them. In step 3), both types of parataxis are performed.</Paragraph> </Section> <Section position="3" start_page="1478" end_page="1479" type="metho"> <SectionTitle> 3 Generating the Non-Referring Part </SectionTitle> <Paragraph position="0"> A referring expression serves primarily to refer to an entity, so the addition of a non-referring part should not interfere with this primary function. We summarise two principles that the non-referring part must obey, which have been realised in our embedding algorithm in a simple way.</Paragraph> <Paragraph position="1"> 1. The non-referring part should not confuse the reader about the referent indicated by the referring part. That is, if the referring part can uniquely identify the referent, the reader should not be confused over which object the referring expression is about because of the addition of the non-referring part. For example, in the description of a currently focal object which is a necklace, we might say &quot;The necklace is made from gold&quot;. Suppose we also want to inform the readers that the necklace has floral motifs. 
We should use &quot;The necklace, which has floral motifs, is made from gold&quot; rather than &quot;The necklace with floral motifs is made from gold&quot;, because the latter may make the readers think that the sentence is about a necklace other than the focal object.</Paragraph> <Paragraph position="2"> Based on both the properties of English and our analysis of real museum descriptions, we find that additional information is provided by evaluative adjectives, non-restrictive clauses, and almost all grammatical constituents in indefinite and demonstrative noun phrases. These characteristics are captured by embedding rules. For example, consider one rule that embeds a prepositional phrase. In its definition, priority is the order in which the rule should be tried, where rules producing simpler syntactic forms always have higher priority (Scott and de Souza, 1990); constraints are the restrictions that must be satisfied by the predicate and arguments of the embedded fact and by the realisation of the referring part (in this rule, the required semantic category of the predicate is specified, which is used to select suitable facts for embedding); and RT is the resource tree for building the TS for the embedded component.</Paragraph> <Paragraph position="3"> Assume we have two facts, F1 = style(J1, Organic) and F2 = hasqual(J1, Floral-motif). Without embedding, we might generate &quot;The necklace is in the Organic style. It has floral motifs&quot;. Suppose F1 and F2 are selected by the text planner and the embedding process respectively, and the referring form of the entity J1 can be demonstrative, definite or pronoun. Applying the above embedding rule, we would realise F2 as a post-modifier of the Arg1 of F1 and choose the demonstrative form, giving &quot;This necklace with floral motifs is in the Organic style&quot;.</Paragraph> <Paragraph position="4"> 2. The non-referring part should not reduce the readability of the text. 
There are several restrictions concerning readability: 1) Complexity of a referring expression: the generated expressions should not be too complex to read. We use a fixed number of syntactic slots to restrict the maximum amount of information that can be expressed, but the actual complexity is decided by user models. At present we only distinguish between adults and children. According to observations in psycholinguistic research, embedded clauses in subjects are a major obstacle to comprehensibility (Coleman, 1962). So for children, the system generates fewer non-restrictive clauses than for adults, and none at all in subjects.</Paragraph> <Paragraph position="5"> 2) Compatibility with other aggregation possibilities: only semantic paratactic and hypotactic relations between facts are considered here. Complex embedded components like non-restrictive clauses may interrupt the semantic connection between a set of sentences. For example, if we do not consider such connections while making embedding decisions, we would generate a sentence like: &quot;This jewel is made of gold, sapphire, a kind of precious stone and enamel which is often used to produce a shiny surface&quot;. This is worse than: &quot;This jewel is made of gold, sapphire and enamel. Sapphire is a kind of precious stone, and enamel is often used to produce a shiny surface&quot;. Adjectives do not have such a negative effect in most cases, especially when the paratactic parts have syntactically symmetrical modifications, as in &quot;The bracelet has a slightly flared band and a swelling midsection.&quot; Prepositional phrases fall between adjectives and relative clauses in their effect.</Paragraph> <Paragraph position="6"> Also, when one fact is to be embedded, it is necessary to check whether there are facts semantically related to it that should be embedded together with it. For instance, it is bad to say &quot;The necklace, which is made from gold, is in the Organic style. 
It is also made from enamel&quot;.</Paragraph> <Paragraph position="7"> So before embedding a fact, our embedding algorithm considers the possibilities of other types of aggregation, and embeds the fact only if the embedded properties can be realised in a syntactic form other than a non-restrictive clause in possible paratactic nuclei, and all of the semantically related facts can be embedded at the same time. This means that embedding has a lower priority than parataxis and hypotaxis, which reflects the relationship between the weakest rhetorical relation, Elaboration, and other types of rhetorical relations.</Paragraph> </Section> <Section position="4" start_page="1479" end_page="1479" type="metho"> <SectionTitle> 4 Future Work </SectionTitle> <Paragraph position="0"> This paper discusses our ongoing work on how to embed new information into a referring expression. While the restrictions concerning the second principle are currently implemented in a procedural way, it is possible to formalise them as constraints within the embedding rules.</Paragraph> <Paragraph position="1"> An interesting problem is the relation between embedding and entity-based coherence, which exists between spans of text in virtue of shared entities (Oberlander et al., 1998). When a fact is embedded into another one, the entity inside it may become unavailable for an entity-based move, and the smooth transfer from this fact to its elaborating facts is cut off. The effect of embedding on local and global coherence is to be explored further in future work, and a comprehensive evaluation is indispensable.</Paragraph> <Paragraph position="2"> Acknowledgement This research is supported by a University of Edinburgh Studentship. The author appreciates the comments from Dr. Chris Mellish, Dr. Mick O'Donnell and the four anonymous reviewers.</Paragraph> </Section> </Paper>