<?xml version="1.0" standalone="yes"?> <Paper uid="W05-1602"> <Title>Interactive Authoring of Logical Forms for Multilingual Generation</Title> <Section position="3" start_page="0" end_page="0" type="relat"> <SectionTitle> 2 Related Work </SectionTitle>
<Paragraph position="0"> Our work draws on several related research traditions: multilingual generation systems, WYSIWYM systems, and knowledge and ontology editors. We review each of these in turn in this section.</Paragraph>
<Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Multilingual Generation </SectionTitle>
<Paragraph position="0"> Multilingual text generation (MLG) is a well-motivated method for the automatic production of technical documents in multiple languages. The benefits of MLG over translation from a single-language source have been documented in the past and include the high cost of human translation and the inaccuracy of automatic machine translation [Stede, 1996], [Coch, 1998], [Bateman, 1997]. In an MLG system, users enter data in an interlingua, from which the target languages are generated. MLG systems aim to be as domain-independent as possible (since development is expensive), but they usually address a narrow domain, since the design of the interlingua depends on domain information. MLG systems share a common architecture consisting of the following modules: + A language-independent underlying knowledge representation: knowledge represented as AI plans [Rösner and Stede, 1994], [Delin et al., 1994], [Paris and Vander Linden, 1996], or as knowledge bases (or ontologies) such as LOOM and the Penman Upper Model, together with other (domain-specific) concepts and instances [Rösner and Stede, 1994].</Paragraph>
<Paragraph position="1"> + Micro-structure planning (rhetorical structure), which is language independent; this is usually done by the human writers using the MLG application GUI.</Paragraph>
<Paragraph position="2"> + Sentence planning: different languages can express the same content in different rhetorical structures, and planning must take this into consideration, either by avoiding the tailoring of structure to a specific language [Rösner and Stede, 1994] or by exploiting knowledge about the different realizations of rhetorical structures in different languages at the level of the underlying representation [Delin et al., 1994].</Paragraph>
<Paragraph position="3"> + Lexical and syntactic realization resources (e.g., English PENMAN / German NIGEL in [Rösner and Stede, 1994]). As an MLG system, our system includes similar modules. We have chosen Conceptual Graphs as the interlingua for encoding document data [Sowa, 1987]. We use existing generation resources for English, SURGE [Elhadad, 1992] for syntactic realization and the lexical chooser described in [Jing et al., 2000], and the HUGG grammar [Netzer, 1997] for syntactic realization in Hebrew. For microplanning, we have implemented the reference planning algorithm described in [Reiter and Dale, 1992] and the aggregation algorithm described in [Shaw, 1995]. The NLG components rely on the C-FUF implementation of the FUF language [Kharitonov, 1999], [Elhadad, 1991], which is fast enough to be used interactively, in real time, for every single editing modification of the semantic input.</Paragraph>
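As an illustration of the kind of language-independent encoding an interlingua provides, the following minimal Python sketch models a conceptual-graph-style structure with typed concept nodes and labelled relations. It is a simplified stand-in, not the CG/FUF machinery actually used in our system, and the concept and relation names are invented for the example.

```python
# A minimal, illustrative sketch of a conceptual-graph-style interlingua
# (not the paper's actual CG/FUF implementation): concepts are typed nodes,
# and directed, labelled relations connect them.

from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Concept:
    ctype: str               # ontology type, e.g. "Prescribe", "Doctor" (invented for the example)
    referent: str = "*"      # "*" marks a generic instance; otherwise a named individual


@dataclass
class ConceptualGraph:
    concepts: Dict[str, Concept] = field(default_factory=dict)
    relations: List[Tuple[str, str, str]] = field(default_factory=list)  # (source, label, target)

    def add_concept(self, cid: str, ctype: str, referent: str = "*") -> None:
        self.concepts[cid] = Concept(ctype, referent)

    def relate(self, source: str, label: str, target: str) -> None:
        self.relations.append((source, label, target))


# Example: a language-independent encoding roughly meaning
# "a doctor prescribes aspirin to a patient".
g = ConceptualGraph()
g.add_concept("e1", "Prescribe")
g.add_concept("x1", "Doctor")
g.add_concept("x2", "Aspirin")
g.add_concept("x3", "Patient")
g.relate("e1", "agent", "x1")
g.relate("e1", "theme", "x2")
g.relate("e1", "recipient", "x3")
```

A structure of this kind carries no commitment to any particular language: the same graph can be handed to an English or a Hebrew realization component.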
</Section>
<Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 WYSIWYM </SectionTitle>
<Paragraph position="0"> In an influential series of papers [Power and Scott, 1998], WYSIWYM (What You See Is What You Mean) was proposed as a method for authoring semantic information through direct manipulation of structures rendered as natural language text. A WYSIWYM editor enables the user to edit information at the semantic level. The semantic level is the directly controlled layer, and all lower levels, which are derived from it, are treated as presentational features. While editing content, the user receives a feedback text and a graphical representation of the semantic network. These representations can be edited interactively, as the visible data is linked back to the underlying knowledge representation.</Paragraph>
<Paragraph position="1"> Using this method, a domain expert produces data by editing the data itself in a formal way, using a tool that requires only knowledge of the writer's natural language. Knowledge editing therefore requires less training, and the natural language feedback strengthens the confidence of users in the validity of the documents they prepare.</Paragraph>
<Paragraph position="2"> The system we have developed belongs to the WYSIWYM family. The key aspect of the WYSIWYM method we investigate is the editing of semantic information: text is generated as feedback for every single editing operation.</Paragraph>
<Paragraph position="3"> Specifically, we evaluate how ontological information helps speed up semantic data editing.</Paragraph>
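The editing cycle that WYSIWYM relies on can be summarized in a short sketch: a semantic object exposes slots, unfilled slots are rendered as anchors in the feedback text, and the text is regenerated after every editing operation. The toy code below illustrates this cycle under those assumptions; the class names and the template realizer are invented, and it is not the generator described in [Power and Scott, 1998] or the one used in our system.

```python
# Illustrative sketch of a WYSIWYM-style editing cycle: each edit of the
# semantic representation triggers regeneration of a feedback text in which
# unfilled slots appear as editable anchors.

from typing import Dict, Optional


class SemanticNode:
    def __init__(self, concept: str, slots: Dict[str, Optional[str]]):
        self.concept = concept
        self.slots = slots            # slot name -> filler (None = not yet specified)

    def edit(self, slot: str, filler: str) -> str:
        """Apply one editing operation and return the regenerated feedback text."""
        self.slots[slot] = filler
        return self.feedback_text()

    def feedback_text(self) -> str:
        # A toy template-based realizer standing in for the full generator.
        def render(slot: str) -> str:
            return self.slots[slot] or f"[{slot}]"   # unfilled slot -> anchor
        return f"{render('agent')} {self.concept} {render('object')}."


node = SemanticNode("removes", {"agent": None, "object": None})
print(node.feedback_text())                   # "[agent] removes [object]."
print(node.edit("agent", "the technician"))   # "the technician removes [object]."
print(node.edit("object", "the filter"))      # "the technician removes the filter."
```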
</Section>
<Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.3 Controlled Languages </SectionTitle>
<Paragraph position="0"> One way to ensure that natural language text is unambiguous and &quot;easy to process&quot; is to constrain its linguistic form. Researchers have designed &quot;controlled languages&quot; that restrict texts to a limited vocabulary and simple syntactic structures (see, for example, [Pulman, 1996]). This notion is related to that of sublanguage [Kittredge and Lehrberger, 1982], which has been used to analyze and generate text in specific domains such as weather reports.</Paragraph>
<Paragraph position="1"> With advances in robust methods for text analysis, it is becoming possible to parse text with high accuracy and to recover partial semantic information. For example, the DIRT system [Lin and Pantel, 2001] recovers thematic structures from free text in specific domains. Combined with lexical resources such as WordNet [Miller, 1995] and VerbNet [Kipper et al., 2000], such methods make it possible to confirm the thesis that controlled languages are easy to process automatically.</Paragraph>
<Paragraph position="2"> Complete semantic interpretation of text, however, remains too difficult for current systems. In our system, we rely on automatic interpretation of text samples in a specific sublanguage to assist in the acquisition of a domain-specific ontology, as described below.</Paragraph>
</Section>
<Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.4 Graphical Editors for Logical Forms </SectionTitle>
<Paragraph position="0"> Since many semantic encodings are described as graphs, knowledge editing tools have traditionally been proposed as graphical editors, in which concepts are represented as nodes and relations as edges. Such a &quot;generic graphical editor&quot; is presented, for example, in [Paley et al., 1997].</Paragraph>
<Paragraph position="1"> Conceptual graphs have also traditionally been represented graphically, and there is a standard graphical encoding for CGs. Graphical editors for CGs are available (e.g., [Delugach, 2001]).</Paragraph>
<Paragraph position="2"> While graphical editors are attractive, they suffer from known problems of visual languages: they do not scale well (large networks are particularly difficult to edit and understand); editing graphical representations is often slower than editing textual representations; and graphical representations convey too much information, as non-meaningful data may be inferred from graphical features such as layout or font, which are not constrained by the underlying visual language.</Paragraph>
</Section>
<Section position="5" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.5 Generation from CG </SectionTitle>
<Paragraph position="0"> CGs have been used as input to text generation in a variety of systems in the past [Cote and Moulin, 1990], [Bontcheva, 1995], among others.</Paragraph>
<Paragraph position="1"> In our work, we do not view the CG level as a direct input to a generation system. Instead, we view the CG level as an ontological representation that lacks communicative intention levels and is not linked directly to linguistic considerations. The CG level is justified by its inferencing and query retrieval capabilities, while taking into account sets, quantification, and nested contexts.</Paragraph>
<Paragraph position="2"> Processing is required to link the CG representation level (see Fig. 1) to linguistically motivated rhetorical structures, sentence planning, and lexical choice. In our work, CGs are formally converted into input for a generation system by a text planner and a lexical chooser, as described below. Existing generation components for lexical choice and syntactic realization based on functional unification are used on the output of the text planner.</Paragraph>
</Section> </Section> </Paper>
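To make the conversion described in Section 2.5 concrete, the sketch below maps a single conceptual-graph event into a nested, functional-description-like structure of the general kind that a functional-unification realizer consumes. The lexicon, the role mapping, and the feature names are hypothetical illustrations, not the actual interface of our text planner, lexical chooser, or SURGE.

```python
# A hedged, illustrative mapping from a CG event to an FD-like clause input
# (not the paper's actual text planner or lexical chooser).

from typing import Dict

# Hypothetical lexicon: ontology type -> (lexical head, syntactic category)
LEXICON = {
    "Prescribe": ("prescribe", "verb"),
    "Doctor": ("doctor", "noun"),
    "Aspirin": ("aspirin", "noun"),
    "Patient": ("patient", "noun"),
}

# Hypothetical mapping from CG relation labels to grammatical functions.
ROLE_MAP = {"agent": "subject", "theme": "object", "recipient": "indirect-object"}


def cg_to_fd(event_type: str, roles: Dict[str, str]) -> Dict:
    """Build a functional-description-like input for a single clause."""
    head, _ = LEXICON[event_type]
    fd = {"cat": "clause", "process": {"lex": head}}
    for relation, filler_type in roles.items():
        function = ROLE_MAP[relation]
        filler_lex, _ = LEXICON[filler_type]
        fd[function] = {"cat": "np", "head": {"lex": filler_lex}, "definite": False}
    return fd


fd = cg_to_fd("Prescribe", {"agent": "Doctor", "theme": "Aspirin", "recipient": "Patient"})
# e.g. {'cat': 'clause', 'process': {'lex': 'prescribe'},
#       'subject': {'cat': 'np', 'head': {'lex': 'doctor'}, 'definite': False}, ...}
```

The actual pipeline interposes a text planner and a lexical chooser before functional unification; the sketch collapses those steps into a single mapping for brevity.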