<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-1070">
  <Title>Instance-based Sentence Boundary Determination by Optimization for Natural Language Generation</Title>
  <Section position="2" start_page="0" end_page="565" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The problem of sentence boundary determination in natural language generation arises when more than one sentence is needed to convey multiple concepts and propositions. In the classic natural language generation (NLG) architecture (Reiter, 1994), sentence boundary decisions are made during the sentence planning stage, in which the syntactic structure and wording of sentences are decided. Sentence boundary determination is a complex process that directly impacts a sentence's readability (Gunning, 1952), its semantic cohesion, its syntactic and lexical realizability, and the smoothness of transitions between sentences. Sentences that are too complex are hard to understand, as are sentences lacking semantic cohesion and cross-sentence coherence. Furthermore, bad sentence boundary decisions may even make sentences unrealizable.</Paragraph>
    <Paragraph position="1"> To design a sentence boundary determination method that addresses these issues, we employ an instance-based approach (Varges and Mellish, 2001; Pan and Shaw, 2004). Because we optimize our solutions based on examples in a corpus, the output sentences can exhibit properties, such as sentence length distribution and semantic grouping, similar to those in the corpus. Our approach also avoids problematic sentence boundaries by optimizing the solutions using all the instances in the corpus. By taking a sentence's lexical and syntactic realizability into consideration, it can also avoid sentence realization failures caused by bad sentence boundary decisions. Moreover, since our solution can be adapted easily to suit the capability of a natural language generator, we can easily tune the algorithm to maximize the generation quality. To the best of our knowledge, there is no existing comprehensive solution that is domain-independent and possesses all the above qualities. In summary, our work offers three significant contributions: 1. It provides a general and flexible sentence boundary determination framework that takes a comprehensive set of criteria related to sentence complexity and quality into consideration, ensuring that the proposed algorithm is sensitive not only to the complexity of the generated sentences but also to their semantic cohesion, multi-sentence coherence, and syntactic and lexical realizability.</Paragraph>
    <Paragraph position="2"> 2. Since we employ an instance-based method, the proposed solution is sensitive to the style of the sentences in the application domain in which the corpus is collected.</Paragraph>
    <Paragraph position="3"> 3. Our approach can be adjusted easily to suit a sentence generation system's capability and avoid some of its known weaknesses.</Paragraph>
    <Paragraph position="4"> Currently, our work is embodied in a multimodal conversation application in the real-estate domain in which potential home buyers interact with the system using multiple modalities, such as speech and gesture, to request residential real-estate information (Zhou and Pan, 2001; Zhou and Chen, 2003; Zhou and Aggarwal, 2004). After interpreting the request, the system formulates a multimedia presentation, including automatically generated speech and graphics, as the response (Zhou and Aggarwal, 2004). The proposed sentence boundary determination module takes a set of propositions selected by a content planner and passes the sentence boundary decisions to SEGUE (Pan and Shaw, 2004), an instance-based sentence generator, to formulate the final sentences. For example, our system is called upon to generate responses to a user's request: &quot;Tell me more about this house.&quot; Even though not all of the main attributes of a house (more than 20) will be conveyed, it is clear that a good sentence boundary determination module can greatly ease the generation process and improve the quality of the output. In the rest of the paper, we start with a discussion of related work, and then describe our instance-based approach to sentence boundary determination. Finally, we present our evaluation results.</Paragraph>
  </Section>
</Paper>