<?xml version="1.0" standalone="yes"?>
<Paper uid="H01-1055">
  <Title>Natural Language Generation in Dialog Systems</Title>
  <Section position="6" start_page="2" end_page="2" type="evalu">
    <SectionTitle>
5. REALIZER
</SectionTitle>
    <Paragraph position="0"> At the level of the surface language, the difference in communicative intention between human-human travel advisory dialogs and the intended dialogs is not as relevant: we can try and mimic the human-human transcripts as closely as possible. To show this, we have performed some initial experiments using FERGUS (Flexible Empiricist-Rationalist Generation Using Syntax), a stochastic surface realizer which incorporates a tree model and a linear language model [2]. We have developed a metric which can be computed automatically from the syntactic dependency structure of the sentence and the linear order chosen by the realizer, and we have shown that this metric correlates with human judgments of the felicity of the sentence [3]. Using this metric, we have shown that the use of both the tree model and the linear language model improves the quality of the output of FERGUS over the use of only one or the other of these resources.</Paragraph>
    <Paragraph position="1"> FERGUS was originally trained on the Penn Tree Bank corpus consisting of Wall Street Journal text (WSJ). The results on an initial set of Communicator sentences were not encouraging, presumably because there are few questions in the WSJ corpus, and furthermore, specific constructions (including what as determiner) appear to be completely absent (perhaps due to a newspaper style file). In an initial experiment, we replaced the linear language model (LM) trained on 1 million words of WSJ by an LM trained on 10,000 words of human-human travel planning dialogs collected at CMU. This resulted in a dramatic improvement, with almost all questions being generated correctly. Since the CMU corpus is relatively small for a LM, we intend to experiment with finding the ideal combination of WSJ and CMU corpora. Furthermore, we are currently in the process of syntactically annotating the CMU corpus so that we can derive a tree model as well. We expect further improvements in quality of the output, and we expect to be able to exploit the kind of limited lexical variation allowed by the tree model [1].</Paragraph>
  </Section>
class="xml-element"></Paper>