<?xml version="1.0" standalone="yes"?> <Paper uid="W00-0310"> <Title>Using Dialogue Representations for Concept-to-Speech Generation</Title> <Section position="8" start_page="51" end_page="52" type="concl"> <SectionTitle> 6 Conclusion and Future Work </SectionTitle> <Paragraph position="0"> We are presently carrying out evaluations of MIMIC-CTS. An initial corpus-based analysis compares the prosodic annotations assigned to three actual MIMIC dialogues, which were previously collected during an overall system evaluation (Chu-Carroll and Nickerson, 2000). The corpus of dialogues is made up of 37 system/user turns, including 40 system-generated sentences. Three versions of the MIMIC dialogues are being analysed, with prosodic features arising from three different sources: MIMIC-CTS, MIMIC operating with default Bell Labs TTS, and a professional voice talent who read the dialogue scripts in context.</Paragraph> <Paragraph position="1"> This corpus-based assessment, comparing the prosody of CTS-generated, TTS-generated, and human speech, will enable more domain-dependent tuning of the MIMIC-CTS algorithms, as well as the refinement of general prosodic patterns for linguistic structures such as lists and conjunctive phrases. Ultimately, the value of MIMIC-CTS must be measured by its contribution to overall task performance by real MIMIC users. Such a study is under design, following (Chu-Carroll and Nickerson, 2000).</Paragraph> <Paragraph position="2"> In conclusion, we have shown how prosodic computation can be conditioned on various dialogue representations, for robust and domain-independent CTS synthesis. While some rules for prosody assignment depend on the task model, others must be tied closely to the particular choices of content in the replies, at the level of dialogue goals and dialogue acts. At this level as well, however, linguistic principles of intonation interpretation can be applied to determine the mappings. 
In sum, the lesson learned is that a unitary notion of &quot;concept&quot;, from which we generate a unitary prosodic structure, does not apply to state-of-the-art spoken dialogue generation. Instead, the representation of dialogue meaning in experimental architectures, such as MIMIC's, is compositional to some degree, and we take advantage of this fact to implement a compositional theory of intonational meaning in a new concept-to-speech system, MIMIC-CTS.</Paragraph> </Section> </Paper>