<?xml version="1.0" standalone="yes"?>
<Paper uid="N03-1005">
  <Title>Automatic Acquisition of Names Using Speak and Spell Mode in Spoken Dialogue Systems</Title>
  <Section position="8" start_page="0" end_page="0" type="concl">
    <SectionTitle>
6 Conclusions and Future Work
</SectionTitle>
    <Paragraph position="0"> This paper has described a methodology and implementation for automatically acquiring user names in the ORION task delegation system. It has been shown that a novel multi-stage recognition procedure can handle an open set of names, given waveforms with the spoken name followed by the spelled letters. The overall system is also capable of incorporating the new name immediately into its language and lexical models, following the dialogue.</Paragraph>
    <Paragraph position="1"> Future work is needed on many parts of the system. As more data are collected, future experiments will be conducted with larger test sets. We can improve the letter recognizer by explicitly modeling the transition between the unknown word and the spelling component. For instance, by adding prosodic features we may be able to improve the detection of the onset of the spelling part.</Paragraph>
    <Paragraph position="2"> Our final selection process is based only on the proposed spellings obtained from the pronounced word, after feeding information from the spelled part into the second stage. However, performance may improve if we apply a strict constraint during the search, explicitly allowing only paths where the spoken and spelled part of the waveforms agree on the name spelling. Alternatively, a length constraint can be imposed on the letter sequence, once it has been observed that the second stage hypotheses for the spoken and the spelled components are inconsistent in length.</Paragraph>
    <Paragraph position="3"> As an unconstrained name recognizer, the system described here handles in the same way both observed data and previously unseen data. We would like to experiment with adding a parallel component that explicitly models some in-vocabulary words. This may boost overall accuracy by lexicalizing the most common names, such that only words that are identified as OOV need to be processed by the ANGIE sound-to-letter stage.</Paragraph>
    <Paragraph position="4"> In regards to implementation, the current hub-server configuration has allowed us to rapidly implement the system and conduct experiments. The multi-threaded approach, implemented using the hub scripting language, has been effective in allowing a smooth dialogue to proceed while the multi-stage processing takes place in the background. However, we anticipate that the multi-stage approach can be improved by folding all three stages into a single recognition server, eventually allowing real-time operation. In this case, multi-threading would only be needed for the final stage that incorporates the new words into the on-line system.</Paragraph>
    <Paragraph position="5"> The long-term objective of this work is to learn the pronunciations and spellings of general OOV data in spoken dialogue systems on domains where OOV proper nouns are prevalent. Future experiments will involve general classes of unknown words such as names of geographical locations or businesses.</Paragraph>
  </Section>
class="xml-element"></Paper>