<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-3010">
  <Title>A THAI SPEECH TRANSLATION SYSTEM FOR MEDICAL DIALOGS</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5. Speech Synthesis
</SectionTitle>
    <Paragraph position="0"> First, we built a limited-domain Thai voice in the Festival Speech Synthesis System [1]. Limited-domain voices can achieve very high-quality output [2] and can be easy to construct if the domain is constrained. Our initial voice targeted the hotel reservation domain, for which we constructed 235 sentences covering the aspects of immediate interest. Using the tools provided in FestVox [1], we recorded these sentences, labeled them automatically, and built a synthetic voice.</Paragraph>
    <Paragraph position="1"> When supporting any new language in synthesis, a number of language-specific issues must first be addressed. As in our other speech-to-speech translation projects, the recognizer and the synthesizer share the same phoneme set. The second important component is the lexicon. Deriving the pronunciation of Thai words from Thai script is not straightforward, although the relationship between orthography and pronunciation is stronger than in English. For this small initial vocabulary, we constructed an explicit lexicon by hand covering the 522-word output vocabulary. The complete Thai limited-domain voice uses unit-selection concatenative synthesis. Unlike our other limited-domain synthesizers, which have a limited vocabulary, this voice tags each phone with syllable and tone information that is used during unit selection, making the output more fluent and somewhat more general.</Paragraph>
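The phone tagging described above can be illustrated with a minimal unit-selection sketch. It is not Festival's implementation: the unit database, the syllable-position labels, the 0-4 tone numbering, the cost weights, and the pitch-based join cost are all illustrative assumptions. A Viterbi search chooses, for each target phone, one recorded unit with the same phone identity so that the sum of target costs (mismatched syllable or tone tags) and join costs (pitch jumps at concatenation points) is minimal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """A phone to synthesize, tagged with syllable position and tone."""
    phone: str
    syllable_pos: str  # assumed labels, e.g. "onset", "nucleus", "coda"
    tone: int          # assumed numbering of the five Thai tones as 0-4


@dataclass(frozen=True)
class Unit:
    """One recorded phone in a hypothetical unit database."""
    phone: str
    syllable_pos: str
    tone: int
    start_pitch: float  # assumed F0 (Hz) at the start of the unit, used for joins
    end_pitch: float    # assumed F0 (Hz) at the end of the unit


# Illustrative weights; a real voice would tune these.
SYLLABLE_WEIGHT = 1.0
TONE_WEIGHT = 2.0
PITCH_SCALE = 50.0  # a pitch jump of this many Hz costs 1.0


def target_cost(t: Target, u: Unit) -> float:
    """Penalize units whose syllable-position or tone tag differs from the target."""
    cost = 0.0
    if t.syllable_pos != u.syllable_pos:
        cost += SYLLABLE_WEIGHT
    if t.tone != u.tone:
        cost += TONE_WEIGHT
    return cost


def join_cost(prev: Unit, nxt: Unit) -> float:
    """Penalize the pitch discontinuity at the concatenation point."""
    return abs(prev.end_pitch - nxt.start_pitch) / PITCH_SCALE


def select_units(targets: list[Target], database: list[Unit]) -> list[Unit]:
    """Viterbi search for the unit sequence with the lowest target + join cost."""
    # Only units with the same phone identity are candidates for a target.
    candidates = [[u for u in database if u.phone == t.phone] for t in targets]
    if not targets or any(not c for c in candidates):
        raise ValueError("every target phone needs at least one candidate unit")
    # history[i][j] = (best cumulative cost ending in candidates[i][j], back-pointer)
    history = [[(target_cost(targets[0], u), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        step = []
        for u in candidates[i]:
            best_cost, best_prev = min(
                (history[-1][j][0] + join_cost(p, u), j)
                for j, p in enumerate(candidates[i - 1])
            )
            step.append((best_cost + target_cost(targets[i], u), best_prev))
        history.append(step)
    # Trace back from the cheapest candidate for the last target.
    j = min(range(len(history[-1])), key=lambda k: history[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = history[i][j][1]
    return path[::-1]
```

Because a tone mismatch costs twice as much as a syllable-position mismatch in this sketch, the search prefers a correctly toned unit unless the join costs around it outweigh that penalty.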
    <Paragraph position="2"> Building on our previous work on the pronunciation of Thai words [3], we have used the lexicon and statistically trained letter-to-sound rules to bootstrap the required word coverage. With this pronunciation model, we can select suitable phonetically balanced text (both general and in-domain), which can then be recorded to build a more general voice.</Paragraph>
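The bootstrapping and text-selection steps described above could be sketched as follows. This is not the authors' implementation: it assumes the lexicon is a Python dict, substitutes a caller-supplied function `lts` for the statistically trained letter-to-sound rules, and uses greedy diphone coverage as one common way to define "phonetically balanced". The names `pronounce`, `diphones`, and `select_balanced`, and the silence symbol `pau`, are hypothetical.

```python
from typing import Callable

Lexicon = dict[str, list[str]]
LetterToSound = Callable[[str], list[str]]  # stand-in for trained LTS rules


def pronounce(word: str, lexicon: Lexicon, lts: LetterToSound) -> list[str]:
    """Use the hand-built lexicon entry if there is one, else letter-to-sound rules."""
    phones = lexicon.get(word)
    return phones if phones is not None else lts(word)


def diphones(phones: list[str]) -> set[tuple[str, str]]:
    """Adjacent phone pairs, padded with an assumed silence symbol "pau"."""
    seq = ["pau", *phones, "pau"]
    return set(zip(seq, seq[1:]))


def select_balanced(
    sentences: list[list[str]],
    lexicon: Lexicon,
    lts: LetterToSound,
    max_sentences: int,
) -> tuple[list[list[str]], set[tuple[str, str]]]:
    """Greedily pick sentences that add the most diphones not yet covered.

    Each sentence is a list of words; Thai script does not mark word
    boundaries with spaces, so segmentation is assumed to happen beforehand.
    """
    units = [
        diphones([p for w in words for p in pronounce(w, lexicon, lts)])
        for words in sentences
    ]
    covered: set[tuple[str, str]] = set()
    chosen: list[list[str]] = []
    remaining = set(range(len(sentences)))
    for _ in range(max_sentences):
        if not remaining:
            break
        # Largest coverage gain wins; ties go to the earlier sentence.
        best = max(remaining, key=lambda i: (len(units[i] - covered), -i))
        if not units[best] - covered:
            break  # no remaining sentence adds new coverage
        chosen.append(sentences[best])
        covered |= units[best]
        remaining.discard(best)
    return chosen, covered
```

Greedy selection does not guarantee the smallest set of sentences with full coverage, but each step is cheap, and the loop stops as soon as no remaining sentence contributes a new diphone.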
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6. Demonstration Prototype System
</SectionTitle>
    <Paragraph position="0"> Our current version is a two-way speech-to-speech translation system between Thai and English for dialogs in the medical domain, in which the English speaker is a doctor and the Thai speaker is a patient. The translation of the speech input is spoken using the synthetic voice we built. At present, coverage is very limited because the grammars used are simple. The figure shows the interface of our prototype system.</Paragraph>
  </Section>
</Paper>