<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3708">
<Title>S-MINDS 2-Way Speech-to-Speech Translation System</Title>
<Section position="3" start_page="0" end_page="0" type="metho">
<SectionTitle>
2 System Description
</SectionTitle>
<Paragraph position="0"> This section describes the speech recognition, translation, speech generation, interface, and hardware components that make up S-MINDS.</Paragraph>
<Section position="1" start_page="0" end_page="0" type="sub_section">
<SectionTitle>
2.1 Speech Recognition
</SectionTitle>
<Paragraph position="0"> S-MINDS uses a number of speaker-independent automated speech recognition (ASR) engines; the choice of engine depends on the language and the particular domain. These engines include</Paragraph>
<Paragraph position="2"> Sehda's internal dialog/translation creation tools allow developers to compile and run new dialogs with any ASR engine, so they are not encumbered by the nuances of any particular engine.</Paragraph>
</Section>
<Section position="2" start_page="0" end_page="0" type="sub_section">
<SectionTitle>
2.2 Translation
</SectionTitle>
<Paragraph position="0"> S-MINDS processes ASR output using a combination of grammars and language models, selected according to the task and the availability of training data.</Paragraph>
<Paragraph position="1"> S-MINDS first employs a semantic parser to extract the essential words and phrases from the ASR output. This information is then fed into Sehda's proprietary interpretation engine, which matches it against a finite set of concepts in the specified domain. The resulting translation is extremely accurate, often more accurate than the ASR output itself.
However, as the name suggests, this engine does not directly translate users' utterances; it interprets what they say and paraphrases their statements.</Paragraph>
</Section>
<Section position="3" start_page="0" end_page="0" type="sub_section">
<SectionTitle>
2.3 Speech Generation
</SectionTitle>
<Paragraph position="0"> S-MINDS uses its own voice generation system, which splices human recordings, to output most translations. If no recording exists for a word or phrase, S-MINDS generates the speech with a text-to-speech (TTS) engine.</Paragraph>
<Paragraph position="1"> S-MINDS includes a set of tools with which users can modify and augment the existing system, adding words and phrases in the field in a matter of minutes.</Paragraph>
</Section>
<Section position="4" start_page="0" end_page="0" type="sub_section">
<SectionTitle>
2.4 Interface
</SectionTitle>
<Paragraph position="0"> A variety of interface features make S-MINDS particularly easy to use in a hospital environment.</Paragraph>
<Paragraph position="1"> Most S-MINDS functions can be performed hands-free and eyes-free via a voice user interface (VUI), so the provider can focus on the patient and on operating hospital equipment.
A picture viewer allows digital images to be displayed to aid communication with the patient and add clarity to the log.</Paragraph>
<Paragraph position="2"> Verbal or on-screen verification can be employed (with adjustable upper and lower thresholds) to provide an additional check on recognition accuracy.</Paragraph>
</Section>
<Section position="5" start_page="0" end_page="0" type="sub_section">
<SectionTitle>
2.5 Hardware
</SectionTitle>
<Paragraph position="0"> A complete S-MINDS system contains three main hardware components: a Windows XP computer with the S-MINDS software installed; a headset microphone, which the healthcare provider uses to control S-MINDS and communicate with the patient; and a telephone handset, which the patient uses to communicate with the provider.</Paragraph>
</Section>
</Section>
<Section position="4" start_page="0" end_page="0" type="metho">
<SectionTitle>
3 Current Developments
</SectionTitle>
<Paragraph position="0"> Under a contract with DARPA, Sehda has developed a more interactive system that combines statistical machine translation (SMT) and interpretation engines, allowing users to speak more freely. If an utterance is too complex or too far 'out of domain' to be handled by the interpretation engine, S-MINDS falls back to the SMT engine, which returns a fairly reliable word-for-word translation of the ASR output.</Paragraph>
<Paragraph position="1"> The VUI in S-MINDS is being enhanced to include nearly all system control functions, reducing the need to change settings manually. In addition, users will be able to deliver certain urgent expressions (such as &quot;Hold still&quot; or &quot;You can breathe&quot;) instantaneously, without first saying a 'hotword'.</Paragraph>
</Section>
</Paper>