<?xml version="1.0" standalone="yes"?>
<Paper uid="H01-1007">
  <Title>Architecture and Design Considerations in NESPOLE!: a Speech Translation System for E-commerce Applications</Title>
  <Section position="2" start_page="1" end_page="1" type="metho">
    <SectionTitle>
2. NESPOLE! INTERLINGUA-BASED
TRANSLATION APPROACH
</SectionTitle>
    <Paragraph position="0"> Our translation approach builds upon previous work that we have conducted within the context of the C-STAR consortium. We use an interlingua-based approach with a relatively shallow task-oriented interlingua representation [2] [1] that was initially designed for the C-STAR consortium and has been significantly extended for the NESPOLE! project. Interlingual machine translation is convenient when more than two languages are involved because it does not require each language to be connected to every other language, in each direction, by a separate set of transfer rules [3]. Adding a new language with all-ways translation to and from the existing languages requires writing only one analyzer that maps utterances into the interlingua and one generator that maps interlingua representations into sentences.</Paragraph>
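    <Paragraph> The scaling argument above can be illustrated with a small count. This is a minimal sketch, not part of the NESPOLE! system: the function names are hypothetical, and it assumes one transfer rule set per ordered language pair versus one analyzer plus one generator per language.

```python
def transfer_rule_sets(n_languages):
    """One transfer rule set per ordered (source, target) language pair."""
    return n_languages * (n_languages - 1)


def interlingua_modules(n_languages):
    """One analyzer plus one generator per language."""
    return 2 * n_languages


def cost_of_adding_language(n_existing):
    """Extra components needed when one language joins n_existing languages."""
    extra_transfer = transfer_rule_sets(n_existing + 1) - transfer_rule_sets(n_existing)
    extra_interlingua = interlingua_modules(n_existing + 1) - interlingua_modules(n_existing)
    return extra_transfer, extra_interlingua
```

Under these assumptions, four languages need 12 transfer rule sets but 8 interlingua modules, and adding a fifth language costs 8 further rule sets against one analyzer and one generator.</Paragraph>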
    <Paragraph position="1"> The interlingua approach also allows each partner group to implement the analysis and generation modules for its own language. A further advantage is that it supports paraphrase generation back into the language of the speaker, which gives the user some control when the analysis of an utterance fails to produce a correct interlingua. The following are three examples of utterances tagged with their corresponding interlingua representation:
Thank you very much
c:thank
And we'll see you on February twelfth.</Paragraph>
    <Paragraph position="2"> a:closing (time=(february, md12))
On the twelfth we have a single and a double available.</Paragraph>
    <Paragraph position="4"/>
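    <Paragraph> The tagged examples above suggest a simple surface shape for an interlingua string: a speaker tag, a colon, a speech act, and an optional parenthesized argument list. The following is a minimal sketch of splitting such a string into those parts. The class and function names are hypothetical, the split is inferred only from the examples shown here, and the arguments are left as raw text rather than parsed.

```python
from dataclasses import dataclass


@dataclass
class IFMessage:
    speaker: str      # tag before the colon, e.g. "c" or "a"
    speech_act: str   # e.g. "thank" or "closing"
    arguments: str    # raw argument text, left unparsed in this sketch


def parse_if(text):
    """Split an interlingua string into speaker tag, speech act, and arguments."""
    speaker, _, rest = text.strip().partition(":")
    speech_act, _, arguments = rest.partition(" ")
    return IFMessage(speaker, speech_act, arguments.strip())


thanks = parse_if("c:thank")
closing = parse_if("a:closing (time=(february, md12))")
```

Here thanks carries an empty argument string, while closing keeps "(time=(february, md12))" as its arguments.</Paragraph>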
  </Section>
  <Section position="3" start_page="1" end_page="1" type="metho">
    <SectionTitle>
3. NESPOLE! SYSTEM ARCHITECTURE
DESIGN
</SectionTitle>
    <Paragraph position="0"> Several main considerations were taken into account in the design of the NESPOLE! Human Language Technology (HLT) server architecture: (1) The desire to cleanly separate the actual HLT system from the communication channel between the two parties, which makes use of the speech translation capabilities provided by the HLT system; (2) The desire to allow each research site to independently develop its language-specific analysis and generation modules, and to allow each site to easily integrate new and improved components into the global NESPOLE! HLT system; and (3) The desire of the research partners to build, to whatever extent possible, upon software components previously developed in the context of the C-STAR consortium. We will discuss the extent to which the designed architecture achieves these goals after presenting an overview of the architecture itself.</Paragraph>
    <Paragraph position="1"> Figure 1 shows the general architecture of the current NESPOLE! system. Communication between the client and agent is facilitated by a dedicated module, the Mediator. This module is designed to control the video-conferencing connection between the client and the agent, and to integrate the speech translation services into the communication. The mediator handles audio and video data associated with the video-conferencing application and binary data associated with a shared whiteboard application. Standard H.323 data formats are used for these three types of data transfer. Speech-to-speech translation of the utterances captured by the mediator is accomplished through communication with the NESPOLE! global HLT server, via socket connections with language-specific HLT servers. The communication between the mediator and each HLT server consists mainly of linear PCM audio packets (some text and control messages are also supported and are described later in this section).</Paragraph>
    <Paragraph position="2">  The global NESPOLE! HLT server comprises four separate language-specific servers. Additional language-specific HLT servers can easily be integrated in the future. The internal architecture of each language-specific HLT server is shown in figure 2. Each language-specific HLT server consists of an analysis chain and a generation chain. The analysis chain receives an audio stream corresponding to a single utterance and performs speech recognition followed by parsing and analysis of the input utterance into the interlingua representation (IF). The interlingua is then transmitted to a central HLT communication switch (the CS), which forwards it to the HLT servers for the other languages, as appropriate. IF messages received from the central communication switch are processed by the generation chain. A generation module first generates text in the target language from the IF. The text utterance is then sent to a speech synthesis module that produces an audio stream for the utterance. The audio is then communicated externally to the mediator, in order to be integrated back into the video-conferencing stream between the two parties.</Paragraph>
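    <Paragraph> The data flow through the analysis chain, the CS, and the generation chain can be sketched as follows. This is a toy illustration under stated assumptions, not the NESPOLE! implementation: all class and function names are hypothetical, audio is stood in for by UTF-8 bytes, each toy server knows a single phrase, the language codes "en" and "it" are arbitrary, and the switch routes directly to one named target rather than over sockets.

```python
class HLTServer:
    """Toy language-specific server; the four callables are placeholders."""

    def __init__(self, language, recognize, analyze, generate, synthesize):
        self.language = language
        self.recognize = recognize
        self.analyze = analyze
        self.generate = generate
        self.synthesize = synthesize

    def analysis_chain(self, audio):
        # Speech recognition, then parsing and analysis into the IF.
        return self.analyze(self.recognize(audio))

    def generation_chain(self, if_message):
        # Text generation from the IF, then speech synthesis.
        text = self.generate(if_message)
        return text, self.synthesize(text)


class CommunicationSwitch:
    """Toy CS: routes an IF message to the server for the target language."""

    def __init__(self):
        self.servers = {}

    def register(self, server):
        self.servers[server.language] = server

    def forward(self, target_language, if_message):
        return self.servers[target_language].generation_chain(if_message)


def make_toy_server(language, phrase):
    """Build a server that only knows one utterance, mapped to 'c:thank'."""
    return HLTServer(
        language,
        recognize=lambda audio: audio.decode("utf-8"),
        analyze=lambda text: "c:thank" if text == phrase else None,
        generate=lambda if_message: phrase if if_message == "c:thank" else "",
        synthesize=lambda text: text.encode("utf-8"),
    )


switch = CommunicationSwitch()
english = make_toy_server("en", "Thank you very much")
italian = make_toy_server("it", "Grazie mille")
switch.register(english)
switch.register(italian)

if_message = english.analysis_chain(b"Thank you very much")
text, audio = switch.forward("it", if_message)
```

Because the two servers share only the IF, a further language could be added in this sketch by registering one more server, without touching the existing ones.</Paragraph>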
    <Paragraph position="3"> The mediator can, in principle, support multiple one-to-one communication sessions between client and agent. However, the design supports multiple mediators, which, for example, could each be dedicated to a different provider application. The client initiates communication with the mediator through an explicit action in the web browser. This opens a communication channel to the mediator, which contacts the agent station, establishes the video-conferencing connection between client and agent, and starts the whiteboard application. The specific pair of languages for a dialogue is determined in advance from the web page from which the client initiates the communication. The mediator then establishes a socket communication channel with the two appropriate language-specific HLT servers. Communication between the two language-specific HLT servers, in the form of IF messages, is facilitated by the NESPOLE! global communication switch (the CS). The language-specific HLT servers may in fact be physically distributed over the internet. Each language-specific HLT server is set to service analysis requests coming from the mediator side, and generation requests arriving from the CS.</Paragraph>
    <Paragraph position="4"> Some further functionality beyond that described above is also supported. As described earlier, the ability to produce a textual paraphrase of an input utterance and to display it back to the original speaker provides useful user control in the case of translation failures. This is supported in our system in the following way. In addition to the translated audio, each HLT server also forwards the generated text in the output language to the mediator, which then displays the text in a dedicated application window on the PC of the target user. Additionally, at the end of the processing of an input utterance by the analysis chain of an HLT server, the resulting IF is passed internally to the generation chain, which generates text from the IF. The result is a textual paraphrase of the input utterance in the source language. This text is then sent back to the mediator, which forwards it to the party from which the utterance originated. The paraphrase is then displayed to the original speaker in the dedicated application window. If the paraphrase is wrong, it is likely that the produced IF was incorrect, and thus the translation would also be wrong. The user may then use a button on the application interface to signal that the last displayed paraphrase was wrong. This action triggers a message that is forwarded by the mediator to the other party, indicating that the last displayed translation should be ignored. Further functionality is planned to support synchronization between multi-modal events on the whiteboard and their corresponding speech actions. As these are in very preliminary stages of planning, we do not describe them here.</Paragraph>
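    <Paragraph> The paraphrase check described above can be sketched as follows. This is a toy illustration, not the NESPOLE! implementation: the names are hypothetical, the application windows are modeled as plain lists, the analyzer is a stub that deliberately returns a wrong IF, and the "[ignore last translation]" marker is an invented stand-in for the control message.

```python
def process_utterance(audio, analyze, generate_source, generate_target):
    """Return (paraphrase for the speaker, translation for the listener)."""
    if_message = analyze(audio)
    paraphrase = generate_source(if_message)    # back into the speaker's language
    translation = generate_target(if_message)   # into the listener's language
    return paraphrase, translation


class Mediator:
    """Toy mediator that only tracks what each party's window displays."""

    def __init__(self):
        self.speaker_window = []
        self.listener_window = []

    def deliver(self, paraphrase, translation):
        self.speaker_window.append(paraphrase)
        self.listener_window.append(translation)

    def reject_last_paraphrase(self):
        # The speaker signalled a wrong paraphrase, so the other party
        # is told to ignore the last displayed translation.
        self.listener_window.append("[ignore last translation]")


# Stub analyzer that misreads the utterance as a closing.
wrong_analyze = lambda audio: "a:closing"
source_texts = {"a:closing": "Goodbye"}
target_texts = {"a:closing": "Arrivederci"}

mediator = Mediator()
paraphrase, translation = process_utterance(
    b"Thank you very much", wrong_analyze, source_texts.get, target_texts.get
)
mediator.deliver(paraphrase, translation)
if paraphrase != "Thank you very much":
    mediator.reject_last_paraphrase()
```

Since the paraphrase and the translation are both generated from the same IF, a wrong paraphrase is a cheap signal to the speaker that the translation is probably wrong as well.</Paragraph>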
  </Section>
</Paper>