<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2061">
  <Title>Integration of Speech to Computer-Assisted Translation Using Finite-State Automata</Title>
  <Section position="3" start_page="0" end_page="467" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> A desired feature of computer-assisted translation (CAT) systems is the integration of the human speech into the system, as skilled human translators are faster at dictating than typing the translations (Brown et al., 1994). Additionally, incorporation of a statistical prediction engine, i.e.</Paragraph>
    <Paragraph position="1"> a statistical interactive machine translation system, to the CAT system is another useful feature. A statistical prediction engine provides the completions to what a human translator types (Foster et al., 1997; Och et al., 2003). Then, one possible procedure for skilled human translators is to provide the oral translation of a given source text and then to post-edit the recognized text. In the post-editing step, a prediction engine helps to decrease the amount of human interaction (Och et al., 2003).</Paragraph>
    <Paragraph position="2"> In a CAT system with integrated speech, two sources of information are available to recognize the speech input: the target language speech and the given source language text. The target language speech is a human-produced translation of the source language text. Statistical machine translation (MT) models are employed to take into account the source text for increasing the accuracy of automatic speech recognition (ASR) models.</Paragraph>
    <Section position="1" start_page="0" end_page="467" type="sub_section">
      <SectionTitle>
Related Work
</SectionTitle>
      <Paragraph position="0"> The idea of incorporating ASR and MT models was independently initiated by two groups: researchers at IBM (Brown et al., 1994), and researchers involved in the TransTalk project (Dymetman et al., 1994; Brousseau et al., 1995). In (Brown et al., 1994), the authors proposed a method to integrate the IBM translation model 2 (Brown et al., 1993) with an ASR system. The main idea was to design a language model (LM) to combine the trigram language model probability with the translation probability for each target word. They reported a perplexity reduction, but no recognition results. In the TransTalk project, the authors improved the ASR performance by rescoring the ASR N-best lists with a translation model. They also introduced the idea of a dynamic vocabulary for a speech recognition system where translation models were generated for each source language sentence. The better performing of the two is the N-best rescoring.</Paragraph>
      <Paragraph position="1"> Recently, (Khadivi et al., 2005) and (Paulik et al., 2005a; Paulik et al., 2005b) have studied the integration of ASR and MT models. The first work showed a detailed analysis of the effect of different MT models on rescoring the ASR N-best lists. The other two works considered two parallel N-best lists, generated by MT and ASR systems,  respectively. They showed improvement in the ASR N-best rescoring when some proposed features are extracted from the MT N-best list. The main concept among all features was to generate different kinds of language models from the MT N-best list.</Paragraph>
      <Paragraph position="2"> All of the above methods are based on an N-best rescoring approach. In this paper, we study different methods for integrating MT models to ASR word graphs instead of N-best list. We consider ASR word graphs as finite-state automata (FSA), then the integration of MT models to ASR word graphs can benefit from FSA algorithms.</Paragraph>
      <Paragraph position="3"> The ASR word graphs are a compact representation of possible recognition hypotheses. Thus, the integration of MT models to ASR word graphs can be considered as an N-best rescoring but with very large value for N. Another advantage of working with ASR word graphs is the capability to pass on the word graphs for further processing. For instance, the resulting word graph can be used in the prediction engine of a CAT system (Och et al., 2003).</Paragraph>
      <Paragraph position="4"> The remaining part is structured as follows: in Section 2, a general model for an automatic text dictation system in the computer-assisted translation framework will be described. In Section 3, the details of the machine translation system and the speech recognition system along with the language model will be explained. In Section 4, different methods for integrating MT models into ASR models will be described, and also the experimental results will be shown in the same section.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>