<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-0718">
  <Title>The VI framework program in Europe: some thoughts about Speech to Speech Translation research.</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Major Challenges
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Improve significantly the end-to-end performance
</SectionTitle>
      <Paragraph position="0"> This is the first challenge to be addressed in the near future. Unified methodologies based on statistical modeling seem very promising, provided that some key issues are addressed and suitable solutions worked out. Such a methodology makes it possible to incorporate acoustics, phonetic context, speaking rate, speaker variations, and language features such as syntax or semantics into one unified framework, and thus to optimize acoustic, language and speaker effects jointly. From the modeling point of view, this represents quite a shift from the source model. Much more work is needed in proposing and building up new computational tools. This approach is also consistent with the speech synthesis perspective, which is corpus-based and data-driven. A further challenge will be the deployment, in real applications in a limited domain such as tourism, of systems based on interlingua approaches. Key issues in this case are portability and robustness.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Produce aligned multilingual corpora and lexica
</SectionTitle>
      <Paragraph position="0"> Corpora and lexica are a key issue in meeting the challenge of developing new models in the hope of significantly improving performance. To address the problem of spontaneous speech recognition, there are proposals [14] to collect and transcribe 5000 hours of spontaneous speech. This issue is controversial; nevertheless, it is what we have learned from past experience in speech recognition. The test data could be a mixture of current and new sources. For translation, aligned multilingual text corpora are also crucial.</Paragraph>
      <Paragraph position="1"> An effort is under way, in cooperation with ATR, IRST and the other members of the C-STAR III consortium, to set up an aligned text corpus composed of the transcription and translation of a phrase book in the tourism domain.</Paragraph>
      <Paragraph position="2"> This phrase book covers a broad range of situations: emergencies, timetables, transport, sightseeing, directions, attractions, hotels, shopping, etc. Aligned multilingual lexica are also important language resources for the development of future S2ST systems. Work on them is currently under way in LC-STAR [15], a newly funded EU project in the Vth Framework.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Integrate speech to speech translation components in real applications
</SectionTitle>
      <Paragraph position="0"> Real services and applications involving speech communication need to manage the &quot;interface problem&quot;, i.e. the physical interaction of the user with a device, which involves multimodal, multimedia communication in a ubiquitous environment. A wearable device, a PDA or a 3G cellular phone cannot be operated by keyboard and requires sophisticated, natural multimodal human interfaces. Speech, vision and handwriting seem natural candidates for human-machine interaction. But how can a system provide seamless integration between human-machine services and human-human services? How can the system blend the two, providing assistance and guidance for a user to access and understand databases and information resources, while also serving as a go-between to facilitate interaction with other humans or with the user's direct environment?</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 A new action in Europe
</SectionTitle>
    <Paragraph position="0"> Given the challenges discussed above and the experience gained in previous and ongoing projects, a new and innovative initiative is needed to tackle the problem. To be successful, this initiative needs, first of all, a critical mass of researchers. Within Europe, few research groups have the capability to build complete SST systems.</Paragraph>
    <Paragraph position="1"> Most research groups are small and work on only a few research themes, e.g. prosody, acoustic modeling, language modeling or speech synthesis. Although these small groups may have excellent researchers, their work has less impact on the development of SST components. The new initiative should provide an appropriate infrastructure for making effective use of the intellectual potential of European researchers. Given the big shift needed to set up this new action, a group of major European players in spoken language technology, comprising research institutions, industrial entities and ELDA, proposed a preparatory action whose acronym is TC-STAR_P (Technology and Corpora for speech translation).</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Goals and activities.
</SectionTitle>
      <Paragraph position="0"> The preparatory action, currently under negotiation, fits action line IST2002-III.5.2 c) &quot;preparing for future research activities&quot;. It is scheduled to begin in July 2002 and to last one year, with the purpose of preparing an integrated project for the VI Framework. An integrated project is a large-scale action intended to help create the European Research Area (ERA). The activities of TC-STAR_P will be carried out through the cooperation of four groups: an industrial group, with proven experience in SST technology development; a research group, with proven experience in research on SST technologies; an infrastructure group, with proven experience in producing language resources for SST components and in evaluating SST components and systems; and a dissemination group, in charge of using and spreading the project's results. The action has three main goals:
* developing research roadmaps and associated implementation models
* identifying and bringing together all relevant actors in the Speech to Speech Translation field
* investigating a new management model
4.1.1 Developing research roadmaps and associated implementation models
The consortium is composed of different RTD communities: industrial, academic and infrastructure entities. All these organizations will contribute to developing common visions and analyzing research requirements for SST systems. As a result of these tasks, the industrial partners will prepare roadmaps for technical implementations and services; the scientific and academic groups will prepare roadmaps for technology improvements; and the infrastructure group will provide roadmaps for LR production and evaluation campaigns.</Paragraph>
      <Paragraph position="1"> The work will include a case study where industrial partners and research partners will provide application-oriented and research input respectively. The infrastructure group will focus on preparatory tasks for setting up production, evaluation and validation centers for the needed LR.</Paragraph>
      <Paragraph position="2"> 4.1.2 Identifying and bringing together all relevant actors
The consortium includes some of the most relevant actors in the SST field. One of the objectives during the lifetime of the project is to attract further key actors from the industrial, research and infrastructure groups, as well as SMEs working with SST applications and related fields.</Paragraph>
      <Paragraph position="3"> Within the infrastructure group, a key action is to establish contacts with national agencies for funding language-specific LR production in the future FP6, and with entities working on the evaluation and validation of language resources. The development of language resources is a very expensive activity, which is best tackled by coordinated funding actions at national and European levels.</Paragraph>
      <Paragraph position="4"> 4.1.3 Investigating a new management model
According to the IST 2002 Work programme, Action Line 3.5.2 should focus on building and strengthening RTD communities by encouraging research, business and user organisations to develop common visions together and analyse research requirements in order to identify common challenges and objectives, and on investigating effective mechanisms for managing future activities. Moreover, a cornerstone of the future work to be developed under the Integrated Project is the management structure. In accordance with Action Line 3.5.2, the work to be performed under TC-STAR_P includes exploring a new organizational model that allows partners to collaborate smoothly in pursuing the final goal. This important task will be investigated during the project. Issues such as the distribution of work and resources, admission and withdrawal of participants, engagement of additional parties, and scientific guidance and monitoring will be examined. The model has to be effective in reaching the envisaged goal and in reacting to new external trends, needs and demands coming from the market, society and the scientific community.</Paragraph>
    </Section>
  </Section>
</Paper>