<?xml version="1.0" standalone="yes"?> <Paper uid="H91-1003"> <Title>THE ESPRIT PROJECT POLYGLOT</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> INTRODUCTION </SectionTitle> <Paragraph position="0"> Polyglot is a 16.5 million ECU (i.e., approximately $23 million) project that is funded by the European Community as part of the ESPRIT-2 program. As is usual in ESPRIT, the European Community covers 50% of the total costs; the other half is paid by the partners in the Polyglot Consortium. In terms of manpower, the resources amount to a total of some 133 man-years. The project started in August 1989 and was approved for a duration of three years. Originally, a workplan spanning five years was submitted, so considerable cuts in the plans were necessary. An attempt will be made to obtain new ESPRIT funding for a continuation project that will probably go under the name Polyglot-2.</Paragraph> <Paragraph position="1"> Polyglot builds partly on the results of a previous ESPRIT project titled &quot;Linguistic Analysis of European Languages&quot; [1]. In that predecessor project, attention was mainly focused on the acquisition of databases and statistical knowledge about the seven European languages that are being investigated in Polyglot. In alphabetical order, these languages are British English, Dutch, French, German, Greek, Italian and Spanish. The data and knowledge acquired in that project were used, among other things, to build grapheme-to-phoneme and phoneme-to-grapheme conversion modules for the seven languages. 
Of course, the phoneme-to-grapheme conversion modules required the development of language models; for that purpose, Markov models based on Part-of-Speech information were developed.</Paragraph> <Paragraph position="2"> Since it is necessary to have at least one partner in each of the seven language communities, the Polyglot Consortium is necessarily quite large; at this moment it consists of the following partners (there have been some modifications in the past): The work in Polyglot is structured in two ways. First, there are five Work Packages (WP): one dealing with Isolated Word Speech Recognition (IWSR), one with Continuous Speech Recognition (CSR), one with Text-to-Speech Conversion (TTS), one with Applications (APP) and one with Common Tasks (COT). Perpendicular to this technology-based structuring there is another organizing principle, viz. the distinction between Language Dependent and Language Independent work [2]. Polyglot aims at the development of Language Independent frameworks in which Language Dependent knowledge and data can be integrated in order to build homogeneously structured multi-lingual speech systems.</Paragraph> <Paragraph position="3"> In this paper the five Work Packages will be the organizing principle.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Pilot Languages </SectionTitle> <Paragraph position="0"> In a consortium as large as Polyglot, which moreover assembles partners from countries with widely diverging cultural and economic status and traditions, it is impossible for all partners to have the same level of expertise in all aspects of the work. This is reflected by the fact that some of the partners have high-quality speech recognition and/or speech synthesis systems available for their own language, whereas other partners are still in the early stages of building such systems for their own language. 
That is not necessarily due to a lack of knowledge or expertise; it can also be the result of a strategic decision by a partner to concentrate its efforts on other topics in the past. In such a situation it is only natural that the short-term goals for the languages differ. This introduces the concept of pilot languages, i.e., languages for which the work is ahead of that for the remaining languages. The experience gained in the work on the pilot languages is disseminated and used to speed up the work for the other languages.</Paragraph> </Section> </Section> </Paper>