<?xml version="1.0" standalone="yes"?> <Paper uid="C00-2142"> <Title>Rapid Development of Translation Tools: Application to Persian and Turkish</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> At CRL, one of the major research topics is the development and deployment of machine translation systems for low-density languages in a short amount of time. As the availability of knowledge sources suitable for automatic processing in those languages (e.g. Persian, Turkish, Serbo-Croatian) is usually scarce, the systems developed have to assist the acquisition process in an incremental fashion, starting out from low-level translation on a word-for-word basis and gradually extending to the incorporation of syntactic and world knowledge.</Paragraph> <Paragraph position="1"> The tasks and requirements for a machine translation environment that supports linguists with the necessary tools to develop and debug increasingly complex knowledge about a specific language include: * The development of a bilingual dictionary that is used for initial basic translation and can further be utilized in the more complex translation system stages.</Paragraph> <Paragraph position="2"> * Methods to describe and process morphologically rich languages, either by integrating already existing processors or by developing a morphological processor within the system framework.</Paragraph> <Paragraph position="3"> * Glossing a text to ensure the correctness of morphological analysis and the completeness of the dictionary for a given corpus.</Paragraph> <Paragraph position="4"> * Processors and grammar development tools for the syntactic analysis of the source language.</Paragraph> <Paragraph position="5"> * In order to allow rapid development cycles, the translation system itself has to be reasonably fast and provide the user with a rich environment for debugging of linguistic data and 
knowledge.</Paragraph> <Paragraph position="6"> * The system used for development must be configurable for a large variety of tasks that emerge during the development process.</Paragraph> <Paragraph position="7"> We have developed a component-based translation system that meets all of the criteria mentioned above. In the next section, we will describe the architecture of the system MEAT (Multilingual Environment for Advanced Translations), which is used to translate between a number of languages (Persian, Turkish, Arabic, Japanese, Korean, Russian, Serbo-Croatian and Spanish) and English. In the following sections, we will describe the general development cycle for a new language, going into more detail for two such languages, Persian and Turkish.</Paragraph> </Section> </Paper>