<?xml version="1.0" standalone="yes"?> <Paper uid="H93-1037"> <Title>LINGSTAT: AN INTERACTIVE, MACHINE-AIDED TRANSLATION SYSTEM*</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1. INTRODUCTION </SectionTitle> <Paragraph position="0"> The DARPA initiative in machine translation supports three very different avenues of research: CANDIDE's fully automatic system [1,2], the interactive, knowledge-based system of the PANGLOSS group [3-6], and LINGSTAT, also an interactive system. LINGSTAT, as its name implies, incorporates both linguistic and statistical knowledge representations. It is intended for users who are native speakers of the target language, and is designed to be useful both to those with little knowledge of the source language (by providing access to foreign language documents) and to those with a greater knowledge of the source language (by improving productivity in translation).</Paragraph> <Paragraph position="1"> Although a future implementation will suggest translations of phrases and sentences, high quality automatic translation is not a goal. LINGSTAT's purpose is to relieve users of the most tedious and difficult translation tasks, but it may well leave problems that the user is better suited to solve.</Paragraph> <Paragraph position="2"> Initial efforts have been focused on the translation of Japanese to English in the domain of mergers and acquisitions, and a first version of a translator's workstation has been assembled. Work has also begun on a Spanish version of the system. As resources become available, particularly parallel corpora, the Spanish system will be further developed and work will be extended to include other European languages. This paper describes the Japanese system.</Paragraph> <Paragraph position="3"> Japanese poses special challenges in translation that are not seen in European languages. 
The most striking are that Japanese text is not divided into words, and that the number of writing symbols is very large. These symbols can be divided into at least four sets: kanji, hiragana, katakana, and, occasionally, the Latin alphabet.</Paragraph> <Paragraph position="4"> *This work was sponsored by the Defense Advanced Research Projects Agency under contract number J-FBI-91-239.</Paragraph> <Paragraph position="5"> The general-use kanji number about 2000. They are not phonetic symbols (most have several pronunciations, depending on context), but carry meaning and often appear two or three to a word. Hiragana and katakana, on the other hand, are phonetic alphabets; hiragana is usually used for important function words in Japanese grammar (sentence particles, auxiliary verbs) and to indicate inflection of verbs, adjectives, and nouns, while katakana is used almost exclusively for borrowed foreign words.</Paragraph> <Paragraph position="6"> Another difficulty of Japanese is that it lacks many grammatical features taken for granted in English, such as plurals, articles, routine use of pronouns, and a future tense. Conversely, there are many Japanese concepts that have no analog in English, including the many levels of politeness, the notion of a sentence topic distinct from its subject, and exclusive vs. non-exclusive listings.</Paragraph> <Paragraph position="7"> In addition, Japanese word order and sentence structure are very different from English.</Paragraph> <Paragraph position="8"> This paper is organized as follows. Section 2 lists the dictionaries and text resources used in assembling LINGSTAT. Section 3 presents an outline of the system components, some of which are described in greater detail in Section 4. Section 5 describes the results of the DARPA July 1992 evaluation of the Japanese system, as well as some informal results on the Spanish system. 
Section 6 discusses some improvements planned for future versions of the workstation.</Paragraph> </Section> </Paper>