<?xml version="1.0" standalone="yes"?>
<Paper uid="H93-1037">
  <Title>LINGSTAT: AN INTERACTIVE, MACHINE-AIDED TRANSLATION SYSTEM*</Title>
  <Section position="8" start_page="193" end_page="194" type="concl">
    <SectionTitle>
6. CURRENT AND FUTURE WORK
</SectionTitle>
    <Paragraph position="0"> There are currently two programs underway to improve the translation system. The first is an effort to expand the Japanese and Spanish dictionaries, which requires adding not only words, but also glosses, pronunciations (for Japanese), and multi-word objects. Part of this task involves updating the Japanese and Spanish word frequency statistics, which will improve the performance of the tokenizer in Japanese and the de-inflector in both languages. Part-of-speech information is also being added, in anticipation of the use of grammatical tools.</Paragraph>
    <Paragraph position="1"> The second program is the development of a probabilistic grammar to parse the source and provide grammatical information to the user. This will supplement or replace the rule-based finite-state parser currently implemented in the system. In the current phase, we have chosen a lexicalized context-free grammar, in which the probability of choosing a particular production rule depends on the headwords associated with each non-terminal symbol.</Paragraph>
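As an illustration of what headword-dependent rule probabilities look like, the following minimal Python sketch estimates P(right-hand side | non-terminal, headword) by relative frequency. The toy observations and the name estimate_rule_probs are hypothetical, and relative-frequency counting is used here only for brevity; it is not the training method described in this paper.

```python
from collections import Counter, defaultdict

# Hypothetical toy observations, not data from the paper:
# (parent non-terminal, parent headword, right-hand side of the rule).
observations = [
    ("VP", "ate", ("V", "NP")),
    ("VP", "ate", ("V", "NP")),
    ("VP", "ate", ("V", "NP", "PP")),
    ("VP", "slept", ("V",)),
    ("VP", "slept", ("V", "PP")),
]

def estimate_rule_probs(obs):
    """Relative-frequency estimate of P(rhs | parent, headword)."""
    counts = defaultdict(Counter)
    for parent, head, rhs in obs:
        counts[(parent, head)][rhs] += 1
    probs = {}
    for context, rhs_counts in counts.items():
        total = sum(rhs_counts.values())
        probs[context] = {rhs: n / total for rhs, n in rhs_counts.items()}
    return probs

rule_probs = estimate_rule_probs(observations)
# The same non-terminal (VP) prefers different expansions under different headwords.
print(rule_probs[("VP", "ate")])
print(rule_probs[("VP", "slept")])
```

In this toy data the expansion of VP into V NP has probability 2/3 when the headword is "ate" and is unseen when the headword is "slept", which is the kind of distinction an unlexicalized grammar cannot express.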
    <Paragraph position="2"> Lexicalization is a useful tool for resolving attachment questions and for sense disambiguation. This grammar will be trained using the inside-outside algorithm [7] on Japanese and Spanish newspaper articles.</Paragraph>
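The inside-outside algorithm relies on inside probabilities, the probability that a non-terminal derives a given span of words. The Python sketch below computes only this inside pass for a tiny grammar in Chomsky normal form; the outside pass and the re-estimation step are omitted. The grammar, its probabilities, and the function name inside_probabilities are assumptions made for illustration, and the grammar is unlexicalized to keep the example short.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (not the grammar from the paper).
binary_rules = {            # (parent, left child, right child): probability
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 1.0,
}
lexical_rules = {           # (parent, word): probability
    ("NP", "dogs"): 0.5,
    ("NP", "cats"): 0.5,
    ("V", "chase"): 1.0,
}

def inside_probabilities(words):
    """Return beta[(i, j, A)], the probability that A derives words[i:j]."""
    n = len(words)
    beta = defaultdict(float)
    # Base case: spans of length one are covered by lexical rules.
    for i, word in enumerate(words):
        for (parent, w), p in lexical_rules.items():
            if w == word:
                beta[(i, i + 1, parent)] += p
    # Recursive case: build longer spans from two adjacent shorter spans.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (parent, left, right), p in binary_rules.items():
                    beta[(i, j, parent)] += p * beta[(i, k, left)] * beta[(k, j, right)]
    return beta

beta = inside_probabilities(["dogs", "chase", "cats"])
# Inside probability of the start symbol over the whole sentence.
print(beta[(0, 3, "S")])
```

The value stored for the start symbol over the full span is the total probability the grammar assigns to the sentence, which is the quantity that training on a corpus tries to increase.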
    <Paragraph position="3"> One use of the grammar will be to provide more accurate glossing of the source by making use of co-occurrence statistics among the phrase headwords. This requires developing an English word list with frequency and part-of-speech information, as well as constructing an English inflector-deinflector. These tools, along with an English grammar, will enable the system to construct candidate translations of Japanese phrases and simple Spanish sentences. A longer-term goal of the syntactic analysis (particularly when more languages are incorporated) is to generate a probability distribution in a space of data structures in which the order of representation of the component grammatical elements is language neutral. This can be regarded as a kind of syntactic interlingua. There will also be a deeper semantic analysis of the source which will be less dependent on the syntactic analysis, and will use a probabilistic model to fill in the components of a case-frame semantic interlingua. These kinds of structures will allow faster inclusion of new languages and domains.</Paragraph>
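The glossing idea mentioned in this section, choosing among candidate glosses by co-occurrence statistics among phrase headwords, can be sketched roughly as follows. The counts, the candidate glosses, and the function pick_gloss are all hypothetical; the paper does not specify how the statistics will be combined, so a simple highest-count choice is assumed here.

```python
# Hypothetical English co-occurrence counts between pairs of headwords.
cooccurrence = {
    ("bank", "river"): 12,
    ("bank", "loan"): 40,
    ("shore", "river"): 30,
    ("shore", "loan"): 0,
}

def pick_gloss(candidates, context_head):
    """Choose the candidate gloss that co-occurs most often with the context headword."""
    return max(candidates, key=lambda g: cooccurrence.get((g, context_head), 0))

# A source word with two candidate English glosses, seen next to different headwords.
print(pick_gloss(["bank", "shore"], "river"))
print(pick_gloss(["bank", "shore"], "loan"))
```

With these assumed counts the same source word is glossed as "shore" next to "river" and as "bank" next to "loan".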
  </Section>
</Paper>