File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/97/w97-0119_concl.xml

Size: 2,097 bytes

Last Modified: 2025-10-06 13:57:51

<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0119">
  <Title>Finding Terminology Translations from Non-parallel Corpora</Title>
  <Section position="7" start_page="199" end_page="201" type="concl">
    <SectionTitle>
9 Conclusion
</SectionTitle>
    <Paragraph position="0"> We have described a statistical word signature feature, the Word Relation Matrix (WoRM), that can be used to find matching pairs of content words or terms in a pair of same-domain, non-parallel bilingual texts. Evaluation shows a precision of about 30%. We showed that humans are able to translate more than twice as many Japanese technical terms into English when our system output is used, compared to translating a random set of 19 Japanese terms without aid. This is also a significant initial result for lexical translation from truly non-parallel corpora, particularly across language groups. For future work, the quality of seed words can be improved by using a training algorithm to select seed words according to their discriminative power. The dimensionality of the WoRM vectors we have chosen is not optimal. A high dimensionality of vectors is usually favorable (Gale &amp; Church 1994). On the other hand, high dimensionality can also lead to noise. Therefore, dimensionality reduction methods such as Singular Value Decomposition (Schütze 1992) or clustering are often used. In our case, this means that we should choose a large subset of highly discriminative seed word pairs. Additionally, the Word Relation Matrix could be used in combination with other word signature features for non-parallel corpora.</Paragraph>
    <Paragraph position="1"> In addition to the evaluation results, we have also discovered that the content words in the same segment as a word or term all contribute to the occurrence of that word. This feature represents some of the long-distance relations between the word and multiple other words that are not its immediate neighbors. This information can be used in language modeling in addition to the currently popular N-gram models and word trigger pairs.</Paragraph>
  </Section>
</Paper>