<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1314">
  <Title>Word Alignment of English-Chinese Bilingual Corpus Based on Chunks</Title>
  <Section position="6" start_page="114" end_page="114" type="concl">
    <SectionTitle>
5 Conclusions and Future Work
</SectionTitle>
    <Paragraph position="0"> With the growing availability of bilingual corpora, there is a tendency in the NLP community to process and refine them so that they can serve as a knowledge base supporting many NLP applications. In this paper, we present a chunk-based method for word alignment of an English-Chinese corpus. After identifying the chunks of the English sentences, we predict the chunk boundaries of the Chinese sentences using a bilingual lexicon, a Chinese synonym dictionary, and heuristic information. Ambiguities in the Chinese chunk boundaries are resolved using the coterminous words in the English chunks. After producing the candidate word sets with a statistical method, we calculate the translation relation probability between every word pair and select the best alignment. We evaluate our system on a real corpus and present the results.</Paragraph>
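The final alignment step above (scoring every word pair, then selecting the best links) can be illustrated with a minimal Python sketch. The conclusion does not give the formula for the translation relation probability, so this sketch substitutes the Dice coefficient over sentence-level co-occurrence counts, which is not necessarily the measure the authors use. It also assumes that the Chinese sentences are already segmented into words and that English and Chinese chunks have already been paired; the function names dice_scores and align_chunk and the threshold parameter are hypothetical.

```python
from collections import Counter
from itertools import product


def dice_scores(sentence_pairs):
    """Score every co-occurring (English word, Chinese word) pair.

    Assumption: the Dice coefficient stands in for the paper's translation
    relation probability. Counts are numbers of sentence pairs in which a
    word (or word pair) occurs:
        Dice(e, c) = 2 * count(e, c) / (count(e) + count(c))
    """
    en_count, zh_count, pair_count = Counter(), Counter(), Counter()
    for en_tokens, zh_tokens in sentence_pairs:
        # Use sets so each word is counted once per sentence pair.
        en_set, zh_set = set(en_tokens), set(zh_tokens)
        en_count.update(en_set)
        zh_count.update(zh_set)
        pair_count.update(product(en_set, zh_set))
    return {
        (e, c): 2 * n / (en_count[e] + zh_count[c])
        for (e, c), n in pair_count.items()
    }


def align_chunk(en_chunk, zh_chunk, scores, threshold=0.5):
    """Greedy one-to-one word alignment inside one paired chunk.

    Hypothetical selection rule: visit candidate pairs in descending score
    order, skip words that are already linked, and discard pairs scoring
    below the threshold.
    """
    candidates = sorted(
        ((scores.get((e, c), 0.0), e, c) for e in en_chunk for c in zh_chunk),
        reverse=True,
    )
    used_en, used_zh, links = set(), set(), []
    for score, e, c in candidates:
        if score >= threshold and e not in used_en and c not in used_zh:
            links.append((e, c))
            used_en.add(e)
            used_zh.add(c)
    return links
```

On a toy corpus of three sentence pairs ("open the file" / 打开 文件, "save the file" / 保存 文件, "open the window" / 打开 窗口), aligning the chunk "open the file" links open to 打开 and file to 文件, while the function word "the" stays unaligned because both Chinese words are already taken. Restricting candidates to words inside paired chunks is what keeps the search small; the greedy one-to-one rule is a simplification and cannot produce many-to-one links.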
    <Paragraph position="1"> Although our results on English-Chinese bilingual text are quite promising, much remains to be done. The corpus used in our experiment is a relatively small one drawn from a computer handbook, in which terms are translated with high consistency. We should extend our method to large corpora from other domains without losing much accuracy. Improving the accuracy of Chinese word segmentation is also important for our word alignment. Another direction for future work is to extract the corresponding syntactic information from English-Chinese bilingual corpora by shallow parsing.</Paragraph>
  </Section>
</Paper>