File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/98/p98-1069_evalu.xml
Size: 1,078 bytes
Last Modified: 2025-10-06 14:00:31
<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1069"> <Title>An IR Approach for Translating New Words from Nonparallel, Comparable Texts</Title> <Section position="7" start_page="418" end_page="418" type="evalu"> <SectionTitle> 10 Discussions </SectionTitle> <Paragraph position="0"> Our algorithm is the first to have generated a bilingual collocation lexicon, albeit a small one, from a nonparallel, comparable corpus. We have shown that the algorithm has good precision, but that recall is low because of the difficulty of extracting unambiguous Chinese and English words.</Paragraph> <Paragraph position="1"> Better results can be obtained by making the following changes: * improve the reliability of the seed word lexicon by stemming and POS tagging both the English and Chinese texts; * improve Chinese segmentation by using a larger monolingual Chinese lexicon; * use a larger corpus to generate more unknown words and their candidates by statistical methods. We will test the precision and recall of the algorithm on a larger set of unknown words.</Paragraph> </Section> </Paper>