<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1043">
  <Title>Mixed Language Query Disambiguation</Title>
  <Section position="4" start_page="336" end_page="336" type="metho">
    <SectionTitle>
3 Evaluation experiments
</SectionTitle>
    <Paragraph position="0"> The mutual information between co-occurring words and its contribution weight is ob- null biguating feature tained from a monolingual training corpus--Wall Street Journal from 1987-1992. The training corpus size is about 590MB. We evaluate our methods for mixed language query disambiguation on an automatically generated mixed-language test set. No bilingual corpus, parallel or comparable, is needed for training.</Paragraph>
    <Paragraph position="1"> To evaluate our method, a mixed-language sentence set is generated from the monolingual ATIS corpus. The primary language is English and the secondary language is chosen to be Chinese. Some English words in the original sentences are selected randomly and translated into Chinese words manually to produce the testing data. These axe the mixed language sentences. 500 testing sentences are extracted from the ARPA ATIS corpus. The ratio of Chinese words in the sentences varies from 10% to 65%.</Paragraph>
    <Paragraph position="2"> We carry out three sets of experiments using the three different features we have presented in this paper. In each experiment, the percentage of primary language words in the sentence is incrementally increased at 5% steps, from 35% to 90%. We note the accuracy of unambiguous translation at each step. Note that at the 35% stage, the primary language is in fact Chinese.</Paragraph>
  </Section>
class="xml-element"></Paper>