<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0108">
  <Title>Sydney, July 2006. ©2006 Association for Computational Linguistics. Cluster-based Language Model for Sentence Retrieval in Chinese Question Answering</Title>
  <Section position="5" start_page="59" end_page="62" type="evalu">
    <SectionTitle>
4 Experiments and Analysis
</SectionTitle>
    <Paragraph position="0"> Research on Chinese question answering is still at an early stage, and there is no public evaluation platform for it. In this paper we therefore use the evaluation environment presented by [Youzheng Wu, et al. 2004], which is similar to the TREC question answering track [Ellen M. Voorhees. 2004]. The document collection, downloaded from the Internet, is 1.8GB in size; the question set, collected via four different approaches, currently contains 7,050 Chinese questions.</Paragraph>
    <Paragraph position="1"> For this section, we randomly select 807 test questions, all of which are fact-based short-answer questions. Moreover, the answers to all test questions are named entities identified by [Youzheng Wu, et al. 2005]. Figure 2 gives the details. Note that LOC, ORG, PER, NUM and TIM denote questions whose answer types are location, organization, person, number and time respectively, and SUM covers all question types.</Paragraph>
    <Paragraph position="2"> Retrieved sentences are ranked for each question and are strictly evaluated (unsupported answers counted as wrong) using mean reciprocal rank (MRR).</Paragraph>
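    <Paragraph> As a concrete illustration (not from the paper, and assuming the usual definition of MRR), the strict MRR scoring described above can be sketched in Python; `mrr_at_k` is a hypothetical helper name, and MRR1/MRR20 used later correspond to k = 1 and k = 20:

```python
def mrr_at_k(ranked_correct, k):
    """Mean reciprocal rank over a set of questions.

    ranked_correct: one entry per question, giving the 1-based rank of the
    first correct (supported) answer sentence, or None if none was returned.
    Strict evaluation: unsupported or missing answers contribute 0.
    """
    total = 0.0
    for rank in ranked_correct:
        if rank is None or rank > k:
            continue  # no correct answer within the top k
        total += 1.0 / rank
    return total / len(ranked_correct)

# First correct answer at rank 1, at rank 3, and not found at all:
print(mrr_at_k([1, 3, None], k=20))  # (1 + 1/3 + 0) / 3
```

With k = 1 this reduces to top-1 accuracy, which is why MRR1 behaves like a stricter measure than MRR20.</Paragraph>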
    <Section position="1" start_page="59" end_page="60" type="sub_section">
      <SectionTitle>
4.1 Baseline: Standard Language Model for
Sentence Retrieval
</SectionTitle>
      <Paragraph position="0"> Based on the standard language model for information retrieval, we obtain the baseline performance shown in Table 4, where a is the smoothing parameter. In the following subsections, we conduct experiments to answer two questions.</Paragraph>
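      <Paragraph> The baseline scorer can be sketched as follows. This is a hedged reconstruction: the paper does not reproduce its smoothing formula in this section, so we assume standard Jelinek-Mercer interpolation between the sentence model and the collection model, with a as the interpolation weight; `lm_score` and its arguments are hypothetical names:

```python
import math
from collections import Counter

def lm_score(query_terms, sentence_terms, collection_counts, collection_len, a=0.9):
    """Query log-likelihood under a smoothed unigram sentence model:
    p(w|S) = a * p_ml(w|S) + (1 - a) * p(w|C)."""
    s_counts = Counter(sentence_terms)
    s_len = len(sentence_terms)
    score = 0.0
    for w in query_terms:
        p_s = s_counts[w] / s_len if s_len else 0.0   # maximum-likelihood sentence model
        p_c = collection_counts.get(w, 0) / collection_len  # collection background model
        score += math.log(a * p_s + (1 - a) * p_c + 1e-12)  # epsilon guards log(0)
    return score
```

Sentences are then ranked by this score in descending order; the collection term reserves probability mass for query words absent from the sentence.</Paragraph>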
      <Paragraph position="1"> 1. Can the cluster-based language model for sentence retrieval improve on the standard language model for sentence retrieval? 2. How well does sentence clustering perform for the various question types?</Paragraph>
    </Section>
    <Section position="2" start_page="60" end_page="61" type="sub_section">
      <SectionTitle>
4.2 Cluster-based Language Model for Sen-
tence Retrieval
</SectionTitle>
      <Paragraph position="0"> In this subsection, we conduct experiments to validate the performance of the cluster-based language models based on One-Sentence-Multi-Topics and One-Sentence-One-Topic sentence clustering, respectively. In the following experiments, b = 0.9.</Paragraph>
      <Paragraph position="1"> Cluster-based language model based on One-Sentence-Multi-Topics: the experimental results of this model are shown in Table 5, with relative improvements listed in brackets.</Paragraph>
      <Paragraph position="2"> From the experimental results, we find that integrating the clusters/topics of sentences into the language model yields a clear improvement at every value of a: across all question types, the largest and smallest improvements are about 7.7% and 2.8% respectively. This experiment shows that the proposed cluster-based language model based on One-Sentence-Multi-Topics is effective for sentence retrieval in Chinese question answering.</Paragraph>
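      <Paragraph> A plausible reading of the model (the exact form of equation (3) is not reproduced in this section) interpolates the sentence model with a mixture of the cluster model and the collection model, b weighting the cluster model. The sketch below, with hypothetical names, assumes p(w|S) = a*p_ml(w|S) + (1-a)*(b*p(w|Cluster) + (1-b)*p(w|C)):

```python
import math
from collections import Counter

def cluster_lm_score(query_terms, sentence_terms, cluster_terms,
                     collection_counts, collection_len, a=0.9, b=0.9):
    """Assumed form of equation (3):
    p(w|S) = a*p_ml(w|S) + (1-a)*(b*p(w|Cluster) + (1-b)*p(w|C)).
    A larger b gives the cluster model a larger contribution."""
    s_counts, c_counts = Counter(sentence_terms), Counter(cluster_terms)
    s_len, c_len = len(sentence_terms), len(cluster_terms)
    score = 0.0
    for w in query_terms:
        p_s = s_counts[w] / s_len if s_len else 0.0     # sentence model
        p_cl = c_counts[w] / c_len if c_len else 0.0    # cluster/topic model
        p_co = collection_counts.get(w, 0) / collection_len  # collection model
        score += math.log(a * p_s + (1 - a) * (b * p_cl + (1 - b) * p_co) + 1e-12)
    return score
```

Under this reading, a query term absent from the sentence can still be matched through the sentence's cluster, which is the intended benefit of clustering; b = 0.9, as used in these experiments, leans heavily on the cluster model.</Paragraph>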
      <Paragraph position="3"> Cluster-based language model based on One-Sentence-One-Topic: the performance of this model is shown in Table 6, with relative improvements listed in brackets.</Paragraph>
      <Paragraph position="4"> Compared with Table 5, the improvement of the cluster-based language model based on One-Sentence-One-Topic is slightly lower than that of the model based on One-Sentence-Multi-Topics. The reason is that clusters produced by the One-Sentence-One-Topic approach are very coarse, so much information is lost. Nevertheless, the improvements over the baseline system remain clear. Table 7 shows the MRR1 and MRR20 scores of the cluster-based language models for all question types, with relative improvements over the baseline listed in brackets. This experiment checks whether the conclusions are consistent across different measurements.</Paragraph>
      <Paragraph position="5"> The scores of the two cluster-based language models are higher than those of the baseline system under both measurements. For MRR1, the largest improvements of the cluster-based language models based on One-Sentence-Multi-Topics and One-Sentence-One-Topic are about 15% and 10% respectively; for MRR20, the largest improvements are about 7% and 4% respectively. Conclusion 1: The experiments show that the proposed cluster-based language model improves the performance of sentence retrieval in Chinese question answering under the various measurements. Moreover, the cluster-based language model based on One-Sentence-Multi-Topics outperforms the one based on One-Sentence-One-Topic.</Paragraph>
    </Section>
    <Section position="3" start_page="61" end_page="62" type="sub_section">
      <SectionTitle>
4.3 The Analysis of Sentence Clustering for
Various Question Types
</SectionTitle>
      <Paragraph position="0"> The parameter b in equation (3) balances the cluster model and the collection model: the larger b is, the larger the contribution of the cluster model; the smaller b is, the larger the contribution of the collection model. If sentence-retrieval performance decreases as b increases, the sentence clustering is noisy; otherwise, the clustering is satisfactory for the cluster-based language model. The task of this experiment is therefore to assess the quality of sentence clustering for the various question types, which helps select the most appropriate b for the best sentence-retrieval performance.</Paragraph>
      <Paragraph position="1"> With b varying and a fixed at 0.9, the performance of the cluster-based language model based on One-Sentence-Multi-Topics is shown in Figure 3.</Paragraph>
      <Paragraph position="2"> [Figure 3: Performance of the cluster-based language model based on One-Sentence-Multi-Topics with the change of b] In Figure 3, the performance on TIM and NUM questions decreases as b increases (from 0.6 to 0.9), while the performance on LOC, PER and ORG questions increases. This shows that sentence clustering based on One-Sentence-Multi-Topics is not as good for TIM and NUM questions as for LOC, PER and ORG questions, which is in fact reasonable: number and time words appear frequently in sentences without representing a cluster/topic, whereas PER, LOC and ORG entities do represent a topic when they appear in a sentence.</Paragraph>
      <Paragraph position="3"> Similarly, with b varying and a fixed at 0.9, the performance of the cluster-based language model based on One-Sentence-One-Topic is shown in Figure 4.</Paragraph>
      <Paragraph position="4"> [Figure 4: Performance of the cluster-based language model based on One-Sentence-One-Topic with the change of b] In Figure 4, the performance on TIM, NUM, LOC and SUM questions decreases as b increases (from 0.6 to 0.9). This shows that sentence clustering based on One-Sentence-One-Topic is unsatisfactory for most question types. Nevertheless, compared with the baseline system, the cluster-based language model built on this clustering can still improve sentence retrieval in Chinese question answering.</Paragraph>
      <Paragraph position="5"> Conclusion 2: The proposed sentence clustering based on One-Sentence-Multi-Topics performs better for PER, LOC and ORG questions than for TIM and NUM questions. Thus, for PER, LOC and ORG questions, we should choose a larger b (about 0.9) in the cluster-based language model based on One-Sentence-Multi-Topics.</Paragraph>
      <Paragraph position="6"> For TIM and NUM questions, in contrast, b should be smaller (about 0.5). Since sentence clustering based on One-Sentence-One-Topic is not ideal for any question type, b in the cluster-based language model based on One-Sentence-One-Topic should be smaller (about 0.5) for all questions.</Paragraph>
    </Section>
  </Section>
</Paper>