<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1129">
  <Title>Exploring Distributional Similarity Based Models for Query Spelling Correction</Title>
  <Section position="7" start_page="1029" end_page="1029" type="evalu">
    <SectionTitle>
4.2 Results
</SectionTitle>
    <Paragraph position="0"> We first investigated the impact of the interpolation parameter l in equation (5) by applying the confusion probability-based error model on training set. For the string edit-based error model probability )|( cqPed , we used a heuristic score computed as the inverse of weighted edit distance, which is similar to the one used by Cucerzan and Brill (2004).</Paragraph>
    <Paragraph position="1"> Figure 1 shows the accuracy metric at different settings of l. The accuracy generally gains improvements before l reaches 0.9. This shows that confusion probability plays a more important role in the combination. As a result, we empirically set l= 0.9 in the following experiments.</Paragraph>
    <Paragraph position="2">  To evaluate whether the distributional similarity can contribute to performance improvements, we conducted the following experiments. For source channel model, we compared the confusion probability-based error model (SC-SimCM) against two baseline error model settings, which are source model only (SC-NoCM) and the heuristic string edit-based error model (SC-EdCM) we just described. Two maximum entropy models were trained with different feature sets. ME-NoSim is the model trained only with baseline features. It serves as the baseline for ME-Full, which is trained with all the features described in 3.4.1. In training ME-Full, cosine distance is used as the similarity measure examined by feature functions.</Paragraph>
    <Paragraph position="3"> In all the experiments we used the standard viterbi algorithm to search for the best output of source channel model. The n-best list for maximum entropy model training and testing is generated based on language model scores of correction candidates, which can be easily obtained by running the forward-viterbi backward-A* algorithm. On a 3.0GHZ Pentium4 personal computer, the system can process 110 queries per second for source channel model and 86 queries per second for maximum entropy model, in which 20 best correction candidates are used.</Paragraph>
    <Paragraph position="4">  experiments, which shows that both of the two distributional similarity-based models boost accuracy over their baseline settings. SC-SimCM achieves 26.3% reduction in error rate over SC-EdCM, which is significant to the 0.001 level (paired t-test). ME-Full outperforms ME-NoSim in all three evaluation measures, with 9.8% reduction in error rate and 16.2% improvement in recall, which is significant to the 0.01 level.</Paragraph>
    <Paragraph position="5"> It is interesting to note that the accuracy of SC-SimCM is slightly better than ME-NoSim, although ME-NoSim makes use of a rich set of features. ME-NoSim tends to keep queries with frequently misspelled terms unchanged (e.g. caffine extractions from soda) to reduce false alarms (e.g. bicycle suggested for biocycle).</Paragraph>
    <Paragraph position="6"> We also investigated the performance of the models discussed above at different recall. Figure 2 and Figure 3 show the precision-recall curves and accuracy-recall curves of different models. We observed that the performance of SC-SimCM and ME-NoSim are very close to each other and ME-Full consistently yields better performance over the entire P-R curve.</Paragraph>
    <Paragraph position="7">  We performed a study on the impact of training size to ensure all models are trained with enough data.</Paragraph>
    <Paragraph position="8">  trained with different number of samples Figure 4 shows the accuracy of the two maximum entropy models as functions of number of training samples. From the results we can see that after the number of training samples reaches 600 there are only subtle changes in accuracy and recall. Therefore basically it can be concluded that 2,000 samples are sufficient to train a maximum entropy model with the current feature sets.</Paragraph>
  </Section>
class="xml-element"></Paper>