<?xml version="1.0" standalone="yes"?>
<Paper uid="A97-1025">
  <Title>Contextual Spelling Correction Using Latent Semantic Analysis</Title>
  <Section position="7" start_page="169" end_page="171" type="evalu">
    <SectionTitle>
5 Results
</SectionTitle>
    <Paragraph position="0"> The results described in this section are based on the 18 confusion sets selected by Golding (1995; 1996).</Paragraph>
    <Paragraph position="1"> Seven of the 18 confusion sets contain words that are all the same part of speech and the remaining 11 contain words with different parts of speech. Golding and Schabes (1996) have already shown that using a trigram model to predict words from a confusion set based on the expected part of speech is very effective. Consequently, we will focus most of our attention on the seven confusion sets containing words of the same part of speech. These seven sets are listed first in all of our tables and figures. We also show the results for the remaining 11 confusion sets for comparison purposes, but as expected, these aren't as good. We, therefore, consider our system complementary to one (such as Tribayes) that predicts based on part of speech when possible.</Paragraph>
    <Section position="1" start_page="169" end_page="169" type="sub_section">
      <SectionTitle>
5.1 Baseline Prediction System
</SectionTitle>
      <Paragraph position="0"> We describe our results in terms of a baseline prediction system that ignores the context contained in the test sentence and always predicts the confusion word that occurred most frequently in the training corpus.</Paragraph>
      <Paragraph position="1"> Table 1 shows the performance of this baseline predictor. The left half of the table lists the various confusion sets. The next two columns show the training and testing corpus sentence counts for each confusion set. Because the sentences in the Brown corpus are not tagged with a markup language, we identified individual sentences automatically based on a small set of heuristics. Consequently, our sentence counts for the various confusion sets differ slightly from the counts reported in (Golding and Schabes, 1996).</Paragraph>
      <Paragraph position="2"> The right half of Table 1 shows the most frequent word in the training corpus from each confusion set.</Paragraph>
      <Paragraph position="3"> Following the most frequent word is the baseline performance data. Baseline performance is the percentage of correct predictions made by choosing the given (most frequent) word. The percentage of correct predictions also represents the frequency of sentences in the test corpus that contain the given word. The final column lists the training corpus frequency of the given word. The difference between the base-line performance column and the training corpus frequency column gives some indication about how evenly distributed the words are between the two corpora.</Paragraph>
      <Paragraph position="4"> For example, there are 158 training sentences for the confusion set {principal, principle} and 34 test sentences. Since the word principle is listed in the right half of the table, it must. have appeared more frequently in the training set. From the final column,  containing words of the same part of speech and those which have different parts of speech. we can see that it occurred in almost 58% of the training sentences. However, it occurs in only 41% of the test sentences and thus the baseline predictor scores only 41% for this confusion set.</Paragraph>
    </Section>
    <Section position="2" start_page="169" end_page="171" type="sub_section">
      <SectionTitle>
5.2 Latent Semantic Analysis
</SectionTitle>
      <Paragraph position="0"> Table 2 shows the performance of LSA on the contextual spelling correction task. The table provides the baseline performance information for comparison to LSA. In all but the case of {amount, number}, LSA improves upon the baseline performance. The improvement provided by LSA averaged over all confusion sets is about 14% and for the sets with the same part of speech, the average improvement is 16%.</Paragraph>
      <Paragraph position="1"> Table 2 also gives the results obtained by Tribayes as reported in (Golding and Schabes, 1996). The baseline performance given in connection with Tribayes corresponds to the partitioning of the Brown corpus used to test Tribayes. It. should be noted that. we did not implement Tribayes nor did we use the same partitioning of the Brown corpus as Tribayes.</Paragraph>
      <Paragraph position="2"> Thus, the comparison between LSA and Tribayes is an indirect one.</Paragraph>
      <Paragraph position="3"> The differences in the baseline predictor for each system are a result of different partitions of the Brown corpus. Both systems randomly split the data such that roughly 80% is allocated to the training corpus and the remaining 20% is reserved for the test corpus. Due to the random nature of this process, however, the corpora must differ between the two systems. The baseline predictor presented in this paper and in (Golding and Schabes, 1996) are based on the same method so the correspond- null ing columns in Table 2 can be compared to get an idea of the distribution of sentences that contain the most frequent word for each confusion set.</Paragraph>
      <Paragraph position="4"> Examination of Table 2 reveals that it is difficult to make a direct comparison between the results of LSA and Tribayes due to the differences in the partitioning of the Brown corpus. Each system should perform well on the most frequent confusion word in the training data. Thus, the distribution of the most frequent word between the the training and the test corpus will affect the performance of the system. Because the baseline score captures information about the percentage of the test corpus that should be easily predicted (i.e., the portion that contains the most frequent word), we propose a comparison of the results by examination of the respective systems' improvement over the baseline score reported for each. The results of this comparison are charted in Figure 3. The horizontal axis in the figure represents the baseline predictor performance for each system (even though it varies between the two systems). The vertical bar thus represents the performance above (or below) the baseline predictor for each system on each confusion set.</Paragraph>
      <Paragraph position="5"> LSA performs slightly better, on average, than Tribayes for those confusion sets which contain words of the same part of speech. Tribayes clearly out-performs LSA for those words of a different part of speech. Thus, LSA is doing better than the Bayesian component of Tribayes, but it doesn't include part of speech information and is therefore not capable of performing as well as the part of speech trigram component of Tribayes. Consequently, we believe that LSA is a competitive alternative to  a Bayesian classifier for making predictions among words of the same part of speech.</Paragraph>
    </Section>
    <Section position="3" start_page="171" end_page="171" type="sub_section">
      <SectionTitle>
5.3 Performance Tuning
</SectionTitle>
      <Paragraph position="0"> The results that have been presented here are based on uniform treatment for each confusion set. That is, the initial data processing steps and LSA space construction parameters have all been the same. However, the model does not require equivalent treatment of all confusion sets. In theory, we should be able to increase the performance for each confusion set by tuning the various parameters for each confusion set.</Paragraph>
      <Paragraph position="1"> In order to explore this idea further, we selected the confusion set {amount, number} as a testbed for performance tuning to a particular confusion set.</Paragraph>
      <Paragraph position="2"> As previously mentioned, we can tune the number of factors to a particular confusion set. In the case of this confusion set, using 120 factors increases the performance by 6%. However, tuning this parameter alone still leaves the performance short of the baseline predictor.</Paragraph>
      <Paragraph position="3"> A quick examination of the context in which both words appear reveals that a significant percentage (82%) of all training instances contain either the bi-gram of the confusion word preceded by the, followed by of, or in some cases, both. For example, there are many instances of the collocation the+humber+of in the training data. However, there are only one third as many training instances for amount (the less frequent word) as there are for number. This situation leads LSA to believe that the bigrams the+amount and amount+of have more discrimination power than the corresponding bigrams which contain number. As a result, LSA gives them a higher weight and LSA almost always predicts amount when the confusion word in the test sentence appears in this context. This local context is a poor predictor of the confusion word and its presence tends to dominate the decision made by LSA.</Paragraph>
      <Paragraph position="4"> By eliminating the words the and of from the training and testing process, we permit the remaining context to be used for prediction. The elimination of the poor local context combined with the larger number of factors increases the performance of LSA to 13% above the baseline predictor (compared to 11% for Tribayes). This is a net increase in performance of 32%!</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>