<?xml version="1.0" standalone="yes"?> <Paper uid="A92-1015"> <Title>Detecting and Correcting Morpho-syntactic Errors in Real Texts</Title> <Section position="7" start_page="115" end_page="117" type="evalu"> <SectionTitle> 6. Results and Evaluation </SectionTitle> <Paragraph position="0"> The system described in this paper has been built as a practical writing aid that operates non-interactively, because the first phase (determining word types, compound analysis, initial spelling correction, and cross-checking corrections for the entire text) takes too long. Nevertheless, it can easily process more than 25 words per second 6 for a large text, which may easily take up half an hour or more.</Paragraph> <Paragraph position="1"> As an example of the performance in the word level checking phase, I presented the system with a 6I have written the system in the programming language C. The results reported below were obtained with the program running on a DECstation 3100. Part of the speed derives from the frequent repetition of many words in large texts.</Paragraph> <Paragraph position="2"> random sample of 1000 lines from two large texts 7.</Paragraph> <Paragraph position="3"> The sample contained nearly 6000 words, with 30 true spelling errors. Of these, 14 were corrected appropriately, and 14 were found but substituted by an incorrect alternative or not corrected at all. Of the 14 appropriately corrected errors, 9 were errors in diacritics only. The system only missed 2 errors, which it assumed to be proper names (both reported at the end of the file (cf. section 5)). It also produced 18 false alarms, 11 of which were caused by very infrequent jargon or inflected word forms missing from the dictionary.</Paragraph> <Paragraph position="4"> Comparison with other spell checkers is hardly possible. For Dutch, only elementary spell checkers based on simple word lookup are available. If this method is applied to the sample text with the same dictionary as used in the full system, the result is entirely different. Such a simple spell checker marks 217 words as misspelled. Among these are not only the 21 true errors and the 9 errors wrongly placed diacritics, but also 37 abbreviations and proper names, and 150 compounds. This amounts to a total of 187 false alarms! The sentence level requires considerably more time. Error-free short sentences can be parsed at a speed of four or more words per second, but long sentences containing one or more errors may require several seconds per word (including correction, which is also rather time consuming). For the texts mentioned in footnote 7 (110,000 words in total), the CPU time required for parsing was approximately 7 hours.</Paragraph> <Paragraph position="5"> But what counts is not only speed; quality is at least equally important. Preliminary tests have shown satisfactory results. A 150 sentence spelling test for secretaries and typists, with an average sentence length between six and seven, was performed within nine minutes (elapsed time) leaving only three errors undetected, correcting the other 72 errors appropriately and producing no false alarms.</Paragraph> <Paragraph position="6"> (Human subjects passed the test if they could complete it within ten minutes making at most ten mistakes.) The three undetected errors involved semantic factors, and were therefore beyond the scope of the system. The rightly corrected errors were typographical and (mainly) orthographical errors, agreement errors and errors in idiomatic expressions. 
<Paragraph position="9"> Other spelling exercises also showed good results (most errors detected, most of them corrected properly, and very few false alarms, if any). A typical text was chosen from a textbook with correction exercises for pupils. In contrast to the spelling test described in the previous paragraph, most sentences in this test contained more than one spelling error. The errors varied from superfluous or missing diaereses to split compounds and d/t-errors. In a total of 30 sentences, the system discovered 75 errors: 62 were corrected properly, 12 were miscorrected, and one was given no correction at all; it missed 7 errors, while producing one false alarm. Although the total number of words was only half that of the previous test (457 to be precise), the system took almost three times as long to process it. This was partly due to the greater average sentence length (over 15 words per sentence) and to the occurrence of more than one error per sentence (up to four). The number of errors that could not have been detected without a parser was 18. Of these, 10 were corrected and 1 was detected but substituted by a wrong alternative, while the parser missed the 7 errors mentioned earlier.</Paragraph> <Paragraph position="10"> On large real texts, i.e. texts not constructed for the purpose of testing one's knowledge of spelling, the system performed less well, owing to parsing problems.</Paragraph> <Paragraph position="11"> As an example of a well-written text, I took the first 1000 lines of a text mentioned in footnote 7. This sample consisted of 7443 words in 468 sentences (an average of nearly 16 words per sentence). At word level the system performed quite satisfactorily: it caused 12 false alarms 8, while detecting 11 true errors, of which only 4 were properly corrected. The compound analysis functioned almost flawlessly.</Paragraph> <Paragraph position="12"> However, it caused 6 of the 12 false alarms, because one single word, which was not in the dictionary, appeared in 4 different compounds. The heuristics for suspicious words cooperated very well with the spelling corrector (6 correct guesses, 2 wrong).</Paragraph> <Paragraph position="13"> The parser's performance, however, degraded considerably. One reason was the great length of many sentences (up to 86 words). This sometimes caused the parser to exceed its built-in time limit, so that it could not give a correct error message 9. Long sentences are also highly ambiguous, which increases the probability of finding a very awkward but error-free parse and thereby overlooking real errors. Another reason for the performance degradation was the abundant use of interjections, names (between quotes, dashes, or parentheses), and colloquial (ungrammatical) expressions. Although the parser has some provisions for simply skipping such constructions, they more often than not interfere with error detection. Fortunately, subject-verb agreement errors indicating d/t-errors were spotted quite reliably, although their number (two in this sample, both of which were corrected) is too small to draw any firm conclusion. The detection of punctuation errors and split compounds still needs improvement.</Paragraph>
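<Paragraph position="14"> The figures for the correction-exercise text (30 sentences, 457 words) reported earlier in this section can be summarized the same way; again this is an illustrative sketch assuming the standard definitions, not a computation from the original evaluation. The system found 75 of the 75 + 7 = 82 true errors, and of the 18 errors only a parser could detect, it found 10 + 1 = 11:
\[ \mathrm{recall} = \frac{75}{82} \approx 0.91, \qquad \mathrm{proper\ correction\ rate} = \frac{62}{75} \approx 0.83, \qquad \mathrm{parser\mbox{-}dependent\ recall} = \frac{11}{18} \approx 0.61. \]
With a single false alarm, precision on this test is \( 75/76 \approx 0.99 \).</Paragraph>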
<Paragraph position="15"> Footnote 8: In 4 cases, the false alarm was caused by word contraction. E.g. the word echtgeno(o)t(e), which is supposed to mean echtgenoot of echtgenote (husband or wife), was marked incorrect and substituted by echtgenoot.</Paragraph> <Paragraph position="16"> Footnote 9: Unfortunately, the program does not keep track of this, so no data can be specified.</Paragraph> <Paragraph position="17"> Whether the results justify the 30 minutes of CPU time it took to parse the 468 sentences remains to be seen.</Paragraph> </Section> </Paper>