<?xml version="1.0" standalone="yes"?> <Paper uid="A00-3006"> <Title>The use of error tags in ARTFL's Encyclopédie: Does good error identification lead to good error correction?</Title> <Section position="4" start_page="31" end_page="32" type="concl"> <SectionTitle> 2 Conclusion </SectionTitle> <Paragraph position="0"> [Footnote 2] I admit that these numbers may seem low, but bear in mind that the percentage reflects the accuracy of the first guess made by the system, since its operation is required to be entirely automatic. Furthermore, the correction task is made more difficult by the fact that the corpus is an encyclopedia, which contains more infrequent words and proper names than most corpora.</Paragraph> <Paragraph position="1"> In sum, the errors which are marked with the &lt;?&gt; tag in the electronic version of the Encyclopédie encompass so many distinct error types, and errors of such difficulty, that it is hard to come up with corrections for many of them without human intervention.</Paragraph> <Paragraph position="2"> For this reason, experience with the Encyclopédie project suggests that error tagging is not necessarily a great aid in performing automatic error correction.</Paragraph> <Paragraph position="3"> There is certainly a great deal of room for further investigation into the use of meta-data in spelling correction in general, however. While the error tag is a somewhat unusual member of the tagset, in that it typically flags a subpart of a word rather than a string of words, this should not be taken to mean that it is the only tag which could be employed in spelling correction. If nothing else, &quot;wider-scope&quot; markup tags can be helpful in determining when certain parts of the corpus should not be seen as representative of the language model, or should be seen as representative of a distinct language model. (For example, the italic tag &lt;+-&gt; often marks Latin text in the Encyclopédie.) 
Ultimately, I believe that what is needed in order for text tagging to be useful in error correction is a recognition that the tagset will influence the correction process. Tags which are applied in such a way as to delimit sections of text which are relevant to correction (such as names, equations, and foreign-language text) will be of greater use than tags which represent a mixture of such classes. Error tagging in particular should be most useful if it does not conflate quite distinct things that may be &quot;wrong&quot; with a text, such as illegibility of the original, unrenderable symbols, and OCR inaccuracies. Such considerations are certainly relevant in the evaluation of emerging text encoding standards, such as the specification of the Text Encoding Initiative.</Paragraph> </Section> </Paper>