<?xml version="1.0" standalone="yes"?> <Paper uid="P92-1021"> <Title>LATTICE-BASED WORD IDENTIFICATION IN CLARE</Title> <Section position="8" start_page="162" end_page="165" type="concl"> <SectionTitle> 6 CONCLUSIONS </SectionTitle> <Paragraph position="0"> These experimental results suggest that general syntactic and semantic information is an effective source of constraint for correcting typing errors, and that a conceptually fairly simple staged architecture, where word identity and word boundary ambiguities are only resolved when the relevant knowledge is ready to be applied, can be acceptably efficient. The lattice representation also allows the system to deal cleanly with word boundary uncertainty not caused by noise in the input.</Paragraph> <Paragraph position="1"> A fairly small vocabulary was used in the experiment. However, these words were originally selected on the basis of frequency of occurrence, so that expanding the lexicon would involve introducing proportionately fewer short words than longer ones. Mistyped short words tend to be the ones with many correction candidates, so the complexity of the problem should grow more slowly with vocabulary size than might be expected. Furthermore, more use could be made of statistical information: relative frequency of occurrence could be used as a criterion for pruning relatively unlikely correction candidates, as could more sophisticated statistics in the suggestion algorithm, along the lines of Kernighan et al. (1990). Phonological knowledge, to allow competence errors to be tackled more directly, would provide another useful source of constraint.</Paragraph> </Section> </Paper>