<?xml version="1.0" standalone="yes"?> <Paper uid="A92-1016"> <Title>XUXEN: A Spelling Checker/Corrector for Basque Based on Two-Level Morphology Agirre E., Alegria I., Arregi X.,</Title> <Section position="7" start_page="123" end_page="123" type="concl"> <SectionTitle> 6 Conclusions </SectionTitle> <Paragraph position="0"> The XUXEN analyzer/checker/corrector, based on the two-level morphological formalism, has been described. It deals with Basque, a highly inflected language that has only recently been standardized. At the moment, a prototype of the system has been implemented in the C language. This implementation is a general tool for Basque, usable on texts written with any word processing programme.</Paragraph> <Paragraph position="1"> As is well known, in the two-level model morphemes are stored in the sublexicons without alterations, unlike in other systems. From a linguistic standpoint, the clarity and the respect for the lexical unit that this approach to morphological analysis promotes are of great importance. However, long-distance dependencies between morphemes cannot be adequately expressed by means of the continuation-class mechanism. An improved continuation-class mechanism is suggested to solve this problem.</Paragraph> <Paragraph position="2"> At present, the lexicon system contains nearly 15,000 items, and the coding of new lemmas needed to reach 50,000 entries is being completed. At the moment, the finite verb forms (approximately 2,000) are stored in the lexicon, although they could be treated as analyzable forms. These verb forms have been described by means of their component morphemes, taking into account the long-distance dependency problems they present.
This has been done using the extension of the continuation-class formalism described in Section 3.3, which is currently being implemented.</Paragraph> <Paragraph position="3"> With the lemmas and morphemes coded so far, XUXEN is able to recognize approximately three million different word-forms, not counting the forms produced by genitive recursion. Considering that most of the lemmas in the lexicon can take genitive suffixes, our present implementation of the spelling checker would recognize thousands of millions of word-forms.</Paragraph> <Paragraph position="4"> User lexicons can be enriched interactively with new entries, enabling XUXEN from then on to recognize all the possible inflections derived from them.</Paragraph> <Paragraph position="5"> An additional two-level lexicon subsystem is used in our system to store the so-called typical errors. Typical errors are often due to the recent standardization of the language and to dialectal usage. This lexicon subsystem is used preferentially when suggesting alternatives to the user.</Paragraph> </Section> </Paper>