<?xml version="1.0" standalone="yes"?>
<Paper uid="A92-1016">
  <Title>XUXEN: A Spelling Checker/Corrector for Basque Based on Two-Level Morphology Agirre E., Alegria I., Arregi X.,</Title>
  <Section position="2" start_page="0" end_page="119" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> This paper describes the application of two-level morphology to Basque, along with its use in the elaboration of the XUXEN spelling checker/corrector. The morphological analyzer included in XUXEN has been designed with the aim of laying the foundations for further development of automatic processing of Basque. The fact that Basque is a highly inflected language makes the correction of spelling errors extremely difficult because collecting all the possible word-forms in a lexicon is an endless task.</Paragraph>
    <Paragraph position="1"> The simplicity of English inflections made for reduced interest in research on morphological analysis by computer. In English, the most common practice is to use a lexicon of all of the inflected forms or a minimum set of morphological rules (Winograd, 83). That means that while a great many language-independent tools have been developed for syntactic and semantic analysis, the same cannot be said for morphological tools. In 1981, Kaplan and Kay (Kaplan et al., 81) made a valuable contribution in designing a formalism for phonological generation by means of rules compiled in an automaton. This idea would later be followed up by Koskenniemi (Koskenniemi, 83-85; Karttunen et al., 87) in the two-level formalism. The computational model for two-level morphology found widespread acceptance in the following years, due mostly to its general applicability, the declarativeness of its rules and its clear separation of linguistic knowledge from the program. The essential difference from generative phonology is that there are no intermediate states between lexical and surface representations. Word recognition is reduced to finding valid lexical representations which correspond to a given surface form. Inversely, generation proceeds from a known lexical representation and searches for surface representations corresponding to it. The complexity of the model is studied in depth in (Barton, 85), who with few exceptions agrees with Karttunen (Karttunen, 83) in feeling that the complexity of a language has no significant effects on the speed of analysis or synthesis.</Paragraph>
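As a minimal sketch of the recognition and generation directions described above, the following toy Python example pairs lexical and surface symbols directly, with no intermediate representation. The stems, suffixes, symbol pairs and the two rules are illustrative assumptions for an English-like toy fragment, not the XUXEN lexicon or rule set, and the brute-force search stands in for the finite-state automata that a real two-level implementation would use.

```python
from itertools import product

# Hypothetical toy lexicon: lexical forms are stem + suffix, "+" marks a morpheme boundary.
STEMS = ["cat", "spy"]
SUFFIXES = ["", "+s"]


def candidates(lex):
    """Surface symbols that a lexical symbol may be paired with ("0" is the null symbol)."""
    if lex == "+":
        return ["e", "0"]
    if lex == "y":
        return ["y", "i"]
    return [lex]


def rules_hold(pairs):
    """Check two toy two-level rules over a sequence of (lexical, surface) pairs.

    Rule 1: y is realized as i if and only if the next lexical symbol is "+".
    Rule 2: "+" is realized as e if and only if the previous pair is y:i.
    """
    for i, (lex, surf) in enumerate(pairs):
        next_lex = pairs[i + 1][0] if i + 1 != len(pairs) else None
        prev = pairs[i - 1] if i > 0 else None
        if lex == "y" and (surf == "i") != (next_lex == "+"):
            return False
        if lex == "+" and (surf == "e") != (prev == ("y", "i")):
            return False
    return True


def generate(lexical):
    """Generation: from a lexical form, list every surface form the rules allow."""
    results = []
    for surface in product(*(candidates(c) for c in lexical)):
        if rules_hold(list(zip(lexical, surface))):
            results.append("".join(s for s in surface if s != "0"))
    return results


def recognize(surface):
    """Recognition: find the lexical forms in the lexicon that correspond to a surface form."""
    return [
        stem + suffix
        for stem in STEMS
        for suffix in SUFFIXES
        if surface in generate(stem + suffix)
    ]
```

In this toy fragment, recognize("spies") returns the single lexical form spy+s, while the string spys receives no analysis because no lexical form in the lexicon corresponds to it under the rules.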
    <Paragraph position="2"> There have been many implementations of the two-level model for very different languages, some of them offering full coverage of the language: Finnish, English and Arabic, among others. Our implementation is intended to cope extensively with present-day Basque.</Paragraph>
    <Paragraph position="3"> XUXEN manages user-lexicons which can be interactively enriched during correction by means of a specially designed human-machine dialogue which allows the system to acquire the internal features of each new entry (sublexicon, continuation class, and selection marks).</Paragraph>
    <Paragraph position="4"> Moreover, XUXEN deals with errors often due to recent standardization of Basque. An additional lexicon includes alternative variants to the standard entries and additional rules model erroneous morphophonological changes; this allows a specialized treatment of &quot;typical errors&quot;.</Paragraph>
    <Paragraph position="5"> The following sections give an overview of Basque morphology and the application of the two-level model to Basque, then describe the lexical database built to support this and other applications, and finally present the strategies followed in the design and implementation of the spelling checker/corrector.</Paragraph>
  </Section>
</Paper>