File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/94/c94-1047_intro.xml

Size: 3,016 bytes

Last Modified: 2025-10-06 14:05:36

<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-1047">
  <Title>LOGIC COMPRESSION OF DICTIONARIES FOR MULTILINGUAL SPELLING CHECKERS</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> Since the first work by Glantz in 1957 \[6\], a great deal of theorizing and research has taken place on the subject of spelling verification and correction. Many commercial products (word processors, desktop presentation software, ...) include efficient spelling checkers on microcomputers. The classical methods are generally based on a morphological analyzer. This is sufficient to provide a robust monolingual spelling checker, but using morphological analyzers can become unrealistic when we want to develop a universal solution. In fact, the analyzers built for each language use various linguistic models and engines, and it is impossible to convert a morphological analyzer from one formalism to another. Furthermore, using these classical methods would mean embedding in the host application as many grammars and parsers as there are languages, which would increase the code size and aggravate the maintenance of rules and data. The method presented in this paper is based on building a dictionary of all surface forms for each language, which is sufficient for spelling checker applications. The dictionary built with the existing generators can be easily updated manually but may be huge, especially for agglutinative languages (Arabic, Turkish, ...). A compression process on the multilingual dictionaries is necessary to reduce their size. The existing compression methods generally used are physical and provide good results for Indo-European languages.</Paragraph>
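To make the surface-form approach concrete, here is a minimal Python sketch under stated assumptions: the dictionary is a sorted list of all surface forms, a word is accepted only if it occurs in that list, and the list is shrunk with front coding (each word stored as the length of the prefix it shares with the previous word plus the remaining characters). Front coding is chosen only as a familiar example of a physical, character-level compression; this section does not name the physical methods the authors have in mind, and the function names and toy word list are hypothetical.

```python
import bisect
import os.path


def front_encode(sorted_forms):
    """Front coding (purely character-level): store each form as
    (length of prefix shared with the previous form, remaining characters)."""
    encoded = []
    prev = ""
    for form in sorted_forms:
        shared = len(os.path.commonprefix([prev, form]))
        encoded.append((shared, form[shared:]))
        prev = form
    return encoded


def front_decode(encoded):
    """Rebuild the original sorted list of surface forms."""
    forms = []
    prev = ""
    for shared, rest in encoded:
        form = prev[:shared] + rest
        forms.append(form)
        prev = form
    return forms


def is_correct(word, sorted_forms):
    """Accept a word only if it is one of the listed surface forms
    (binary search in the sorted, uncompressed list)."""
    i = bisect.bisect_left(sorted_forms, word)
    return i != len(sorted_forms) and sorted_forms[i] == word


# Hypothetical toy dictionary: a few English surface forms built on "check".
forms = sorted(["check", "checks", "checked", "checker", "checkers", "checking"])
encoded = front_encode(forms)
assert front_decode(encoded) == forms  # the coding is lossless
print(encoded)
# prints: [(0, 'check'), (5, 'ed'), (6, 'r'), (7, 's'), (5, 'ing'), (5, 's')]
print(is_correct("checkers", forms), is_correct("chekcer", forms))
# prints: True False
```

The coding uses no linguistic knowledge: it only shares leading characters between neighbouring entries, so endings such as "ed", "ing" and "s" are stored again for every stem.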
    <Paragraph position="1"> Applying the same techniques to other languages (Arabic, Turkish, ...) reveals their limits. For this reason we introduce a new kind of compression technique that we call "logic compression". This technique requires primitive morphological knowledge during the compression process and needs less storage space than previous methods. It also has the advantage of being a universal method applicable to all languages. Section 1 gives an overview of existing methods for building spelling checkers and of the limits of such systems when new constraints such as multilingualism are taken into account. Section 2 outlines the first two steps of our work: we adapt an existing method to Arabic, then make a first extension by introducing a new kind of compression called "logic compression". Section 3 presents logic compression in detail, together with its application to other languages, and shows the improvements obtained when logic compression is used in conjunction with existing methods.</Paragraph>
    <Paragraph position="2"> Section 4 outlines the architecture of our multilingual spelling checker system and some future projects.</Paragraph>
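The introduction states only that logic compression exploits primitive morphological knowledge during compression; the method itself is presented in Section 3. Purely as a hypothetical illustration of why even a little such knowledge helps, the Python sketch below assumes that the knowledge is a flat list of suffixes (SUFFIXES, an invented example), splits each surface form at the longest listed suffix that leaves a non-empty stem, groups stems by their set of observed suffixes, and stores each distinct suffix set once. This is not the authors' algorithm; the names, the suffix list and the splitting rule are all assumptions.

```python
from collections import defaultdict

# HYPOTHETICAL "primitive morphological knowledge": a flat list of suffixes.
# The empty suffix stands for the bare stem.
SUFFIXES = ("", "s", "ed", "er", "ers", "ing")


def split_form(form, suffixes=SUFFIXES):
    """Split a form at the longest listed suffix that leaves a non-empty stem."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if form.endswith(suffix) and len(form) > len(suffix):
            return form[: len(form) - len(suffix)], suffix
    return form, ""


def logic_compress(forms):
    """Group stems by their set of suffixes and store each distinct set once."""
    suffixes_by_stem = defaultdict(set)
    for form in forms:
        stem, suffix = split_form(form)
        suffixes_by_stem[stem].add(suffix)
    paradigms = []  # distinct suffix sets, each stored once
    index = {}      # suffix set -> its position in paradigms
    entries = []    # (stem, position of its suffix set)
    for stem in sorted(suffixes_by_stem):
        key = tuple(sorted(suffixes_by_stem[stem]))
        if key not in index:
            index[key] = len(paradigms)
            paradigms.append(key)
        entries.append((stem, index[key]))
    return paradigms, entries


def expand(paradigms, entries):
    """Regenerate the sorted list of surface forms from the compressed data."""
    return sorted(stem + suffix for stem, p in entries for suffix in paradigms[p])


forms = ["check", "checks", "checked", "checker", "checkers", "checking",
         "walk", "walks", "walked", "walker", "walkers", "walking",
         "talk", "talks", "talked", "talking"]
paradigms, entries = logic_compress(forms)
assert expand(paradigms, entries) == sorted(forms)  # nothing is lost
print(len(forms), "forms ->", len(entries), "stems,", len(paradigms), "suffix sets")
# prints: 16 forms -> 3 stems, 2 suffix sets
```

Stems that inflect alike (here "check" and "walk") share one suffix set, so the repeated endings are stored once rather than once per form, and expand shows that the grouping is lossless for the listed forms.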
  </Section>
</Paper>