<?xml version="1.0" standalone="yes"?>
<Paper uid="C90-3072">
  <Title>Spelling-checking for Highly Inflective Languages</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> After some delay, personal computers are now widely available in countries where Slavonic languages are spoken. Naturally, they are used for text processing, among other things. Once the obvious problems with national alphabets had been solved (most of these alphabets are unfortunately not included in the standard IBM character set), a demand for a spelling checker followed. The problem with Slavonic languages in general, and with Czech in particular, is that they comprise millions of word forms, so the space needed to store all of them directly exceeds acceptable limits (a typical Czech noun without &quot;direct&quot; derivatives has 7 different forms, whereas an adjective can have 80 forms, and a verb, which typically forms a dozen derivatives, multiplied by ten or so possible prefixes, more than 5000). Two methods are available to overcome this problem: 1) to compress the forms somehow, while still allowing fast access; 2) to use linguistic knowledge about the regularities in the morphological behaviour of words.</Paragraph>
    <Paragraph position="1"> Our investigations showed that the first method fails, even when probabilistic models are considered (using the multiple-bit hash table method (Fiala, 1986) with a probability of false answers below 0.0005, such models cannot use fewer than 2 bits per word form stored).</Paragraph>
    <Paragraph position="2"> Using the knowledge collected over generations of Czech linguists (e.g., Havránek and Jedlička, 1963; Slavíčková, 1975), and especially the latest works of the Prague group led by Prof. P. Sgall (Panevová et al., 1981; Weisheitelová, Králíková and Sgall, 1982; Kirschner, 1983), we adapted the second method to a spelling-checking program so as to meet the competing requirements of space, speed and completeness.</Paragraph>
  </Section>
</Paper>