<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0505">
  <Title>Inflected Languages. Application to Basque Language</Title>
  <Section position="2" start_page="0" end_page="29" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> So far, word prediction methods have been developed mainly to increase the message composition rate of people with severe motor and speech disabilities.</Paragraph>
    <Paragraph position="1"> These methods try to guess the word the user is currently typing, or even the next word the user intends to type. Their results are normally measured in terms of keystroke savings (Ks).</Paragraph>
    <Paragraph position="2"> To our knowledge, the design of word prediction methods has mainly focused on non-inflected languages such as English. Words in languages of this type show little variation, for instance that due to number (singular or plural): house/houses, spy/spies.</Paragraph>
    <Paragraph position="4"> Some other languages also admit differences in gender, for example French: voisin/voisine. When the number of different forms of a word is small, it is possible to include all of them in the dictionary used for word prediction. Inflected languages, however, can have a huge number of affixes that determine the syntactic function of each word, so it is not feasible to include every variant of a word in the dictionary. Other methods therefore have to be tried for languages that make extensive use of prefixes, infixes or suffixes.</Paragraph>
    <Paragraph position="5"> As a starting point, let us illustrate the declension of a word in Basque by means of an example. The declension of the dictionary entry mendi (which means "mountain") can be seen in Table 1. This table is valid only for words referring to objects; words referring to living beings follow different declension tables. Different declension tables are also used depending on whether the last letter of the lemma is a vowel or a consonant. As the table shows, there are sixty-two possible word-forms for a single dictionary entry. In addition, most of the cases admit the recursive concatenation of suffixes, so the number of possible forms grows even further. It has been estimated that a noun in Basque may, mathematically, have as many as 458,683 inflected forms when two levels of recursion are taken into account (Agirre et al., 1992).</Paragraph>
    <Paragraph position="6"> There are also other suffixes that are not shown in Table 1, such as those applied to a verb in subordinate clauses.</Paragraph>
    <Paragraph position="7"> Although prefixes and infixes are possible, Basque is inflected mainly by means of suffixes. Some prefixes can be used in specific cases (for example, a prefix on a verb may indicate the absolutive case in the sentence), but in general their frequency of occurrence is not very significant. The same holds for infixes: there are few of them in Basque and their frequency is not very significant either. Predicting them makes sense mainly if the word is an auxiliary or a declined verb. In the remaining cases, it seems better to treat the combination of affix and lemma as a new lemma, provided that the combination is common. This reduces the complexity of the operations, because only lemmas and suffixes need to be handled.</Paragraph>
    <Paragraph position="8"> Thus, since our target language is Basque, this paper deals mainly with the problem of suffixes.</Paragraph>
  </Section>
</Paper>