<?xml version="1.0" standalone="yes"?>
<Paper uid="C92-1014">
  <Title>A High-level Morphological Description Language Exploiting Inflectional Paradigms</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Pedagogical grammar books typically organize their descriptions of the inflectional morphology of a language in terms of paradigms, groups of rules which characterize the inflectional behavior of some subset of the language's vocabulary. A French grammar may divide verbs into the first, second, and third conjugations; German grammars speak of &quot;weak&quot; and &quot;strong&quot; verbs; Spanish grammars classify verbs by their infinitival endings, etc. The family of word forms that each vocabulary item may have can thus be described by a combination of a base stem (such as the &quot;citation form&quot; used to index words in a dictionary) and the paradigm the word belongs to. Irregular words, which exhibit behaviors not completely captured by general paradigms, often tend to be partially describable by reference to regular paradigmatic patterns.</Paragraph>
    <Paragraph position="1"> The word formation rules that comprise a paradigm are usually expressed in terms of a sequence of stem change and affixation operations. For example, one French textbook [NEBEL74], in describing first conjugation verbs, shows how to form present tense forms using the infinitival stem with its &quot;er&quot; suffix removed. Future tense is formed by appending affixes to the full infinitival stem, while the stem of the imperfect tense is found by taking the first person plural of the present tense and dropping the &quot;ons&quot;. In addition to such word formation rules, there are spelling change rules which describe variations in spelling, often conditioned by the phonological or orthographic context in which a word formation rule is applied.</Paragraph>
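The textbook-style rules just described can be sketched in a few lines of Python. This is only an illustration, not PDL notation: the function names are hypothetical, the ending tables are the standard first-conjugation endings rather than anything taken from [NEBEL74], and spelling change rules are deliberately left out.

```python
# Illustrative sketch of first-conjugation word formation rules.
# Function names are hypothetical; ending tables are the standard French
# endings (1sg, 2sg, 3sg, 1pl, 2pl, 3pl). Spelling change rules are omitted.
PRESENT_ENDINGS = ["e", "es", "e", "ons", "ez", "ent"]
FUTURE_ENDINGS = ["ai", "as", "a", "ons", "ez", "ont"]
IMPERFECT_ENDINGS = ["ais", "ais", "ait", "ions", "iez", "aient"]


def present_stem(infinitive):
    """Infinitival stem with its 'er' suffix removed."""
    if not infinitive.endswith("er"):
        raise ValueError("not a first-conjugation infinitive: " + infinitive)
    return infinitive[:-2]


def present_forms(infinitive):
    """Present tense: stripped stem plus present endings."""
    stem = present_stem(infinitive)
    return [stem + ending for ending in PRESENT_ENDINGS]


def future_forms(infinitive):
    """Future tense: affixes appended to the full infinitival stem."""
    return [infinitive + ending for ending in FUTURE_ENDINGS]


def imperfect_stem(infinitive):
    """First person plural of the present tense with 'ons' dropped."""
    first_plural = present_forms(infinitive)[3]
    return first_plural[:-len("ons")]


def imperfect_forms(infinitive):
    """Imperfect tense: derived stem plus imperfect endings."""
    stem = imperfect_stem(infinitive)
    return [stem + ending for ending in IMPERFECT_ENDINGS]


print(present_forms("parler"))    # parle, parles, parle, parlons, parlez, parlent
print(future_forms("parler")[0])  # parlerai
print(imperfect_forms("parler")[2])  # parlait
```

Note that the imperfect stem is defined in terms of another surface form rather than directly from the infinitive, which is the kind of dependency between forms that a paradigm-oriented notation has to express.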
    <Paragraph position="2"> While the above characterization of morphological behavior is a familiar one, most description languages that have been developed for computational morphology (e.g., [KOSKENNIEMI84], [GORZ88]) have tended to focus more on the orthographic and affixation rules, and pay less attention to explicitly capturing the regularities within and between paradigms. Recently, some researchers have begun exploring the advantages to be derived from a notation in which paradigms play a more central role (e.g., [CALDER89], [RUSSELL91]).</Paragraph>
    <Paragraph position="3"> This paper presents such a notation, called PDL (for Paradigm Description Language), which we are using as the basis of the morphological analyzer for AI-STARS, a multi-lingual &quot;lexicon-assisted&quot; information retrieval system ([ANICK90]). It has been a goal of our high-level language design to preserve, as much as possible, the kinds of descriptive devices traditionally used in grammar books.</Paragraph>
    <Paragraph position="4"> Our approach to the representation of paradigms borrows from the Artificial Intelligence community's notion of &quot;frames&quot;, data structures made up of slots with attached procedures, organized hierarchically to support default slot inheritance and overrides (e.g., [BOBROW77]). In a paradigm's &quot;frame&quot;, the slots correspond to surface and stem forms, whose values are either explicitly stored (in the lexicon) or else computed by word formation rules.</Paragraph>
    <Paragraph position="5"> The hierarchical organization of paradigms helps to capture the shared linguistic behaviors among classes of words in an explicit and concise manner.</Paragraph>
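As a rough illustration of the frame idea, the following Python sketch models a paradigm as a set of slots whose values are either stored in a lexicon entry or computed by an inherited rule. It is not PDL syntax: the class name, slot names, and the two example paradigms are assumptions made up for this sketch.

```python
# Minimal frame-style sketch: slots, default inheritance, and overrides.
# All names here (Paradigm, slot names, example paradigms) are hypothetical.
class Paradigm:
    def __init__(self, name, parent=None, rules=None):
        self.name = name
        self.parent = parent            # parent paradigm supplying defaults
        self.rules = dict(rules or {})  # slot name -> word formation rule

    def find_rule(self, slot):
        """Walk up the hierarchy until some paradigm defines the slot."""
        paradigm = self
        while paradigm is not None:
            if slot in paradigm.rules:
                return paradigm.rules[slot]
            paradigm = paradigm.parent
        raise KeyError(slot)

    def value(self, slot, entry):
        """Stored lexicon values win; otherwise compute via a rule."""
        if slot in entry:
            return entry[slot]
        # Pass self (not the defining paradigm) so overrides still apply
        # when an inherited rule refers to other slots.
        return self.find_rule(slot)(self, entry)


first_conj = Paradigm("first_conjugation", rules={
    "present_stem": lambda p, e: e["infinitive"][:-2],
    "present_1pl": lambda p, e: p.value("present_stem", e) + "ons",
    "imperfect_stem": lambda p, e: p.value("present_1pl", e)[:-3],
})

# Verbs in "-ger" inherit everything but override one slot.
ger_verbs = Paradigm("ger_verbs", parent=first_conj, rules={
    "present_1pl": lambda p, e: p.value("present_stem", e) + "eons",
})

print(first_conj.value("present_1pl", {"infinitive": "parler"}))    # parlons
print(ger_verbs.value("present_1pl", {"infinitive": "manger"}))     # mangeons
print(ger_verbs.value("imperfect_stem", {"infinitive": "manger"}))  # mange
```

An irregular word can store just the slots that deviate. For example, an entry that stores "avons" as its first person plural still gets its imperfect stem "av" from the inherited rule, which mirrors the observation that irregular words are often partially describable by regular patterns.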
    <Paragraph position="6"> Our application domain introduces several constraints on the design of its morphological component: * The morphological recognizer must work with a dynamic secondary storage lexicon accessed via an index on stem forms. This constraint rules out approaches relying on a left to right scan of the word using special in-memory letter tree encodings of the dictionary (e.g., [GORZ88]). It requires an approach in which potential stems are derived by affix removal/addition and/or stem changes and then probed for in the lexicon. (Actes de COLING-92, Nantes, 23-28 août 1992; Proc. of COLING-92, Nantes, Aug. 23-28, 1992.)</Paragraph>
    <Paragraph position="7"> * The morphological information must additionally support surface form generation and &quot;guessing&quot;. The guesser, to be employed in computer-assisted lexicon acquisition, must be able to construct potential citation forms (e.g., infinitive forms for verbs), not just stripped stems.</Paragraph>
    <Paragraph position="8"> * The high-level language (PDL) must be compilable into a form suitable for efficient run-time performance. This implies not only efficient in-memory data structures but also a system which minimizes disk (lexicon) accesses.</Paragraph>
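The first two constraints above suggest a recognizer that derives candidate stems from a surface form and probes the lexicon for each, plus a guesser that proposes citation forms without any lookup. The sketch below is one possible reading, not the algorithm of the paper: the rule table, the feature labels, and the in-memory dictionary standing in for the disk-resident lexicon are all assumptions.

```python
# Sketch of recognition by affix removal/addition and lexicon probing.
# RULES, the labels, and LEXICON are hypothetical illustrations; a real
# system would probe a secondary-storage index rather than a dict.
RULES = [
    # (surface suffix to remove, suffix to add back, feature label)
    ("ons", "er", "present 1pl"),
    ("ez", "er", "present 2pl"),
    ("e", "er", "present 1sg/3sg"),
    ("", "", "infinitive"),
]

LEXICON = {"parler": "verb", "donner": "verb"}


def candidate_stems(word):
    """Yield (candidate citation form, label) pairs for a surface form."""
    for suffix, restore, label in RULES:
        if word.endswith(suffix) and len(word) > len(suffix):
            base = word[:len(word) - len(suffix)]
            yield base + restore, label


def recognize(word, lexicon):
    """Keep only the candidates that are actually found in the lexicon."""
    return [(stem, label) for stem, label in candidate_stems(word)
            if stem in lexicon]


def guess(word):
    """Propose citation forms for an unknown word without a lexicon."""
    return sorted({stem for stem, _ in candidate_stems(word)})


print(recognize("parlons", LEXICON))  # [('parler', 'present 1pl')]
print(guess("chantez"))               # ['chanter', 'chantez']
```

Each candidate costs one lexicon probe in this sketch, so the number of rules that match a given word bounds the number of accesses; reducing that number is one way the third constraint could be approached.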
    <Paragraph position="9"> Our aim is to develop morphological representations for a number of (primarily European) languages. We have built fairly complete representations for English, French, and German, and have begun investigating Spanish.</Paragraph>
    <Paragraph position="10"> While it is premature to predict how well our approach will apply across the range of European languages, we have found it contains a number of desirable aspects for applications such as AI-STARS.</Paragraph>
    <Paragraph position="11"> In the next section, we provide an overview of the PDL language, describing how word formation rules are organized into a hierarchy of paradigms and how the lexicon and morphological rules interact. Then we provide an illustration of the use of paradigm inheritance to construct a concise encoding of French verb forms. Next we present algorithms for the compilation of PDL into efficient run-time data structures, and for the recognition and generation of word forms. We conclude with an evaluation of the strengths and weaknesses of the approach, and areas for future research.</Paragraph>
  </Section>
</Paper>