<?xml version="1.0" standalone="yes"?>
<Paper uid="J00-1006">
  <Title>Multitiered Nonlinear Morphology Using Multitape Finite Automata: A Case Study on Syriac and Arabic</Title>
  <Section position="4" start_page="79" end_page="81" type="intro">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> 2 Capital-initial strings will be used shortly to denote variables. For this reason, we represent the pattern using small letters.</Paragraph>
    <Section position="1" start_page="81" end_page="81" type="sub_section">
      <SectionTitle>
Computational Linguistics Volume 26, Number 1
3.1 The Lexicon Component
</SectionTitle>
      <Paragraph position="0"> Here, the lexicon consists of multiple sublexica, each sublexicon containing entries for one particular lexical representation (or tier in the autosegmental analysis). Since an n-tuple contains n - 1 lexical elements (the first element is the surface representation), the lexicon component consists of n - 1 sublexica. A Syriac lexicon for the data in Table 1 requires a pattern sublexicon, a root sublexicon, and a vocalism sublexicon. Other affixes that do not conform to the root-and-pattern nature of Semitic morphology (e.g., the reflexive prefix {?et}) can either be given their own sublexicon or placed in one of the three sublexica. Since pattern segments are the closest, in terms of number, to surface segments, such morphemes are represented in the pattern sublexicon by convention.</Paragraph>
      <Paragraph position="1"> By way of illustration, the first sublexicon for the Syriac data in Table 1 contains the following entries: {?et} (representing the reflexive prefix) and {cvcvc} (for the verbal pattern). Here, we have chosen to derive all verbs from this pattern in a way reminiscent of McCarthy (1993) rather than entering separate patterns for each morpheme. The second sublexicon maintains roots, e.g., {ktb} 'notion of writing', {pnq} 'notion of delight', and {qrb} 'notion of approaching'. The third sublexicon maintains vocalisms: {ae} for active stems and {a} (with spreading) for passive ones.</Paragraph>
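      <Paragraph position="2"> The three sublexica described above can be pictured as plain data. The following is a minimal Python sketch, not the paper's implementation: the names SUBLEXICA and tuple_arity are hypothetical, and the glosses are copied from the entries listed above.

```python
# Hypothetical sketch: one dictionary per sublexicon, keyed by morpheme.
# The names SUBLEXICA and tuple_arity are illustrative, not from the paper.
SUBLEXICA = {
    "pattern": {
        "?et": "reflexive prefix",      # affix stored here by convention
        "cvcvc": "verbal pattern",
    },
    "root": {
        "ktb": "notion of writing",
        "pnq": "notion of delight",
        "qrb": "notion of approaching",
    },
    "vocalism": {
        "ae": "active stems",
        "a": "passive stems (with spreading)",
    },
}


def tuple_arity(sublexica):
    """Return n: one surface element plus one element per sublexicon."""
    return len(sublexica) + 1


def lookup(sublexica, tier, morpheme):
    """Return the gloss stored for a morpheme on a tier, or None if absent."""
    return sublexica[tier].get(morpheme)
```

With three sublexica, tuple_arity(SUBLEXICA) gives n = 4, which matches the statement that an n-tuple carries n - 1 lexical elements plus the surface representation.</Paragraph>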
    </Section>
    <Section position="2" start_page="81" end_page="81" type="sub_section">
      <SectionTitle>
3.2 The Rewrite Rules Component
</SectionTitle>
      <Paragraph position="0"> The rewrite rules component maps the multiple lexical representations to a surface representation and vice versa. It also provides for phonological, orthographic, and other rules. The current model adopts the formalism presented by Ruessink (1989) and Pulman and Hepple (1993) with additional extensions to handle multiple lexical forms. Below, the top line represents the lexical tiers and the bottom line represents the corresponding surface form:</Paragraph>
      <Paragraph position="2"> LLC denotes the left lexical context, LEX denotes the lexical form, and RLC denotes the right lexical context. LSC, SURF, and RSC are the surface counterparts. The context denoted by "*" represents Kleene star as applied to the grammar alphabet (i.e., matching anything). When all four contexts are "*", they are omitted from rules; i.e., the formalism becomes:</Paragraph>
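      <Paragraph position="3"> The six rule fields named above, and the convention of omitting contexts when all four are "*", can be sketched as a small Python data structure. This is an illustration under stated assumptions: the class Rule, its method names, and the sample rule are hypothetical and are not taken from the paper, and the sketch deliberately leaves out the rule operator and any matching logic.

```python
from dataclasses import dataclass

ANY = "*"  # Kleene star over the grammar alphabet: matches any context


@dataclass(frozen=True)
class Rule:
    """Hypothetical container for one rewrite rule; field names follow the text."""
    lex: tuple          # one element per lexical tier (n - 1 elements)
    surf: str           # surface form
    llc: str = ANY      # left lexical context
    rlc: str = ANY      # right lexical context
    lsc: str = ANY      # left surface context
    rsc: str = ANY      # right surface context

    def contexts_omitted(self):
        """True when all four contexts are "*", so the rule is written without them."""
        return all(c == ANY for c in (self.llc, self.rlc, self.lsc, self.rsc))

    def shown_fields(self):
        """Names of the fields that appear when the rule is written out."""
        if self.contexts_omitted():
            return ["LEX", "SURF"]
        return ["LLC", "LEX", "RLC", "LSC", "SURF", "RSC"]


# Illustrative rule only (not from the paper): pattern "c", root "k", and an
# empty vocalism slot correspond to surface "k" in any context.
rule = Rule(lex=("c", "k", ""), surf="k")
```

Here rule.shown_fields() lists only LEX and SURF, because every context defaults to "*"; setting any one context to something else brings all six fields back.</Paragraph>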
    </Section>
  </Section>
</Paper>