<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1037">
  <Title>Optimizing disambiguation in Swahili</Title>
  <Section position="3" start_page="4" end_page="7" type="intro">
    <SectionTitle>
CG-2 parser
</SectionTitle>
    <Paragraph position="0"> In other words, morphological disambiguation and semantic disambiguation were implemented within a single rule system. This was possible because the CG-2 parser treats all strings in the analysis result, including glosses in English, as tags that can be used in rule writing (Tapanainen 1996: 6).</Paragraph>
    <Paragraph position="1"> The properties of the CG-2 parser include the following: (a) A rule may either select or remove a reading from a cohort.</Paragraph>
    <Paragraph position="2"> (b) The application of a rule can be constrained in several ways by reference to the occurrence or absence of features. The position of the constraining feature can be specified precisely, both forwards and backwards within the sentence. (c) The identification of constraining features can be made relational through more than one phase of scanning: after one feature is found, scanning may continue in either direction. By default, scanning terminates at a sentence boundary, but its termination point can also be defined elsewhere.</Paragraph>
    <Paragraph position="3"> (d) Rule conditions can be expressed either directly with concrete tags or indirectly by using set names. The latter facility simplifies rule writing, especially of general rules.</Paragraph>
    <Paragraph position="4"> (e) The possibility of concatenating tag sets as well as concrete tags considerably reduces the need to define tag sets.</Paragraph>
    <Paragraph position="5"> (f) The order of rule application can be defined by placing the rules into sections, so that the more general and reliable rules come first and other rules follow in order of decreasing reliability. This also makes it possible to write heuristic rules within the same rule system. The environment for writing and testing disambiguation rules was provided by Connexor and Pasi Tapanainen (1996).</Paragraph>
    <Paragraph position="6">  In disambiguation, the precision criterion is considered fulfilled if the reading chosen in that context is correct. In two independent tests with recent news texts of 5,000 words each, the precision was 99.8% and 99.9%.</Paragraph>
    <Paragraph position="7">  A cohort is a word-form plus all its morphological interpretations.</Paragraph>
    <Paragraph position="8"> (g) Mapping rules, which are the standard rules for syntactic mapping, also make it possible to add a new reading or to replace the reading of a line. The latter facility is demonstrated below when discussing idioms.</Paragraph>
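The select/remove behaviour, positional constraints, set names, and ordered sections listed in (a), (b), (d), and (f) can be illustrated with a minimal Python sketch. This is not CG-2 rule syntax and not the authors' implementation: the names Cohort, SETS, has_tag, context_ok, apply_rule, disambiguate, and SECTIONS are hypothetical, the rule encoding as dictionaries is an assumption, and the English toy sentence is invented for illustration rather than taken from the paper. Relational scanning (c), tag concatenation (e), and mapping rules (g) are left out.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    """A word-form plus all of its candidate readings (each reading is a set of tags)."""
    word: str
    readings: list


# Hypothetical named tag sets, so that rule conditions can refer to a set
# name instead of spelling out concrete tags, as in property (d).
SETS = {"NOUN": {"N"}, "VERB": {"V"}, "DET": {"DET"}, "PRON": {"PRON"}}


def has_tag(cohort, tags):
    """True if any reading of the cohort carries at least one of the tags."""
    return any(tags.intersection(reading) for reading in cohort.readings)


def context_ok(sentence, i, offset, tags, negate=False):
    """Test one constraint at a relative position, as in property (b).

    A position outside the sentence never contains the feature, so a
    negated constraint ("feature absent") holds there.
    """
    j = i + offset
    found = j in range(len(sentence)) and has_tag(sentence[j], tags)
    return not found if negate else found


def apply_rule(sentence, rule):
    """Apply one SELECT or REMOVE rule to every cohort, as in property (a)."""
    for i, cohort in enumerate(sentence):
        matching = [r for r in cohort.readings if rule["target"].intersection(r)]
        others = [r for r in cohort.readings if not rule["target"].intersection(r)]
        if not matching or not others:
            # Target absent, or every reading matches: SELECT would change
            # nothing and REMOVE would leave the cohort without a reading.
            continue
        if all(context_ok(sentence, i, off, tags, neg)
               for off, tags, neg in rule["conditions"]):
            cohort.readings = matching if rule["op"] == "SELECT" else others


def disambiguate(sentence, sections):
    """Run the sections in order, earlier (more reliable) rules first, as in (f)."""
    for section in sections:
        for rule in section:
            apply_rule(sentence, rule)
    return sentence


SECTIONS = [
    # Section 1: a more reliable rule. Select the noun reading after a determiner.
    [{"op": "SELECT", "target": SETS["NOUN"],
      "conditions": [(-1, SETS["DET"], False)]}],
    # Section 2: a heuristic rule. Remove the verb reading when no pronoun precedes.
    [{"op": "REMOVE", "target": SETS["VERB"],
      "conditions": [(-1, SETS["PRON"], True)]}],
]

# Toy input (an assumption, not data from the paper): "run" is ambiguous.
sentence = [
    Cohort("the", [{"DET"}]),
    Cohort("run", [{"N", "SG"}, {"V", "INF"}]),
]
disambiguate(sentence, SECTIONS)
print([sorted(r) for r in sentence[1].readings])  # prints [['N', 'SG']]
```

In this sketch a rule never deletes the last remaining reading of a cohort, a safeguard commonly used in constraint-based disambiguation. Because the sections run in order, the heuristic rule in the second section only acts on cohorts that the first section left ambiguous.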
  </Section>
</Paper>