<?xml version="1.0" standalone="yes"?>
<Paper uid="E95-1029">
  <Title>Specifying a shallow grammatical representation for parsing purposes</Title>
  <Section position="4" start_page="210" end_page="211" type="metho">
    <SectionTitle>
2 Grammatical representation in English Constraint Grammar
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="210" end_page="210" type="sub_section">
      <SectionTitle>
English Constraint Grammar
</SectionTitle>
      <Paragraph position="0"> In the experiment reported in Section 3, we employed the grammatical representation that defines the descriptive task of the English Constraint Grammar Parser ENGCG (Karlsson et al. (eds.) 1995; see footnote 1).</Paragraph>
    </Section>
    <Section position="2" start_page="210" end_page="210" type="sub_section">
      <SectionTitle>
2.1 Morphology
</SectionTitle>
      <Paragraph position="0"> The morpholexical component in ENGCG employs 139 morphological tags for part of speech, inflection, derivation and certain syntactic properties (e.g. verb classification). Each morphological analysis usually consists of several tags, and many words get several analyses as alternatives.</Paragraph>
      <Paragraph position="1"> Footnote 1: A list of the ENGCG tags can be retrieved via e-mail by sending an empty mail message to engcginfo@ling.helsinki.fi. The returned document also explains how to analyse one's own samples using the ENGCG server.
 The following analysis of the sentence &quot;That round table might collapse&quot; is a rather extreme example:</Paragraph>
      <Paragraph position="2">
     &quot;table&quot; &lt;SVO&gt; V IMP VFIN
     &quot;table&quot; &lt;SVO&gt; V INF
     &quot;table&quot; &lt;SVO&gt; V PRES -SG3 VFIN
 &quot;&lt;might&gt;&quot;
     &quot;might&quot; N NOM SG
     &quot;might&quot; V AUXMOD VFIN
 &quot;&lt;collapse&gt;&quot;
     &quot;collapse&quot; N NOM SG
     &quot;collapse&quot; &lt;SVO&gt; V SUBJUNCTIVE VFIN
     &quot;collapse&quot; &lt;SVO&gt; V IMP VFIN
     &quot;collapse&quot; &lt;SVO&gt; V INF
     &quot;collapse&quot; &lt;SVO&gt; V PRES -SG3 VFIN
 &quot;&lt;$.&gt;&quot;
 The morphological analyser produces about 180 different tag combinations. Compared with another well-known tag set, that of the Brown Corpus, ENGCG is more distinctive in that it spells out the part-of-speech distinction in the description of determiner-pronoun, preposition-conjunction and determiner-adverb-pronoun homographs, as well as of uninflected verb forms, which are represented as ambiguous between subjunctive, imperative, infinitive and present tense readings. On the other hand, ENGCG does not spell out part-of-speech ambiguity in the description of -ing and nonfinite -ed forms, of noun-adjective homographs when the core meanings of the adjective and noun readings are similar, or of abbreviations vs. proper vs. common nouns. Generally, the ENGCG morphological tag set avoids the introduction of structurally unjustified distinctions.</Paragraph>
    </Section>
    <Section position="3" start_page="210" end_page="211" type="sub_section">
      <SectionTitle>
2.2 Syntax
</SectionTitle>
      <Paragraph position="0"> ENGCG syntax employs 30 dependency-oriented functional tags that indicate the surface-syntactic roles of nominal heads (subject, object, preposition complement, apposition, etc.) and modifiers (premodifiers, postmodifiers). The shallow structure of verb chains is also given: the tag set distinguishes between auxiliaries and main verbs, and between finite and nonfinite forms. The structure of adverbials and of prepositional and adjective phrases is given as well, though some of the attachments of adverbials are left underspecified.</Paragraph>
      <Paragraph position="1"> Finally, a disambiguated sample analysis of the above sample sentence:  Syntactic tags are flanked with the @-sign (see footnote 2); morphological tags and the base form are given to the left of the syntactic tags.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="211" end_page="212" type="metho">
    <SectionTitle>
3 The experiment
</SectionTitle>
    <Paragraph position="0"> This section reports on an experiment on part-of-speech and syntactic disambiguation by human experts (the authors of this article). Three 2,000-word texts were successively used: a software manual, a scientific magazine, and a newspaper.</Paragraph>
    <Section position="1" start_page="211" end_page="211" type="sub_section">
      <SectionTitle>
3.1 Setting
</SectionTitle>
      <Paragraph position="0"> The experiment was conducted as follows.</Paragraph>
      <Paragraph position="1"> 1. The text was morphologically analysed using the ENGCG morphological analyser. For the analysis of unrecognised words, we used a rule-based heuristic component that assigns one or more morphological analyses to each word not represented in the lexicon of the system.
 2. Two experts in the ENGCG grammatical representation independently marked the correct alternative analyses in the ambiguous input, using mainly structural, but in some structurally unresolvable cases also higher-level, information. The corpora consisted of continuous text rather than isolated sentences; this made the use of textual knowledge possible in the selection of the correct alternative. In the rare cases where two analyses were regarded as equally legitimate, both could be marked. The judges were encouraged to consult the documentation of the grammatical representation.</Paragraph>
      <Paragraph position="2"> 3. These tagged versions were compared to each other using the Unix sdiff program.</Paragraph>
      <Paragraph position="3"> 4. The differences were jointly examined by the judges in order to see whether they were due to (i) inattention, (ii) incomplete specification of the grammatical representation or (iii) an undecidable analysis.</Paragraph>
      <Paragraph position="4"> 5. A 'consensus' version of the tagged corpus was prepared. Usually only a unique analysis was given. However, there were three situations where a multiple analysis was accepted:
 * The judges disagreed about the correct analysis even after negotiations. In this case, comments were added to distinguish it from the other two types.</Paragraph>
      <Paragraph position="5"> * Neutralisation: both analyses were regarded as equivalent. (This often indicates a redundancy in the lexicon.)
 * Global ambiguity: the sentence was agreed to be globally ambiguous.
 Footnote 2: &quot;@DN&gt;&quot; represents determiners; &quot;@AN&gt;&quot; represents premodifying adjectives; &quot;@SUBJ&quot; represents subjects; &quot;@+FAUXV&quot; represents finite auxiliaries; and &quot;@-FMAINV&quot; represents nonfinite main verbs.</Paragraph>
      <Paragraph position="6"> 6. Whenever an undefined construction was detected during the joint examination, the grammar definition manual was updated.</Paragraph>
      <Paragraph position="7"> 7. The preparation of the syntactic version was the next main step. For each contextually appropriate morphological reading, all syntactic tags were introduced with a mapping program. An example (see footnote 3):  biguities.</Paragraph>
      <Paragraph position="8"> This procedure was successively applied to the three texts to see how much previous updates of the grammar definition manual decreased the need for further updates and how much the interjudge agreement might increase even after the first mechanical comparison (cf. Step 3).</Paragraph>
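      <Paragraph position="9"> The comparison in Step 3 and the consistency rate it yields can be illustrated with a minimal Python sketch. This is not the procedure used in the experiment, which relied on the Unix sdiff program; it only shows, under stated assumptions, how a per-word agreement rate could be computed. The function names, the toy data, and the representation of each judge's output as one set of marked readings per word are all hypothetical.

```python
def consistency_rate(judge_a, judge_b):
    """Return the share of words on which two judges marked identical readings.

    Each argument is a list with one entry per word; an entry is the set of
    readings that the judge marked as correct for that word (assumed format).
    """
    if len(judge_a) != len(judge_b):
        raise ValueError("both versions must cover the same words")
    if not judge_a:
        raise ValueError("cannot compute a rate for an empty text")
    agreed = sum(1 for a, b in zip(judge_a, judge_b) if set(a) == set(b))
    return agreed / len(judge_a)


def differing_positions(judge_a, judge_b):
    """List the word positions that would need joint examination (Step 4)."""
    return [i for i, (a, b) in enumerate(zip(judge_a, judge_b)) if set(a) != set(b)]


# Hypothetical toy data: readings written as tag strings, one set per word.
judge_a = [{"PRON DEM"}, {"V PRES"}, {"N NOM SG"}, {"NUM", "PRON"}]
judge_b = [{"CS"}, {"V PRES"}, {"N NOM SG"}, {"PRON", "NUM"}]

rate = consistency_rate(judge_a, judge_b)
todo = differing_positions(judge_a, judge_b)
print(f"consistency: {rate:.2%}, positions to examine: {todo}")
```

 Comparing sets rather than strings means that a word for which both judges marked the same two readings counts as agreement regardless of the order in which the readings were marked.</Paragraph>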
    </Section>
    <Section position="2" start_page="211" end_page="212" type="sub_section">
      <SectionTitle>
3.2 Results
</SectionTitle>
      <Paragraph position="0"> The results are given in Figure 1 (next page).</Paragraph>
      <Paragraph position="1"> Some comments are in order, first about morphology.
 * The initial consistency rate was consistently above 99%.</Paragraph>
      <Paragraph position="2"> * After negotiations, the judges agreed about the correct analysis or analyses in all cases. The vast majority of the initial differences were due to inattention, and the remaining few to incomplete specification of the morphological representation.
 Footnote 3: &quot;@NPHR&quot; represents stray nominal heads; &quot;@OBJ&quot; represents objects; &quot;@I-OBJ&quot; represents indirect objects; &quot;@PCOMPL-S&quot; represents subject complements; &quot;@PCOMPL-O&quot; represents object complements; &quot;@APP&quot; represents appositions; &quot;@NN&gt;&quot; represents premodifying nouns (and non-final noun parts in compounds); &quot;@&lt;P&quot; represents nominal preposition complements; &quot;@O-ADVL&quot; represents nominal adverbials; &quot;@&lt;P-FMAINV&quot; represents nonfinite main verbs as preposition complements; and &quot;@&lt;NOM-FMAINV&quot; represents post-modifying nonfinite main verbs.</Paragraph>
      <Paragraph position="3"> Some representative examples of these jointly examined differences are in order. (Words followed by an expression of the form (X/Y) were initially tagged differently by the judges. After joint examination, Y was agreed to be the correct alternative in all cases but (5), where X and Y were regarded as equally possible; see footnote 4.)
 1. As we go(V INF / V PRES) to(INF [...] apparent intention to follow suit, are grievous blows..</Paragraph>
      <Paragraph position="4"> 2. .. they were circulating a letter expressing concern that(PRON REL / CS) it would give the developing countries a blank cheque to demand money from donors to finance sustainable development.</Paragraph>
      <Paragraph position="5"> 3. That(PRON DEM / CS) there was no outburst of protest over the new policy suggests that public anxiety over genetic engineering has ebbed in recent years.</Paragraph>
      <Paragraph position="6"> 4. The value-added information is the kind(A / N) we want ourselves.&quot;
 5. .. they had not seen before at one(NUM / PRON) of the busiest times of the school year.</Paragraph>
      <Paragraph position="7"> 6. I don't think people get(V INF / V PRES) a great deal from bald figures.</Paragraph>
      <Paragraph position="8"> 7. She had to ask because some of the six-year-olds from other schools who attend(V INF / V PRES) her classes know the names of as(PREP / AD-A&gt;) many hard drugs as she does.</Paragraph>
      <Paragraph position="9"> * Only three updates to the morphological part of the manual were needed.</Paragraph>
      <Paragraph position="10"> * Though multiple analyses were considered acceptable in (even semantically) undecidable situations, very few were actually needed: only 3 words out of 6,071 received two analyses (for example, it was agreed that &quot;more&quot; could be analysed both as an adverb and as a pronoun in &quot;.. free trade will mean you destroy more.&quot;).
 Footnote 4: Before an &quot;of&quot; phrase, the pronoun/numeral distinction of &quot;one&quot; was regarded as neutralised. This observation was also added to the morphology manual.</Paragraph>
      <Paragraph position="11"> Next, some observations about syntax.</Paragraph>
      <Paragraph position="12"> * At the level of syntax, most of the initial differences were identified as obvious mistakes, e.g.: - He was(@+FMAINV / @+FAUXV) addressing his hosts ..</Paragraph>
      <Paragraph position="13"> * Sometimes, however, there was a need to discuss the descriptive policies. Consider the following sentence fragment: 5 - that managers'(@GN&gt;) keeping</Paragraph>
    </Section>
  </Section>
</Paper>