<?xml version="1.0" standalone="yes"?>
<Paper uid="A94-1008">
  <Title>Tagging accurately - Don't guess if you know</Title>
  <Section position="6" start_page="49" end_page="50" type="evalu">
    <SectionTitle>
5 Performance test
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="49" end_page="49" type="sub_section">
      <SectionTitle>
5.1 Test data
</SectionTitle>
      <Paragraph position="0"> The system was tested against 26,711 words of newspaper text from The Wall Street Journal, The Economist and Today, all taken from the 200-million word Bank of English corpus by the COBUILD team at the University of Birmingham, England (see also (J/irvinen, 1994)). None of these texts have been 2In some cases a word may still remain ambiguous.</Paragraph>
      <Paragraph position="1"> used in the development of the system or the description, i.e. no training effects are to be expected.</Paragraph>
    </Section>
    <Section position="2" start_page="49" end_page="49" type="sub_section">
      <SectionTitle>
5.2 Creation of benchmark corpus
</SectionTitle>
      <Paragraph position="0"> Before the test, a benchmark version of the test corpus was created. The texts were first analysed using the preprocessor, the morphological analyser, and the module for morphological heuristics. This ambiguous data was then manually disambiguated by judges, each having a thorough understanding of the ENGCG grammatical representation. The corpus was independently disambiguated by two judges.</Paragraph>
      <Paragraph position="1"> In the instructions to the experts, special emphasis was given to the quality of the work (there was no time pressure). The two disambiguated versions of the corpus were compared using the Unix sdiff program. At this stage, slightly above 99 % of all analyses agreed. The differences were jointly examined by the judges to see whether they were caused by inattention or by a genuine difference of opinion that could not be resolved by consulting the documentation that outlines the principles adopted for this grammatical representation (for the most part documented in (Karlsson et al., 1994)). It turned out that almost all of these differences were due to inattention. Only in the analysis of a few words it was agreed that a multiple choice was appropriate because of different meaning-level interpretations of the utterance (these were actually headings where some of the grammatical information was omitted).</Paragraph>
      <Paragraph position="2"> Overall, these results agree with our previous experiences (Karlsson et al., 1994): if the analysis is done by experts in the adopted grammatical representation, with emphasis on the quality of the work, a consensus of virtually 100 % is possible, at least at the level of morphological analysis (for a less optimistic view, see (Church, 1992)).</Paragraph>
    </Section>
    <Section position="3" start_page="49" end_page="49" type="sub_section">
      <SectionTitle>
5.3 Morphological analysis
</SectionTitle>
      <Paragraph position="0"> The preprocessed text was submitted to the ENGTWOL morphological analyser, which assigns to 25,831 words of the total 26,711 (96.7 %) at least one morphological analysis. The remaining 880 word-form tokens were analysed with the rule-based heuristic module. After the combined effect of these modules, there were 47,269 morphological analyses, i.e. 1.77 morphological analyses for each word on an average. At this stage, 23 words missed a contextually appropriate analysis, i.e. the error rate of the system after morphological analysis was about 0.1%.</Paragraph>
    </Section>
    <Section position="4" start_page="49" end_page="50" type="sub_section">
      <SectionTitle>
5.4 Morphological disambiguation
</SectionTitle>
      <Paragraph position="0"> The morphologically analysed text was submitted to five disambiguators (see Figure 3). The first one, D1, is the grammar-based ENGCG disambiguator.</Paragraph>
      <Paragraph position="1"> In the next step (D2) we have used also heuristic ENGCG constraints. The probabilistic information  is used in D3, where the ambiguities of D2 are resolved by XT. We also tested the usefulness of the heuristic component of ENGCG by omitting it in D4. The last test, D5, is XT alone, i.e. only probabilistic techniques are used here for resolving ENGTWOL ambiguities.</Paragraph>
      <Paragraph position="2"> The ENGCG disambiguator performed somewhat less well than usually. With heuristic constraints, the error rate was as high as 0.63 %, with 1.04 morphological readings per word on an average. However, most (57 %) of the total errors were made after ENGCG analysis (i.e. in the analysis of no more than 3.6 % of all words). In a way, this is not very surprising because ENGCG is supposed to tackle all the 'easy' cases and leave the structurally hardest cases pending. But it is quite revealing that as much as three fourths of the probabilistic tagger's errors occur in the analysis of the structurally 'easy' cases; obviously, many of the probabilistic system's decisions are structurally somewhat naive. Overall, the hybrid (D3#) reached an accuracy of about 98.5 % significantly better than the 95-97 % accuracy which state-of-the-art probabilistic taggers reach alone.</Paragraph>
      <Paragraph position="3"> The hybrid D3~ is like hybrid D3~, but we have used careful mapping. There some problematic ambiguity (see Figure 2) is left pending. For instance, ambiguities between preposition and infinitive marker (word to), or between subordinator and preposition (word as), are resolved as far as ENGCG disambiguates them, the prediction of XT is not consulted. Also, when XT proposes tags like JJ (adjective), AP (post-determiner) or VB (verb base-form) very little further disambiguation is done. This hybrid does not contain any mapping errors, and on the other hand, not all the XT errors either.</Paragraph>
      <Paragraph position="4"> The test without the heuristic component of ENGCG (D4) suggests that ambiguity should be resolved as far as possible with rules. An open question is, how far we can go using only linguistic information (e.g. by writing more heuristic constraints to be applied after the more reliable ones, in this way avoiding many linguistically naive errors).</Paragraph>
      <Paragraph position="5"> The last test gives further evidence for the usefulness of a carefully designed linguistic rule component. Without such a rule component, the decrease in accuracy is quite dramatic although a part of the errors come from the mapping between tag sets 3.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>