<?xml version="1.0" standalone="yes"?>
<Paper uid="A94-1008">
  <Title>Tagging accurately - Don't guess if you know</Title>
  <Section position="2" start_page="0" end_page="47" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> This paper combines knowledge-based and statistical methods for part-of-speech disambiguation, taking advantage of the best features of both approaches. The resulting output is fully and accurately disambiguated.</Paragraph>
    <Paragraph position="1"> We demonstrate a system that accurately resolves most part-of-speech ambiguities by means of syntactic rules and employs a stochastic tagger to eliminate the remaining ambiguity. The overall results are clearly superior to the reported results for state-of-the-art stochastic systems.</Paragraph>
    <Paragraph position="2"> The input to our part-of-speech disambiguator consists of lexically analysed sentences. Many words have more than one analysis. The task of the disambiguator is to select the contextually appropriate alternative by discarding the improper ones.</Paragraph>
    <Paragraph position="3"> Some of the inappropriate alternatives can be discarded reliably by linguistic rules. For example, we can safely exclude a finite-verb reading if the previous word is an unambiguous determiner. The application of such rules does not always result in a fully disambiguated output (e.g. adjective-noun ambiguities may be left pending), but the amount of ambiguity is reduced with next to no errors. A large collection of linguistic rules resolves much of the ambiguity, though some cases remain unresolved.</Paragraph>
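The determiner rule mentioned above can be sketched as follows. This is a minimal illustration only: the tag names DET, VFIN, VINF, N and VPAST_PART and the function name are assumptions made for this example, not the actual ENGCG tag set or rule formalism.

```python
# Hypothetical tag names for illustration; the real ENGCG tag set differs.
DET = "DET"
FINITE_VERB = "VFIN"


def discard_finite_verb_after_determiner(sentence):
    """Remove finite-verb readings that directly follow an unambiguous determiner.

    sentence is a list of (word, readings) pairs, where readings is a set of
    candidate tags. The last remaining reading of a word is never removed.
    """
    result = []
    previous_readings = set()
    for word, readings in sentence:
        remaining = set(readings)
        # Apply the rule only if the previous word has DET as its sole reading
        # and this word still has some alternative left besides the one removed.
        if previous_readings == {DET} and len(remaining) != 1:
            remaining.discard(FINITE_VERB)
        result.append((word, remaining))
        previous_readings = remaining
    return result


sentence = [
    ("the", {"DET"}),
    ("run", {"N", "VFIN", "VINF"}),
    ("ended", {"VFIN", "VPAST_PART"}),
]
reduced = discard_finite_verb_after_determiner(sentence)
```

In this toy input the word "run" loses its finite-verb reading but keeps two other readings, which mirrors the point that reliable rules reduce ambiguity without necessarily removing all of it.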
    <Paragraph position="4"> The rule system may also exploit the fact that certain linguistically possible configurations have such a low frequency in certain types of text that they can be ignored. A rule that assumes that a preposition is followed by a noun phrase may be a useful heuristic rule in a practical system, considering that dangling prepositions occur relatively infrequently.</Paragraph>
    <Paragraph position="5"> Such heuristic rules can be applied to resolve some of the ambiguities that survive the more reliable grammar rules.</Paragraph>
    <Paragraph position="6"> A stochastic disambiguator selects the most likely tag for a word by consulting the neighbouring tags or words, typically in a two- or three-word window.</Paragraph>
    <Paragraph position="7"> Because of the limited size of the window, the choices made by a stochastic disambiguator are often quite naive from the linguistic point of view. For instance, the correct resolution of a preposition vs. subordinating conjunction ambiguity in a small window is often impossible because both morphological categories can have identical local contexts (for instance, both can be followed by a noun phrase). Some of the errors made by a stochastic system can be avoided in a knowledge-based system because the rules can refer to words and tags in the scope of the entire sentence.</Paragraph>
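The window limitation can be made concrete with a small sketch. It assumes a greedy scorer over a three-token window and uses invented probabilities and tag names (V, PREP, CS, DET); it is not the model used by any particular tagger. Because the scorer sees only the window, "left before the meeting" and "left before the meeting ended" receive the same tag for "before", although the second calls for a subordinating conjunction.

```python
# Toy, invented probabilities for illustration only; not taken from any corpus.
TRANSITION = {  # P(tag | previous tag)
    ("V", "PREP"): 0.3,
    ("V", "CS"): 0.1,
    ("PREP", "DET"): 0.5,
    ("CS", "DET"): 0.4,
}
EMISSION = {  # P(word | tag)
    ("before", "PREP"): 0.6,
    ("before", "CS"): 0.4,
}


def score(previous_tag, tag, next_tag, word):
    """Local score of a candidate tag given only its immediate neighbours."""
    return (
        TRANSITION.get((previous_tag, tag), 0.0)
        * EMISSION.get((word, tag), 0.0)
        * TRANSITION.get((tag, next_tag), 0.0)
    )


def choose_tag(previous_tag, word, next_tag, candidates):
    """Pick the candidate with the highest score inside the three-token window."""
    return max(
        sorted(candidates),
        key=lambda tag: score(previous_tag, tag, next_tag, word),
    )


# Both example sentences present the same window around "before":
# previous tag V ("left"), next tag DET ("the").
chosen = choose_tag("V", "before", "DET", {"PREP", "CS"})
```

Whatever follows the noun phrase lies outside the window, so no choice of local probabilities can separate the two sentences; a rule with sentence-wide scope can.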
    <Paragraph position="8"> We use both types of disambiguators. The knowledge-based disambiguator does not resolve all ambiguities but the choices it makes are nearly always correct. The statistical disambiguator resolves all ambiguities but its decisions are not very reliable. We combine these two disambiguators; here this means that the text is analysed with both systems.</Paragraph>
    <Paragraph position="9"> Whenever there is a conflict between the systems, we trust the analysis proposed by the knowledge-based system. Whenever the knowledge-based system leaves an ambiguity unresolved, we select the alternative that is closest to the selection made by the statistical system.</Paragraph>
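The combination policy just described might be sketched as below. The tag names, the TAG_MAP correspondence table and the similarity function are hypothetical placeholders for this example; how closeness between the two tag sets is actually measured is not specified in this section.

```python
# Hypothetical correspondence between the two tag sets, for illustration only.
TAG_MAP = {"NN": "N", "JJ": "A", "VB": "V"}


def similarity(reading, statistical_tag):
    """Return 1 if the statistical tag maps onto the reading, otherwise 0."""
    return 1 if TAG_MAP.get(statistical_tag) == reading else 0


def combine(rule_readings, statistical_tag):
    """Choose one reading for a word from the two systems' outputs.

    rule_readings is the non-empty set of readings left by the rule-based
    system; statistical_tag is the single tag chosen by the statistical tagger.
    """
    if len(rule_readings) == 1:
        # The rules resolved the word: trust them even if the tagger disagrees.
        return next(iter(rule_readings))
    # Otherwise pick the surviving reading closest to the tagger's choice.
    # Sorting first makes ties resolve alphabetically, an arbitrary convention.
    return max(
        sorted(rule_readings),
        key=lambda reading: similarity(reading, statistical_tag),
    )


resolved_by_rules = combine({"N"}, "VB")
resolved_by_tagger = combine({"A", "N"}, "NN")
```

Note that the result is always drawn from the readings the rule-based system left in place, so the statistical tagger can only choose among alternatives the rules did not discard.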
    <Paragraph position="10"> The two systems we use are ENGCG (Karlsson et al., 1994) and the Xerox Tagger (Cutting et al., 1992). We discuss problems caused by the fact that these taggers use different tag sets, and present the results obtained by applying the combined taggers to a previously unseen sample of text.</Paragraph>
  </Section>
</Paper>