<?xml version="1.0" standalone="yes"?>
<Paper uid="J95-2004">
  <Title>Deterministic Part-of-Speech Tagging with Finite-State Transducers</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
MERL
Yves Schabes*
MERL
</SectionTitle>
    <Paragraph position="0"> Stochastic approaches to natural language processing have often been preferred to rule-based approaches because of their robustness and their automatic training capabilities. This was the case for part-of-speech tagging until Brill showed how state-of-the-art part-of-speech tagging can be achieved with a rule-based tagger by inferring rules from a training corpus. However, current implementations of the rule-based tagger run more slowly than previous approaches. In this paper, we present a finite-state tagger, inspired by the rule-based tagger, that operates in optimal time in the sense that the time to assign tags to a sentence corresponds to the time required to follow a single path in a deterministic finite-state machine. This result is achieved by encoding the application of the rules found in the tagger as a nondeterministic finite-state transducer and then turning it into a deterministic transducer. The resulting deterministic transducer yields a part-of-speech tagger whose speed is dominated by the access time of mass storage devices. We then generalize the techniques to the class of transformation-based systems.</Paragraph>
  </Section>
</Paper>
