<?xml version="1.0" standalone="yes"?> <Paper uid="J95-2004"> <Title>Deterministic Part-of-Speech Tagging with Finite-State Transducers</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> MERL Yves Schabes* MERL </SectionTitle> <Paragraph position="0"> Stochastic approaches to natural language processing have often been preferred to rule-based approaches because of their robustness and their automatic training capabilities. This was the case for part-of-speech tagging until Brill showed how state-of-the-art part-of-speech tagging can be achieved with a rule-based tagger by inferring rules from a training corpus. However, current implementations of the rule-based tagger run more slowly than previous approaches. In this paper, we present a finite-state tagger, inspired by the rule-based tagger, that operates in optimal time in the sense that the time to assign tags to a sentence corresponds to the time required to follow a single path in a deterministic finite-state machine. This result is achieved by encoding the application of the rules found in the tagger as a nondeterministic finite-state transducer and then turning it into a deterministic transducer. The resulting deterministic transducer yields a part-of-speech tagger whose speed is dominated by the access time of mass storage devices. We then generalize the techniques to the class of transformation-based systems.</Paragraph> </Section> </Paper>