<?xml version="1.0" standalone="yes"?>
<Paper uid="P97-1059">
  <Title>Finite State Transducers Approximating Hidden Markov Models</Title>
  <Section position="4" start_page="0" end_page="460" type="intro">
    <SectionTitle>
[DET,PRO] [ADJ,NOUN] [ADJ,NOUN] ... [END] (i)
DET ADJ NOUN ... END
</SectionTitle>
    <Paragraph position="0"> The aim of the conversion is not to generate FSTs that behave identically, or as similarly as possible, to HMMs, but rather FSTs that tag as accurately as possible. The motivation for deriving these FSTs from HMMs is that HMMs can be trained and converted with little manual effort.</Paragraph>
    <Paragraph position="1"> Tagging with transducers is up to five times faster than tagging with the underlying HMMs. The main advantage of transforming an HMM is that the resulting transducer can be handled by finite state calculus. Among other things, it can be composed with transducers that encode: * correction rules for the most frequent tagging errors, either automatically generated (Brill, 1992; Roche and Schabes, 1995) or manually written (Chanod and Tapanainen, 1995), in order to significantly improve tagging accuracy (see footnote 2).</Paragraph>
    <Paragraph position="2"> These rules may include long-distance dependencies not handled by HMM taggers, and can conveniently be expressed by the replace operator (Kaplan and Kay, 1994; Karttunen, 1995; Kempe and Karttunen, 1996).</Paragraph>
    <Paragraph position="3"> * further steps of text analysis, e.g. light parsing or extraction of noun phrases or other phrases (Ait-Mokhtar and Chanod, 1997).</Paragraph>
    <Paragraph position="4"> These compositions enable complex text analysis to be performed by a single transducer.</Paragraph>
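The composition described above can be illustrated with a minimal Python sketch. It is not the construction used in the paper: the dictionary encoding, the function names `apply_fst` and `compose`, and both toy transducers (`TAGGER`, `CORRECTION`) are assumptions invented for illustration, and the sketch covers only deterministic, epsilon-free transducers in which every state is final.

```python
from collections import deque

# Assumed encoding of a deterministic, epsilon-free transducer:
#   (state, input_symbol) -> (output_symbol, next_state)
# Every state is treated as final (toy simplification).

def apply_fst(arcs, symbols, start=0):
    """Run a deterministic transducer over a list of symbols."""
    state, output = start, []
    for sym in symbols:
        out, state = arcs[(state, sym)]
        output.append(out)
    return output

def compose(a, b, start_a=0, start_b=0):
    """Build C with C(x) == B(A(x)), visiting reachable state pairs only."""
    start = (start_a, start_b)
    arcs, queue, seen = {}, deque([start]), {start}
    while queue:
        p, q = queue.popleft()
        for (src, sym_in), (sym_mid, p_next) in a.items():
            # Keep an arc of A only if B can read A's output symbol in state q.
            if src != p or (q, sym_mid) not in b:
                continue
            sym_out, q_next = b[(q, sym_mid)]
            target = (p_next, q_next)
            arcs[((p, q), sym_in)] = (sym_out, target)
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return arcs, start

# Hypothetical tagger: maps ambiguity classes to tags.
TAGGER = {
    (0, "[DET,PRO]"): ("DET", 1),
    (0, "[ADJ,NOUN]"): ("NOUN", 0),
    (0, "[END]"): ("END", 0),
    (1, "[ADJ,NOUN]"): ("ADJ", 1),
    (1, "[END]"): ("END", 0),
}

# Hypothetical correction rule: the second of two adjacent ADJ tags becomes NOUN.
CORRECTION = {
    (0, "DET"): ("DET", 0),
    (0, "NOUN"): ("NOUN", 0),
    (0, "END"): ("END", 0),
    (0, "ADJ"): ("ADJ", 1),
    (1, "ADJ"): ("NOUN", 0),
    (1, "NOUN"): ("NOUN", 0),
    (1, "DET"): ("DET", 0),
    (1, "END"): ("END", 0),
}

SENTENCE = ["[DET,PRO]", "[ADJ,NOUN]", "[ADJ,NOUN]", "[END]"]
COMPOSED, COMPOSED_START = compose(TAGGER, CORRECTION)
print(apply_fst(COMPOSED, SENTENCE, COMPOSED_START))
```

Applying `COMPOSED` once gives the same output as running `TAGGER` and then `CORRECTION`, which is the sense in which a single transducer can perform several analysis steps.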
    <Paragraph position="5"> Footnote 2: Automatically derived rules require less work than manually written ones but are unlikely to yield better results because they would consider relatively limited context and simple relations only.</Paragraph>
    <Paragraph position="6"> An HMM transducer builds on the data (probability matrices) of the underlying HMM. The accuracy of this data has an impact on the tagging accuracy of both the HMM itself and the derived transducer. The HMM can be trained on either a tagged or an untagged corpus; training is not a topic of this paper since it is exhaustively described in the literature (Bahl and Mercer, 1976; Church, 1988).</Paragraph>
    <Paragraph position="7"> An HMM can be identically represented by a weighted FST in a straightforward way. We are, however, interested in non-weighted transducers.</Paragraph>
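One common way to read the weighted-FST representation mentioned above is sketched below in Python. This is an assumption-laden illustration, not the paper's own formulation: it assumes a first-order HMM whose observations are ambiguity classes, uses negative log probabilities as arc weights, and all probabilities and the names `hmm_to_wfst` and `best_path` are invented for the example.

```python
import math

def hmm_to_wfst(initial, transition, emission):
    """Return arcs (src, input, output, weight, dst) of a weighted FST.

    States are "START" plus one state per tag. Each arc reads an observed
    symbol, writes a tag, and carries -log(transition * emission).
    All probabilities must be strictly positive.
    """
    arcs = []
    for tag, p_init in initial.items():
        for sym, p_emit in emission[tag].items():
            arcs.append(("START", sym, tag, -math.log(p_init * p_emit), tag))
    for prev, row in transition.items():
        for tag, p_trans in row.items():
            for sym, p_emit in emission[tag].items():
                arcs.append((prev, sym, tag, -math.log(p_trans * p_emit), tag))
    return arcs

def best_path(arcs, symbols):
    """Lowest-cost output sequence for the input (Viterbi search over the arcs)."""
    best = {"START": (0.0, [])}  # state -> (cost so far, outputs so far)
    for sym in symbols:
        step = {}
        for src, a_in, a_out, weight, dst in arcs:
            if a_in != sym or src not in best:
                continue
            cost = best[src][0] + weight
            if dst not in step or step[dst][0] > cost:
                step[dst] = (cost, best[src][1] + [a_out])
        best = step
    return min(best.values(), key=lambda item: item[0])[1]

# Invented toy model (not taken from the paper).
INITIAL = {"DET": 0.8, "ADJ": 0.1, "NOUN": 0.1}
TRANSITION = {
    "DET": {"ADJ": 0.5, "NOUN": 0.5},
    "ADJ": {"ADJ": 0.2, "NOUN": 0.8},
    "NOUN": {"ADJ": 0.4, "NOUN": 0.6},
}
EMISSION = {
    "DET": {"[DET,PRO]": 1.0},
    "ADJ": {"[ADJ,NOUN]": 1.0},
    "NOUN": {"[ADJ,NOUN]": 1.0},
}

ARCS = hmm_to_wfst(INITIAL, TRANSITION, EMISSION)
print(best_path(ARCS, ["[DET,PRO]", "[ADJ,NOUN]", "[ADJ,NOUN]"]))
```

In this reading the weights carry all of the HMM's probabilistic information, so the lowest-cost path through the arcs corresponds to the most probable tag sequence. A non-weighted transducer, by contrast, has to commit to one output using its states and arcs alone.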
  </Section>
</Paper>