<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-1011">
  <Title>Unsupervised Learning of a Rule-based Spanish Part of Speech Tagger</Title>
  <Section position="6" start_page="57" end_page="57" type="relat">
    <SectionTitle>
5 Related Works
</SectionTitle>
    <Paragraph position="0"> It is very difficult to compare performances between taggers when accuracy depends on quality of corpora and lexicons, and maybe on characteristics of languages. But in this section, we compare our tagger with Hidden Markov Model-based taggers.</Paragraph>
    <Paragraph position="1"> A more widely used algorithm for unsupervised learning of a POS tagger is Hidden Markov Model (HMM). Cutting et al. (Cutting et al., 1992) and Merialdo (Merialdo, 1994) used HMM to learn English POS taggers while Chanod and Tapanainen (Chanod and Tapanainen, 1995), Feldweg (Feldweg, 1995), and León and Serrano (León and Serrano, 1995) ported the Xerox tagger (Cutting et al., 1992) to French, German, and Spanish respectively. One of the drawbacks of an HMM-based approach is that laborious manual tuning of symbol and transition biases is necessary to achieve high accuracy. Without tuned biases, the German Xerox tagger achieved 85.89% while the French Xerox tagger achieved 87% accuracy. After one man-month of tuning biases, the accuracy of the French tagger increased to 96.8%.</Paragraph>
    <Paragraph position="2"> One could derive such biases from a corpus, as discussed in (Merialdo, 1994), but it unfortunately requires a tagged corpus.</Paragraph>
    <Paragraph position="3"> The best accuracy of the Spanish Xerox tagger was 91.51% for the reduced tag set (174 tags) [Footnote 1: It can be a part of a last name as in &quot;van Mahler&quot;, but also is an inflected form of &quot;ir&quot;.]</Paragraph>
    <Paragraph position="4"> with the base accuracy (i.e. no training) of 88.98% while the best accuracy of our tagger is currently 92.1% for the simple tag set (39 tags) with the base accuracy of 78.6%. The lower base accuracy in our experiment is probably due to the large number of entries in the Collins dictionary.</Paragraph>
  </Section>
</Paper>