<?xml version="1.0" standalone="yes"?> <Paper uid="C94-1025"> <Title>PROBABILISTIC TAGGING WITH FEATURE STRUCTURES</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 INTRODUCTION </SectionTitle> <Paragraph position="0"> The present article describes a probabilistic tagger that is based on a hidden Markov model (HMM) (Rabiner, 1990) and employs tags which are feature structures.</Paragraph> <Paragraph position="1"> Their features concern part-of-speech (POS), gender, number, etc. and have only atomic values.</Paragraph> <Paragraph position="2"> Usually, the contextual probability of a tag (state transition probability) is estimated by dividing a trigram frequency by a bigram frequency (second-order HMM).</Paragraph> <Paragraph position="3"> With a large tag set, resulting from the fact that the tags contain a lot of morphological information besides the POS, and with only a small training corpus available, most of these frequencies are too low for an exact estimation of contextual probabilities.</Paragraph> <Paragraph position="4"> Our feature structure tagger estimates these probabilities by connecting contextual probabilities of the single feature-value-pairs (fv-pairs) of the tags (cf. sec. 2).</Paragraph> <Paragraph position="5"> The starting point for the implementation of the feature structure tagger was a second-order-HMM tagger (trigrams) based on a modified version of the Viterbi algorithm (Viterbi, 1967; Church, 1988), which we had earlier implemented in C (Kempe, 1994). There we modified the calculation of the contextual probabilities of the tags in the above-described way (cf. sec. 4).</Paragraph> <Paragraph position="6"> A test of both taggers under the same conditions on a French corpus 1 has shown that the feature structure tagger is clearly better when the available training corpus is small and the tag set is large but the tags are decomposable into relatively few fv-pairs.
The latter can be the case with morphologically rich languages when the tags contain a lot of morphological information (cf. sec. 5).</Paragraph> <Paragraph position="7"> 1 I am much obliged to Achim Stein and Leo Wanner, Romance Dept., Univ. Stuttgart, Germany, for providing the corpus and a dictionary.</Paragraph> </Section> </Paper>