File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/94/c94-1025_evalu.xml
Size: 2,058 bytes
Last Modified: 2025-10-06 14:00:17
<?xml version="1.0" standalone="yes"?> <Paper uid="C94-1025"> <Title>PROBABILISTIC TAGGING WITH FEATURE STRUCTURES</Title> <Section position="7" start_page="163" end_page="164" type="evalu"> <SectionTitle> 5 TAGGING RESULTS </SectionTitle> <Paragraph position="0"> In the training and tagging process we experimented with different values for parameters such as the minimal admitted frequency for preselection, the admitted percentage difference c between probabilities considered to be equal, etc. (cf. sec. 3).</Paragraph> <Paragraph position="1"> The feature structure tagger was trained on the French 10,000-word corpus already mentioned in table 1, with the four different training methods (sec. 3). When tagging a 6,000-word corpus 6 with an average ambiguity of 2.63 tags per word (after the dictionary 6 No overlap between training and test corpora.</Paragraph> <Paragraph position="2"> different taggers, corpora, tag sets and HMM orders For comparison, we used a &quot;traditional&quot; HMM-tagger (cf. sec. 4) on the same training and test corpora and got an accuracy of 83.23% 7, i.e. the error rate was about 50% higher than with the feature structure tagger (table 2).</Paragraph> <Paragraph position="3"> When we used a tool which always selects the lexically most probable tag without considering the context, we obtained an accuracy of 83.81%, which is even better than with the &quot;traditional&quot; HMM-tagger. Provided with enough training data and working on a small tag set, our &quot;traditional&quot; tagger got an accuracy of 96.16% (Kempe, 1994), which is usual in this case (Cutting et al., 1992).
The English test corpus we used here had an average ambiguity of 2.61 tags per word, which is amazingly similar to the ambiguity of the French corpus.</Paragraph> <Paragraph position="4"> The feature structure tagger is clearly better when the available training corpus is small and the tag set large but the tags are decomposable into few fv-pairs.</Paragraph> </Section> </Paper>