<?xml version="1.0" standalone="yes"?>
<Paper uid="P01-1035">
<Title>Serial Combination of Rules and Statistics: A Case Study in Czech Tagging</Title>
<Section position="6" start_page="0" end_page="0" type="concl">
<SectionTitle>5 Conclusions</SectionTitle>
<Paragraph position="0"> The improvement obtained (4.58% relative error reduction) beats the pure statistical classifier combination (Hladká, 2000), which obtained only a 3% relative improvement. The most important task for the manual-rule component is to keep recall very close to 100%, while improving precision as much as possible. Even though the rule-based component is still under development, the 19% relative improvement in F-measure over the baseline (i.e., a 16% reduction in the F-complement while keeping recall just 0.34% below the absolute 100%) is encouraging.</Paragraph>
<Paragraph position="1"> In any case, we consider the clear &quot;division of labor&quot; between the two parts of the system a strong advantage. It allows us, now and in the future, to use different taggers and different rule-based systems within the same framework, yet in a completely independent fashion.</Paragraph>
<Paragraph position="2"> The performance of the pure HMM tagger alone is an interesting result by itself: it beats the best published Czech tagger (Hajič and Hladká, 1998) by almost 2% (a 30% relative improvement) and a previous HMM tagger for Czech (Mírovský, 1998) by almost 4% (a 44% relative improvement).</Paragraph>
<Paragraph position="3"> We believe that the key to this success is both the increased data size (we have used three times more training data than reported in the previous papers) and the meticulous implementation of smoothing with bucketing, together with the use of all possible tag trigrams, which has never been done before.</Paragraph>
<Paragraph position="4"> One might question whether it is worthwhile to work on a manual rule component if the improvement over the pure statistical system is not so large, and
there is the obvious disadvantage of its language specificity. However, we see at least two situations in which such work pays off: first, when high-quality tagging is needed for local language projects, such as human-oriented lexicography, where every tenth of a percent of error-rate reduction counts; and second, when not enough training data is available for a high-quality statistical tagger for a given language, but language expertise does exist; the improvement over an imperfect statistical tagger should then be more visible. (However, a feature-based log-linear tagger might perform better for small training data, as argued in (Hajič, 2000).)</Paragraph>
<Paragraph position="5"> Another interesting issue is the evaluation method used for taggers. From the linguistic point of view, not all errors are created equal; it is clear that the manual rule component does not commit linguistically trivial errors (see Sect. 4.2). However, the relative weighting (if any) of errors should be application-based, which is beyond the scope of this paper.</Paragraph>
<Paragraph position="6"> It has also been observed that the improved tagger can serve as an additional means of discovering annotators' errors (however infrequent, they do occur). See Fig. 1 for an example of a wrong annotation of &quot;se&quot;.</Paragraph>
<Paragraph position="7"> In the near future, we plan to add more rules, as well as to continue working on the statistical tagging. The lexical component of the tagger might still have some room for improvement, such as the use of …, which can be feasible with the powerful smoothing we now employ.</Paragraph>
</Section>
</Paper>
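The serial &quot;division of labor&quot; described in the conclusions can be sketched roughly as follows: a rule component first prunes each word's candidate tag set (written conservatively, so recall stays near 100%), and a statistical component then picks one tag from whatever survives. This is a minimal illustrative Python sketch, not the paper's actual system; the rule, lexicon, tags, and scores are invented placeholders standing in for the real rules and HMM probabilities.

```python
def rule_prune(word, candidates):
    """Rule component: discard tags that hard constraints rule out.

    Rules must never discard the correct tag (recall ~100%), so they
    are written conservatively. The single rule here is hypothetical.
    """
    if word == "se":
        # Hypothetical hard constraint for the Czech clitic "se".
        candidates = [t for t in candidates if t != "NOUN"]
    return candidates or ["UNK"]  # never hand an empty set downstream


def statistical_pick(word, candidates, scores):
    """Statistical component: choose the best-scoring surviving tag.

    `scores` is a placeholder for HMM probabilities; unseen pairs
    default to 0.0.
    """
    return max(candidates, key=lambda t: scores.get((word, t), 0.0))


def tag(sentence, lexicon, scores):
    """Serial combination: rules prune, then statistics disambiguate."""
    out = []
    for word in sentence:
        cands = rule_prune(word, lexicon.get(word, ["UNK"]))
        out.append((word, statistical_pick(word, cands, scores)))
    return out
```

The point of the serial arrangement is visible in the control flow: even if the statistical scores prefer a tag (here, a high score for an impossible reading), the rule component has already removed it, so the statistical component can only err among linguistically plausible tags.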