<?xml version="1.0" standalone="yes"?> <Paper uid="W04-3235"> <Title>Error Measures and Bayes Decision Rules Revisited with Applications to POS Tagging</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> By now, the statistical approach to natural language processing (NLP) tasks such as speech recognition, POS tagging and machine translation has found widespread use. There are three ingredients to any statistical approach to NLP, namely the Bayes decision rule, the probability models (such as trigram models or HMMs) and the training criterion (such as maximum likelihood or mutual information).</Paragraph> <Paragraph position="1"> The topic of this paper is to reconsider the form of the Bayes decision rule. In virtually all NLP tasks, the specific form of the Bayes decision rule is never questioned, and the decision rule is adapted from speech recognition. In speech recognition, the typical decision rule is to maximize the sentence probability over all possible sentences. However, this decision rule is optimal for the sentence error rate, not for the word error rate. This difference is rarely studied in the literature.</Paragraph> <Paragraph position="2"> As a specific NLP task, we will consider part-of-speech (POS) tagging. However, the problem addressed comes up in any NLP task that is tackled by the statistical approach and makes use of a Bayes decision rule. Other prominent examples are speech recognition and machine translation. The advantage of the POS tagging task is that it is easier to handle from the mathematical point of view and admits closed-form solutions for the decision rules. 
From this point of view, the POS tagging task serves as a good opportunity to illustrate the key concepts of the statistical approach to NLP.</Paragraph> <Paragraph position="3"> Related Work: For the task of POS tagging, statistical approaches were proposed as early as the 1960s and 1970s (Stolz et al., 1965; Bahl and Mercer, 1976), before they started to find widespread use in the 1980s (Beale, 1985; DeRose, 1989; Church, 1989).</Paragraph> <Paragraph position="4"> To the best of our knowledge, the 'standard' version of the Bayes decision rule, which minimizes the number of string errors, is used in virtually all approaches to POS tagging and other NLP tasks.</Paragraph> <Paragraph position="5"> There are only two research groups that do not take this type of decision rule for granted: (Merialdo, 1994): In the context of POS tagging, the author introduces a method that he calls maximum likelihood tagging. The spirit of this method is similar to that of this work. However, the method is mentioned only as an aside, and its implications for the Bayes decision rule and the statistical approach are not addressed. Part of this work goes back to (Bahl et al., 1974), who considered a problem in coding theory.</Paragraph> <Paragraph position="6"> (Goel and Byrne, 2003): The error measure considered by the authors is the word error rate in speech recognition, i.e. the edit distance. Due to the mathematical complexity of this error measure, the authors resort to numeric approximations to compute the Bayes risk (see next section).</Paragraph> <Paragraph position="7"> Since this approach does not result in explicit closed-form equations and involves many numeric approximations, it is not easy to draw conclusions from this work.</Paragraph> </Section></Paper>
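The distinction drawn in the introduction between a sentence-error-optimal decision rule (pick the single most probable tag sequence) and a word-error-optimal one (pick the most probable tag at each position) can be made concrete with a toy example. The sketch below uses a made-up posterior over three tag sequences of length two; the numbers and tag names are purely illustrative and do not come from the paper.

```python
# Hypothetical posterior p(tag sequence | word sequence) over two positions.
# These probabilities are invented for illustration only.
posterior = {
    ("A", "A"): 0.4,
    ("B", "B"): 0.3,
    ("B", "C"): 0.3,
}

# Sentence-error-optimal rule: maximize the joint sequence probability (MAP).
map_seq = max(posterior, key=posterior.get)

def marginal(pos, tag):
    """Posterior marginal probability of `tag` at position `pos`."""
    return sum(p for seq, p in posterior.items() if seq[pos] == tag)

# Word-error-optimal rule: at each position, pick the tag with the
# highest posterior marginal probability.
tags = sorted({t for seq in posterior for t in seq})
n_positions = len(next(iter(posterior)))
argmax_seq = tuple(
    max(tags, key=lambda t: marginal(pos, t)) for pos in range(n_positions)
)

def expected_word_errors(decision):
    """Expected number of word errors: sum over positions of
    (1 - marginal probability of the chosen tag)."""
    return sum(1.0 - marginal(pos, tag) for pos, tag in enumerate(decision))

print(map_seq)                            # MAP sequence: ('A', 'A')
print(argmax_seq)                         # position-wise argmax: ('B', 'A')
print(expected_word_errors(map_seq))      # ~1.2 expected word errors
print(expected_word_errors(argmax_seq))   # ~1.0 expected word errors
```

The MAP sequence ('A', 'A') minimizes the sentence error probability (0.6), yet the position-wise argmax sequence ('B', 'A') has fewer expected word errors, despite not even appearing in the posterior's support. This is exactly the gap between the two error measures that the paper sets out to analyze.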