<?xml version="1.0" standalone="yes"?>
<Paper uid="P93-1042">
  <Title>HOW DO WE COUNT? THE PROBLEM OF TAGGING PHRASAL VERBS IN PARTS</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1. INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> Statistical taggers are commonly used to preprocess natural language text. Tasks such as parsing, information retrieval, and machine translation are facilitated by input in which each lexical item carries a part-of-speech label. To be useful, a tagger must be accurate as well as efficient. Researchers advocating statistical methods for NLP (e.g. Marcus et al. 92) claim that taggers are routinely correct about 95% of the time. This 5% error rate is not perceived as a problem, mainly because human taggers disagree or make mistakes at approximately the same rate. However, even a 5% error rate can cause a much higher rate of mistakes later in processing if an error falls on a key element that is crucial to the correct analysis of the whole sentence. One example is the phrasal verb construction (e.g. gun down, back off): an error in tagging this two-element sequence will cause the analysis of the entire sentence to be faulty.</Paragraph>
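A rough way to see why a 5% per-token error rate is larger than it looks at the sentence level: if one assumes, purely for illustration, that tagging errors are independent across tokens, a tagger with per-token accuracy p leaves at least one error in an n-token sentence with probability 1 - p^n. The Python sketch below computes this; the function name `sentence_error_rate` is hypothetical, and the independence assumption is ours, not the paper's. It also understates the problem described above, since a single error on a key element such as a phrasal verb can derail the analysis of the whole sentence.

```python
# Minimal sketch: how a per-token tagging error rate compounds at the
# sentence level. ASSUMPTION (hypothetical, for illustration only):
# tagging errors are independent across tokens, which real taggers
# do not guarantee.


def sentence_error_rate(token_accuracy: float, sentence_length: int) -> float:
    """Probability that a sentence of `sentence_length` tokens contains
    at least one tagging error, assuming independent per-token errors."""
    return 1.0 - token_accuracy ** sentence_length


if __name__ == "__main__":
    accuracy = 0.95  # the roughly 95% per-token accuracy cited in the text
    for n in (1, 10, 20, 30):
        rate = sentence_error_rate(accuracy, n)
        print(f"{n:2d} tokens: {rate:.1%} of sentences contain an error")
```

Under this assumption, a 20-token sentence already has roughly a 64% chance of containing at least one mistagged token.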
    <Paragraph position="1"> An analysis of the errors made by the stochastic tagger PARTS (Church 88) reveals that phrasal verbs do indeed constitute a problem for the model.</Paragraph>
  </Section>
</Paper>