<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-0731">
  <Title>Shallow Parsing as Part-of-Speech Tagging*</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 The Tagger
</SectionTitle>
    <Paragraph position="0"> We used Ratnaparkhi's maximum entropy-based POS tagger (Ratnaparkhi, 1996). When tagging, the model tries to recover the most likely (unobserved) tag sequence, given a sequence of observed words.</Paragraph>
    <Paragraph position="1"> For our experiments, we used the binary-only distribution of the tagger (Ratnaparkhi, 1996).  The insight here is that one can view (some of) the differences between tagging and (shallow) parsing as one of context: shallow parsing requires access to a greater part of the surrounding lexical/POS syntactic environment than does simple POS tagging. This extra information can be encoded in a state.</Paragraph>
    <Paragraph position="2"> However, one must balance this approach with the fact that as the amount of information in a state increases, with limited training material, the chance of seeing such a state again in the future diminishes. We therefore would expect performance to increase as we increased the amount of information in a state, and then decrease when overfitting and/or sparse statistics become dominate factors.</Paragraph>
    <Paragraph position="3"> We trained the tagger using 'words' that were various 'configurations' (concatenations) of actual words, POS tags, chunk-types, and/or suffixes or prefixes of words and/or chunk-types. By training upon these concatenations, we help bridge the gap between simple POS tagging and shallow parsing.</Paragraph>
    <Paragraph position="4"> In the rest of the paper, we refer to what the tagger considers to be a word as a configuration. A configuration will be a concatenation of various elements of the training set relevant to decision making regarding chunk assignment. A 'word' will mean a word as found in the training set. 'Tags' refer to the POS tags found in the training set. Again, such tags may be part of a configuration. We refer to what the tagger considers as a tag as a prediction. Predictions will be chunk labels.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="145" type="metho">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"> We now give details of the experiments we ran.</Paragraph>
    <Paragraph position="1"> To make matters clearer, consider the following  fragment of the training set:</Paragraph>
    <Paragraph position="3"> Words are wl,w2 and w3, tags are t~L,t2 and t3 and chunk labels are cl, c2 and ca. Throughout, we built various configurations when predicting 1 the chunk label for word wl.</Paragraph>
    <Paragraph position="4"> With respect to the situation just mentioned (predicting the label for word wl), we gradually increased the amount of information in each configuration as follows:  1. A configuration consisting of just words (word wl). Results:  configurations consisting of tags and words (wl and tl). The training set was then reduced to consist of just tag-word configurations and tagged using this model. Afterwards, we collected the predictions for use in the second model. Results:  The final configuration made an attempt to take deal with sparse statistics. It consisted of the current tag tl, the next tag t2, the current chunk label cl, the last two letters of the next chunk label c2, the first two letters of the current word wl and the last four letters of the current word wl. This configuration was the result of numerous experiments and gave the best overall performance. The results can be found in Table 1.</Paragraph>
    <Paragraph position="5"> We remark upon our experiments in the comments section.</Paragraph>
  </Section>
  <Section position="6" start_page="145" end_page="146" type="metho">
    <SectionTitle>
5 Error Analysis
</SectionTitle>
    <Paragraph position="0"> We examined the performance of our final model with respect to the testing material and found that errors made by our shallow parser could be grouped into three categories: difficult syntactic constructs, mistakes made in the training or testing material by the annotators, and errors peculiar to our approach. 2 Taking each category of the three in turn, problematic constructs included: co-ordination, punctuation, treating ditransitive VPs as being transitive VPs, confusions regarding adjective or adverbial phrases, and copulars seen as be- null Mistakes (noise) in the training and testing material were mainly POS tagging errors. An additional source of errors were odd annotation decisions.</Paragraph>
    <Paragraph position="1"> The final source of errors were peculiar to our system. Exponential distributions (as used by our tagger) assign a non-zero probability to all possible events. This means that the tagger will at times assign chunk labels that are illegal, for example assigning a word the label I-NP when the word is not in a NP. Although these errors were infrequent, eliminating them would require 'opening-up' the tagger and rejecting illegal hypothesised chunk labels from consideration.</Paragraph>
  </Section>
</Paper>