<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-2033">
  <Title>Parser Combination by Reparsing</Title>
  <Section position="5" start_page="129" end_page="131" type="evalu">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"> In our dependency parsing experiments we used unlabeled dependencies extracted from the Penn  Treebank using the same head-table as Yamada and Matsumoto (2003), using sections 02-21 as training data and section 23 as test data, following (McDonald et al., 2005; Nivre &amp; Scholz, 2004; Yamada &amp; Matsumoto, 2003). Dependencies extracted from section 00 were used as held-out data, and section 22 was used as additional development data. For constituent parsing, we used the section splits of the Penn Treebank as described above, as has become standard in statistical parsing research.</Paragraph>
    <Section position="1" start_page="130" end_page="130" type="sub_section">
      <SectionTitle>
4.1 Dependency Reparsing Experiments
</SectionTitle>
      <Paragraph position="0"> Six dependency parsers were used in our combination experiments, as described below.</Paragraph>
      <Paragraph position="1"> The deterministic shift-reduce parsing algorithm of (Nivre &amp; Scholz, 2004) was used to create two parsers2, one that processes the input sentence from left-to-right (LR), and one that goes from right-to-left (RL). Because this deterministic algorithm makes a single pass over the input string with no back-tracking, making decisions based on the parser's state and history, the order in which input tokens are considered affects the result. Therefore, we achieve additional parser diversity with the same algorithm, simply by varying the direction of parsing. We refer to the two parsers as LR and RL.</Paragraph>
      <Paragraph position="2"> The deterministic parser of Yamada and Matsumoto (2003) uses an algorithm similar to Nivre and Scholz's, but it makes several successive left-to-right passes over the input instead of keeping a stack. To increase parser diversity, we used a version of Yamada and Matsumoto's algorithm where the direction of each of the consecutive passes over the input string alternates from left-to-right and right-to-left. We refer to this parser as LRRL.</Paragraph>
      <Paragraph position="3"> The large-margin parser described in (McDonald et al., 2005) was used with no alterations. Unlike the deterministic parsers above, this parser uses a dynamic programming algorithm (Eisner, 1996) to determine the best tree, so there is no difference between presenting the input from left-to-right or right-to-left.</Paragraph>
      <Paragraph position="4"> Three different weight configurations were considered: (1) giving all dependencies the same weight; (2) giving dependencies different weights, depending only on which parser generated the dependency; and (3) giving dependencies different 2 Nivre and Scholz use memory based learning in their experiments. Our implementation of their parser uses support vector machines, with improved results.</Paragraph>
      <Paragraph position="5"> weights, depending on which parser generated the dependency, and the part-of-speech of the dependent word. Option 2 takes into consideration that parsers may have different levels of accuracy, and dependencies proposed by more accurate parsers should be counted more heavily. Option 3 goes a step further, attempting to capitalize on the specific strengths of the different parsers.</Paragraph>
      <Paragraph position="6"> The weights in option 2 are determined by computing the accuracy of each parser on the held-out set (WSJ section 00). The weights are simply the corresponding parser's accuracy (number of correct dependencies divided by the total number of dependencies). The weights in option 3 are determined in a similar manner, but different accuracy figures are computed for each part-of-speech.</Paragraph>
      <Paragraph position="7"> Table 1 shows the dependency accuracy and root accuracy (number of times the root of the dependency tree was identified correctly divided by the number of sentences) for each of the parsers, and for each of the different weight settings in the reparsing experiments (numbered according to their descriptions above).</Paragraph>
      <Paragraph position="8">  individual dependency parsers and their combination under three different weighted reparsing settings.</Paragraph>
    </Section>
    <Section position="2" start_page="130" end_page="131" type="sub_section">
      <SectionTitle>
4.2 Constituent Reparsing Experiments
</SectionTitle>
      <Paragraph position="0"> The parsers that were used in the constituent reparsing experiments are: (1) Charniak and Johnson's (2005) reranking parser; (2) Henderson's (2004) synchronous neural network parser; (3) Bikel's (2002) implementation of the Collins (1999) model 2 parser; and (4) two versions of Sagae and Lavie's (2005) shift-reduce parser, one using a maximum entropy classifier, and one using support vector machines.</Paragraph>
      <Paragraph position="1"> Henderson and Brill's voting scheme mentioned in section 3 can be emulated by our reparsing approach by setting all weights to 1.0 and t to (m + 1)/2, but better results can be obtained by setting appropriate weights and adjusting the precision/recall tradeoff. Weights for different types of  constituents from each parser can be set in a similar way to configuration 3 in the dependency experiments. However, instead of measuring accuracy for each part-of-speech tag of dependents, we measure precision for each non-terminal label.</Paragraph>
      <Paragraph position="2"> The parameter t is set using held-out data (from WSJ section 22) and a simple hill-climbing procedure. First we set t to (m + 1)/2 (which heavily favors precision). We then repeatedly evaluate the combination of parsers, each time decreasing the value of t (by 0.01, say). We record the values of t for which precision and recall were closest, and for which f-score was highest.</Paragraph>
      <Paragraph position="3"> Table 2 shows the accuracy of each individual parser and for three reparsing settings. Setting 1 is the emulation of Henderson and Brill's voting. In setting 2, t is set for balancing precision and recall. In setting 3, t is set for highest f-score.</Paragraph>
      <Paragraph position="4">  parser and their combination under three different reparsing settings.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>