File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/05/w05-1511_evalu.xml
Size: 5,763 bytes
Last Modified: 2025-10-06 13:59:32
<?xml version="1.0" standalone="yes"?> <Paper uid="W05-1511"> <Title>Efficacy of Beam Thresholding, Unification Filtering and Hybrid Parsing in Probabilistic HPSG Parsing</Title> <Section position="6" start_page="109" end_page="111" type="evalu"> <SectionTitle> 5 Evaluation </SectionTitle> <Paragraph position="0"> We evaluated the efficiency of the parsing techniques using the HPSG grammar for English developed by Miyao et al. (2005). The lexicon of the grammar was extracted from Sections 02-21 of the Penn Treebank (Marcus et al., 1994) (39,832 sentences). The grammar consisted of 2,284 lexical entry templates for 10,536 words.1 The probabilistic disambiguation model of the grammar was trained on the same portion of the treebank (Miyao and Tsujii, 2005).</Paragraph> <Paragraph position="1"> 1 Lexical entry templates for POS tags were also developed; they are assigned to unknown words.</Paragraph> <Paragraph position="3"> The model included 529,856 features. The parameters for beam searching (d0 = 12, Δd = ...) were determined manually by trial and error using Section 22.</Paragraph> <Paragraph position="5"> Hybrid parsing used the chunk parser developed by Tsuruoka and Tsujii (2005a). Table 1 shows the abbreviations used in presenting the results (full: iterative + global + chp; piter: preserved iterative parsing; qc: quick check; lci: large constituent inhibition; diff(*): (avg. time of full) - (avg. time)).</Paragraph> <Paragraph position="6"> We measured the accuracy of the predicate-argument relations output by the parser. A predicate-argument relation is defined as a tuple <s, wh, a, wa>, where s is the predicate type (e.g., adjective, intransitive verb), wh is the head word of the predicate, a is the argument label (MODARG, ARG1, ..., ARG4), and wa is the head word of the argument. Precision/recall is the ratio of tuples correctly identified by the parser, measured relative to the parser's output and to the gold standard, respectively. This evaluation scheme is the same as that used in previous evaluations of lexicalized grammars (Hockenmaier, 2003; Clark and Curran, 2004; Miyao and Tsujii, 2005). The experiments were conducted on an AMD Opteron server with a 2.4-GHz CPU. Section 22 of the Treebank was used as the development set, and performance was evaluated on the sentences of less than 40 words in Section 23 (2,164 sentences, 20.3 words/sentence). The performance of each parsing technique was analyzed using the sentences in Section 24 of less than 15 words (305 sentences) and less than 40 words (1,145 sentences). Table 2 shows the parsing performance using all thresholding techniques and implementations described in Section 4 for the sentences of less than 40 words in the development set (Section 22) and the test set (Section 23). The table gives precision, recall, average parsing time per sentence, and the number of sentences that the parser failed to parse. Figure 6 shows the distribution of parsing time with respect to sentence length.</Paragraph> <Paragraph position="7"> Table 3 shows the performance of Viterbi parsing, beam search parsing, and iterative parsing for the sentences in Section 24 of less than 15 words (sentence length was limited to 15 words because of the inefficiency of Viterbi parsing). Parsing without beam search took more than 1,000 times longer than parsing with it.</Paragraph> <Paragraph position="8"> However, beam search reduced recall from 87.9% to 82.4%. The main reason for this reduction was parsing failure: when the beam was too narrow, the parser produced no output at all rather than an incorrect parse. Although iterative parsing was originally developed for efficiency, the results revealed that it also increases recall, because the parser keeps retrying with a wider beam until some result is output. Figure 7 plots parsing time against sentence length on a logarithmic scale; the left side of the figure shows Viterbi parsing and the right side shows iterative parsing.</Paragraph>
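As a minimal sketch of the retry behaviour just described: the loop below widens the beam whenever parsing fails, which is why iterative parsing recovers the recall lost to parse failures. The function parse() and the parameter names are hypothetical; the text above gives d0 = 12, but the remaining threshold values were lost in extraction, so the step and maximum below are placeholders, not the paper's settings.

def iterative_parse(sentence, parse, beam0=12, beam_step=4, beam_max=30):
    """Retry parsing with a progressively wider beam until some result is found.

    parse(sentence, beam) is assumed to return a parse result, or None when
    the beam is too narrow and the parser fails on the sentence.
    """
    beam = beam0
    while beam <= beam_max:
        result = parse(sentence, beam)
        if result is not None:      # success: stop widening
            return result
        beam += beam_step           # failure: widen the beam and retry
    return None                     # give up after the widest beam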
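For concreteness, the tuple-based precision/recall described earlier in this section can be computed as in the sketch below, assuming the gold standard and the parser output are given as per-sentence sets of (s, wh, a, wa) tuples; the function and variable names are illustrative, not taken from the paper.

def corpus_precision_recall(gold_sets, system_sets):
    """Micro-averaged precision/recall over per-sentence predicate-argument tuple sets.

    A failed parse contributes an empty system set: it lowers recall but does not
    affect precision, which matches the explanation of the recall drop above.
    """
    correct = sys_total = gold_total = 0
    for gold, system in zip(gold_sets, system_sets):
        gold, system = set(gold), set(system)
        correct += len(gold & system)   # tuples the parser identified correctly
        sys_total += len(system)
        gold_total += len(gold)
    precision = correct / sys_total if sys_total else 0.0
    recall = correct / gold_total if gold_total else 0.0
    return precision, recall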
<Paragraph position="9"> Figure 8 shows the performance of the parsing techniques with different parameter settings for the sentences in Section 24 of less than 40 words. Combinations of thresholding techniques achieved better results than the single techniques. Local thresholding by width (width) performed better than local thresholding by number (num), and the combination of the two (num+width) performed better than single local and single global thresholding. The superiority of iterative parsing (iterative) was again demonstrated in this experiment. Although we did not observe a significant improvement with global thresholding alone, the combination of global thresholding and iterative parsing slightly improved performance.</Paragraph> <Paragraph position="10"> Figure 9 shows the performance with and without the chunk parser. The lines with white symbols represent parsing without the chunk parser, and the lines with black symbols represent parsing with it. The chunk parser improved the total parsing performance significantly. The improvement from global thresholding was smaller when the chunk parser was used.</Paragraph> <Paragraph position="11"> Finally, Table 4 shows the contribution of each implementation to performance for the sentences in Section 24 of less than 40 words. Here 'full' denotes the parser with all thresholding techniques and implementations described in Section 4, and 'full - x' denotes the full parser with x removed. Preserved iterative parsing, the quick check, and the chunk parser contributed greatly to the final parsing speed, while global thresholding and large constituent inhibition did not.</Paragraph> </Section> </Paper>