<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1619">
  <Title>Extremely Lexicalized Models for Accurate and Fast HPSG Parsing</Title>
  <Section position="5" start_page="157" end_page="157" type="metho">
    <SectionTitle>
3 Extremely lexicalized probabilistic models
    </SectionTitle>
    <Paragraph position="0"> models In the experiments, we tested parsing with the previous model for the probabilistic HPSG explained in Section 2 and other three types of probabilistic models defined with the probabilities of lexical entry selection. The first one is the simplest probabilistic model, which is defined with only the probabilities of lexical entry selection. It is defined simply as the product of the probabilities of selecting all lexical entries in the sentence; i.e., the model does not use the probabilities of phrase structures like the previous models.</Paragraph>
    <Paragraph position="1"> Given a set of lexical entries, L, a sentence, w = &lt;w1,...,wn&gt; , and the probabilistic model of lexical entry selection, p(li [?] L|w,i), the first model is formally defined as follows:</Paragraph>
    <Paragraph position="3"> where li is a lexical entry assigned to word wi in T and p(li|w,i) is the probability of selecting lexical entry li for wi.</Paragraph>
    <Paragraph position="4"> The second model is defined as the product of the probabilities of selecting all lexical entries in the sentence and the root node probability of the parse tree. That is, the second model is also defined without the probabilities on phrase structures: null  where Zmodel2 is the sum over the set of all possible parse trees for the sentence.</Paragraph>
    <Paragraph position="5"> The third model is a hybrid of model 1 and the previous model. The probabilities of the lexical entries in the previous model are replaced with the probabilities of lexical entry selection:  In this study, the same model parameters used in the previous model were used for phrase structures. null The probabilities of lexical entry selection, p(li|w,i), are defined as follows:</Paragraph>
    <Paragraph position="7"> procedure IterativeParsing(w, G, a0, b0, k0, d0, th0, [?]a, [?]b, [?]k, [?]d, [?]th, alast, blast, klast, dlast, thlast) a - a0; b - b0; k - k0; d - d0; th - th0; loop while a [?] alast and b [?] blast and k [?] klast and d [?] dlast and th [?] thlast call Parsing(w, G, a, b, k, d, th) if pi[1,n] negationslash= [?] then exit</Paragraph>
    <Paragraph position="9"> where Zw is the sum over all possible lexical entries for the word wi. The feature templates used in our model are listed in Table 2 and are word trigrams and POS 5-grams.</Paragraph>
  </Section>
  <Section position="6" start_page="157" end_page="160" type="metho">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="157" end_page="157" type="sub_section">
      <SectionTitle>
4.1 Implementation
</SectionTitle>
      <Paragraph position="0"> We implemented the iterative parsing algorithm (Ninomiya et al., 2005) for the probabilistic HPSG models. It first starts parsing with a narrow beam.</Paragraph>
      <Paragraph position="1"> If the parsing fails, then the beam is widened, and parsing continues until the parser outputs results or the beam width reaches some limit. Though the probabilities of lexical entry selection are introduced, the algorithm for the presented probabilistic models is almost the same as the original iterative parsing algorithm.</Paragraph>
      <Paragraph position="2"> The pseudo-code of the algorithm is shown in Figure 3. In the figure, the pi[i,j] represents the set of partial parse results that cover words wi+1,...,wj, and r[i,j,F] stores the maximum figure-of-merit (FOM) of partial parse result F at cell (i,j). The probability of lexical entry F is computed as summationtextu lufu(F) for the previous model, as shown in the figure. The probability of a lexical entry for models 1, 2, and 3 is computed as the probability of lexical entry selection, p(F|w,i). The FOM of a newly created partial parse, F, is computed by summing the values of r of the daughters and an additional FOM of F if the model is the previous model or model 3. The FOM for models 1 and 2 is computed by only summing the values of r of the daughters; i.e., weights exp(lu) in the figure are assigned zero. The terms k and d are the thresholds of the number of phrasal signs in the chart cell and the beam width for signs in the chart cell. The terms a and b are the thresholds of the number and the beam width of lexical entries, and th is the beam width for global thresholding (Goodman, 1997).</Paragraph>
    </Section>
    <Section position="2" start_page="157" end_page="160" type="sub_section">
      <SectionTitle>
4.2 Evaluation
</SectionTitle>
      <Paragraph position="0"> We evaluated the speed and accuracy of parsing with extremely lexicalized models by using Enju 2.1, the HPSG grammar for English (Miyao et al., 2005; Miyao and Tsujii, 2005). The lexicon of the grammar was extracted from Sections 02-21 of the Penn Treebank (Marcus et al., 1994) (39,832 sentences). The grammar consisted of 3,797 lexical entries for 10,536 words1. The probabilistic models were trained using the same portion of the treebank. We used beam thresholding, global thresholding (Goodman, 1997), preserved iterative parsing (Ninomiya et al., 2005) and other tech1An HPSG treebank is automatically generated from the Penn Treebank. Those lexical entries were generated by applying lexical rules to observed lexical entries in the HPSG treebank (Nakanishi et al., 2004). The lexicon, however, included many lexical entries that do not appear in the HPSG treebank. The HPSG treebank is used for training the probabilistic model for lexical entry selection, and hence, those lexical entries that do not appear in the treebank are rarely selected by the probabilistic model. The 'effective' tag set size, therefore, is around 1,361, the number of lexical entries without those never-seen lexical entries.</Paragraph>
      <Paragraph position="1">  niques for deep parsing2. The parameters for beam searching were determined manually by trial and error using Section 22: a0 = 4,[?]a = 4,alast =</Paragraph>
      <Paragraph position="3"> thlast = 20.0. With these thresholding parameters, the parser iterated at most five times for each sentence.</Paragraph>
      <Paragraph position="4"> We measured the accuracy of the predicate-argument relations output of the parser. A predicate-argument relation is defined as a tuple &lt;s,wh,a,wa&gt; , where s is the predicate type (e.g., adjective, intransitive verb), wh is the head word of the predicate, a is the argument label (MODARG, ARG1, ..., ARG4), and wa is the head word of the argument. Labeled precision (LP)/labeled recall (LR) is the ratio of tuples correctly identified by the parser3. Unlabeled precision (UP)/unlabeled recall (UR) is the ratio of tuples without the predicate type and the argument label. This evaluation scheme was the same as used in previous evaluations of lexicalized grammars (Hockenmaier, 2003; Clark and Cur2Deep parsing techniques include quick check (Malouf et al., 2000) and large constituent inhibition (Kaplan et al., 2004) as described by Ninomiya et al. (2005), but hybrid parsing with a CFG chunk parser was not used. This is because we did not observe a significant improvement for the development set by the hybrid parsing and observed only a small improvement in the parsing speed by around 10 ms. 3When parsing fails, precision and recall are evaluated, although nothing is output by the parser; i.e., recall decreases greatly.</Paragraph>
      <Paragraph position="5"> ran, 2004b; Miyao and Tsujii, 2005). The experiments were conducted on an AMD Opteron server with a 2.4-GHz CPU. Section 22 of the Treebank was used as the development set, and the performance was evaluated using sentences of [?] 40 and 100 words in Section 23. The performance of each parsing technique was analyzed using the sentences in Section 24 of [?] 100 words.</Paragraph>
      <Paragraph position="6"> Table 3 details the numbers and average lengths of the tested sentences of [?] 40 and 100 words in Sections 23 and 24, and the total numbers of sentences in Sections 23 and 24.</Paragraph>
      <Paragraph position="7"> The parsing performance for Section 23 is shown in Table 4. The upper half of the table shows the performance using the correct POSs in the Penn Treebank, and the lower half shows the performance using the POSs given by a POS tagger (Tsuruoka and Tsujii, 2005). The left and right sides of the table show the performances for the sentences of [?] 40 and [?] 100 words. Our models significantly increased not only the parsing speed but also the parsing accuracy. Model 3 was around three to four times faster and had around two points higher precision and recall than the previous model. Surprisingly, model 1, which used only lexical information, was very fast and as accurate as the previous model. Model 2 also improved the accuracy slightly without information of phrase structures. When the automatic POS tagger was introduced, both precision and recall dropped by around 2 points, but the tendency towards improved speed and accuracy was again ob- null The unlabeled precisions and recalls of the previous model and models 1, 2, and 3 were significantly different as measured using stratified shuffling tests (Cohen, 1995) with p-values &lt; 0.05.</Paragraph>
      <Paragraph position="8"> The labeled precisions and recalls were significantly different among models 1, 2, and 3 and between the previous model and model 3, but were not significantly different between the previous model and model 1 and between the previous model and model 2.</Paragraph>
      <Paragraph position="9"> The average parsing time and labeled F-score curves of each probabilistic model for the sentences in Section 24 of[?]100 words are graphed in  observed in the figure. Model 3 performed significantly better than the previous model. Models 1 and 2 were significantly faster with almost the same accuracy as the previous model.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>