<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-1511">
  <Title>Efficacy of Beam Thresholding, Unification Filtering and Hybrid Parsing in Probabilistic HPSG Parsing</Title>
  <Section position="3" start_page="0" end_page="103" type="metho">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> We investigated the performance efficacy of beam search parsing and deep parsing techniques in probabilistic head-driven phrase structure grammar (HPSG) parsing for the Penn treebank. We first applied beam thresholding techniques developed for CFG parsing to HPSG parsing, including local thresholding, global thresholding (Goodman, 1997), and iterative parsing (Tsuruoka and Tsujii, 2005b).</Paragraph>
    <Paragraph position="1"> Next, we applied parsing techniques developed for deep parsing, including quick check (Malouf et al., 2000), large constituent inhibition (Kaplan et al., 2004) and hybrid parsing with a CFG chunk parser (Daum et al., 2003; Frank et al., 2003; Frank, 2004).</Paragraph>
    <Paragraph position="2"> The experiments showed how each technique contributes to the final output of parsing in terms of precision, recall, and speed for the Penn treebank.</Paragraph>
    <Paragraph position="3"> Unification-based grammars have been extensively studied in terms of linguistic formulation and computation efficiency. Although they provide precise linguistic structures of sentences, their processing is considered expensive because of the detailed descriptions. Since efficiency is of particular concern in practical applications, a number of studies have focused on improving the parsing efficiency of unification-based grammars (Oepen et al., 2002). Although significant improvements in efficiency have been made, parsing speed is still not high enough for practical applications.</Paragraph>
    <Paragraph position="4"> The recent introduction of probabilistic models of wide-coverage unification-based grammars (Malouf and van Noord, 2004; Kaplan et al., 2004; Miyao and Tsujii, 2005) has opened up the novel possibility of increasing parsing speed by guiding the search path using probabilities. That is, since we often require only the most probable parse result, we can compute partial parse results that are likely to contribute to the final parse result. This approach has been extensively studied in the field of probabilistic  CFG (PCFG) parsing, such as Viterbi parsing and beam thresholding.</Paragraph>
    <Paragraph position="5"> While many methods of probabilistic parsing for unification-based grammars have been developed, their strategy is to first perform exhaustive parsing without using probabilities and then select the highest probability parse. The behavior of their algorithms is like that of the Viterbi algorithm for PCFG parsing, so the correct parse with the highest probability is guaranteed. The interesting point of this approach is that, once the exhaustive parsing is completed, the probabilities of non-local dependencies, which cannot be computed during parsing, are computed after making a packed parse forest. Probabilistic models where probabilities are assigned to the CFG backbone of the unification-based grammar have been developed (Kasper et al., 1996; Briscoe and Carroll, 1993; Kiefer et al., 2002), and the most probable parse is found by PCFG parsing.</Paragraph>
    <Paragraph position="6"> This model is based on PCFG and not probabilistic unification-based grammar parsing. Geman and Johnson (Geman and Johnson, 2002) proposed a dynamic programming algorithm for finding the most probable parse in a packed parse forest generated by unification-based grammars without expanding the forest. However, the efficiency of this algorithm is inherently limited by the inefficiency of exhaustive parsing.</Paragraph>
    <Paragraph position="7"> In this paper we describe the performance of beam thresholding, including iterative parsing, in probabilistic HPSG parsing for a large-scale corpora, the Penn treebank. We show how techniques developed for efficient deep parsing can improve the efficiency of probabilistic parsing. These techniques were evaluated in experiments on the Penn Treebank (Marcus et al., 1994) with the wide-coverage HPSG parser developed by Miyao et al. (Miyao et al., 2005; Miyao and Tsujii, 2005).</Paragraph>
  </Section>
  <Section position="4" start_page="103" end_page="104" type="metho">
    <SectionTitle>
2 HPSG and probabilistic models
</SectionTitle>
    <Paragraph position="0"> HPSG (Pollard and Sag, 1994) is a syntactic theory based on lexicalized grammar formalism. In HPSG, a small number of schemata describe general construction rules, and a large number of lexical entries express word-specific characteristics. The structures of sentences are explained using combinations of schemata and lexical entries. Both schemata and lexical entries are represented by typed feature structures, and constraints represented by feature structures are checked with unification.</Paragraph>
    <Paragraph position="1"> Figure 1 shows an example of HPSG parsing of the sentence &amp;quot;Spring has come.&amp;quot; First, each of the lexical entries for &amp;quot;has&amp;quot; and &amp;quot;come&amp;quot; is unified with a daughter feature structure of the Head-Complement  Schema. Unification provides the phrasal sign of the mother. The sign of the larger constituent is obtained by repeatedly applying schemata to lexical/phrasal signs. Finally, the parse result is output as a phrasal sign that dominates the sentence.</Paragraph>
    <Paragraph position="2"> Given set W of words and set F of feature structures, an HPSG is formulated as a tuple, G = &lt;L,R&gt; , where</Paragraph>
    <Paragraph position="4"> entries, and R is a set of schemata, i.e., r [?] R is a partial function: F xF - F.</Paragraph>
    <Paragraph position="5"> Given a sentence, an HPSG computes a set of phrasal signs, i.e., feature structures, as a result of parsing. Previous studies (Abney, 1997; Johnson et al., 1999; Riezler et al., 2000; Miyao et al., 2003; Malouf and van Noord, 2004; Kaplan et al., 2004; Miyao and Tsujii, 2005) defined a probabilistic model of unification-based grammars as a log-linear model or maximum entropy model (Berger et al., 1996). The probability of parse result T assigned to given sentence w = &lt;w1,... ,wn&gt; is</Paragraph>
    <Paragraph position="7"> where li is a model parameter, and fi is a feature function that represents a characteristic of parse tree T. Intuitively, the probability is defined as the normalized product of the weights exp(li) when a characteristic corresponding to fi appears in parse result T. Model parameters li are estimated using numer- null ical optimization methods (Malouf, 2002) so as to maximize the log-likelihood of the training data.</Paragraph>
    <Paragraph position="8"> However, the above model cannot be easily estimated because the estimation requires the computation of p(T|w) for all parse candidates assigned to sentence w. Because the number of parse candidates is exponentially related to the length of the sentence, the estimation is intractable for long sentences.</Paragraph>
    <Paragraph position="9"> To make the model estimation tractable, Geman and Johnson (Geman and Johnson, 2002) and Miyao and Tsujii (Miyao and Tsujii, 2002) proposed a dynamic programming algorithm for estimating p(T|w). They assumed that features are functions on nodes in a packed parse forest. That is, parse tree T is represented by a set of nodes, i.e., T = {c}, and the parse forest is represented by an and/or graph of the nodes. From this assumption, we can redefine the probability as</Paragraph>
    <Paragraph position="11"> A packed parse forest has a structure similar to a chart of CFG parsing, and c corresponds to an edge in the chart. This assumption corresponds to the independence assumption in PCFG; that is, only a nonterminal symbol of a mother is considered in further processing by ignoring the structure of its daughters. With this assumption, we can compute the figures of merit (FOMs) of partial parse results.</Paragraph>
    <Paragraph position="12"> This assumption restricts the possibility of feature functions that represent non-local dependencies expressed in a parse result. Since unification-based grammars can express semantic relations, such as predicate-argument relations, in their structure, the assumption unjustifiably restricts the flexibility of probabilistic modeling. However, previous research (Miyao et al., 2003; Clark and Curran, 2004; Kaplan et al., 2004) showed that predicate-argument relations can be represented under the assumption of feature locality. We thus assumed the locality of feature functions and exploited it for the efficient search of probable parse results.</Paragraph>
  </Section>
  <Section position="5" start_page="104" end_page="109" type="metho">
    <SectionTitle>
3 Techniques for efficient deep parsing
</SectionTitle>
    <Paragraph position="0"> Many of the techniques for improving the parsing efficiency of deep linguistic analysis have been developed in the framework of lexicalized grammars such as lexical functional grammar (LFG) (Bresnan, 1982), lexicalized tree adjoining grammar (LTAG) (Schabes et al., 1988), HPSG (Pollard and Sag, 1994) and combinatory categorial grammar (CCG) (Steedman, 2000). Most of them were developed for exhaustive parsing, i.e., producing all parse results that are given by the grammar (Matsumoto et al., 1983; Maxwell and Kaplan, 1993; van Noord, 1997; Kiefer et al., 1999; Malouf et al., 2000; Torisawa et al., 2000; Oepen et al., 2002; Penn and Munteanu, 2003). The strategy of exhaustive parsing has been widely used in grammar development and in parameter training for probabilistic models.</Paragraph>
    <Paragraph position="1"> We tested three of these techniques.</Paragraph>
    <Paragraph position="2"> Quick check Quick check filters out non-unifiable feature structures (Malouf et al., 2000). Suppose we have two non-unifiable feature structures. They are destructively unified by traversing and modifying them, and then finally they are found to be not unifiable in the middle of the unification process. Quick check quickly judges their unifiability by peeping the values of the given paths. If one of the path values is not unifiable, the two feature structures cannot be unified because of the necessary condition of unification. In our implementation of quick check, each edge had two types of arrays. One contained the path values of the edge's sign; we call this the sign array. The other contained the path values of the right daughter of a schema such that its left daughter is unified with the edge's sign; we call this a schema array. When we apply a schema to two edges, e1 and e2, the schema array of e1 and the sign array of e2 are quickly checked. If it fails, then quick check returns a unification failure. If it succeeds, the signs are unified with the schemata, and the result of unification is returned.</Paragraph>
    <Paragraph position="3"> Large constituent inhibition (Kaplan et al., 2004) It is unlikely for a large medial edge to contribute to the final parsing result if it spans more than 20 words and is not adjacent to the beginning or ending of the sentence. Large constituent inhibition prevents the parser from generating medial edges that span more than some word length.</Paragraph>
    <Paragraph position="4"> HPSG parsing with a CFG chunk parser A hybrid of deep parsing and shallow parsing was recently found to improve the efficiency of deep parsing (Daum et al., 2003; Frank et al., 2003; Frank, 2004). As a preprocessor, the shallow parsing must be very fast and achieve high precision but not high recall so that the  procedure Viterbi(&lt;w1, . . . , wn&gt; , &lt;Lprime, R&gt; , k, d, th) for i = 1 to n foreach Fu [?]{F|&lt;wi, F&gt; [?]L} a =summationtexti lifi(Fu) pi[i[?]1, i]-pi[i[?]1, i][?]{Fu} if (a &gt; r[i[?]1, i, Fu]) then</Paragraph>
    <Paragraph position="6"> foreach Fs [?]pi[i, k], Ft [?]pi[k, j], r[?]R if F = r(Fs, Ft) has succeeded a = r[i, k, Fs] + r[k, j, Ft] +summationtexti lifi(F) pi[i, j]-pi[i, j][?]{F} if (a &gt; r[i, j, F]) then  total parsing performance in terms of precision, recall and speed is not degraded. Because there is trade-off between speed and accuracy in this approach, the total parsing performance for large-scale corpora like the Penn treebank should be measured. We introduce a CFG chunk parser (Tsuruoka and Tsujii, 2005a) as a preprocessor of HPSG parsing. Chunk parsers meet the requirements for preprocessors; they are very fast and have high precision. The grammar for the chunk parser is automatically extracted from the CFG treebank translated from the HPSG treebank, which is generated during grammar extraction from the Penn treebank. The principal idea of using the chunk parser is to use the bracket information, i.e., parse trees without non-terminal symbols, and prevent the HPSG parser from generating edges that cross brackets.</Paragraph>
    <Paragraph position="7"> 4 Beam thresholding for HPSG parsing</Paragraph>
    <Section position="1" start_page="105" end_page="107" type="sub_section">
      <SectionTitle>
4.1 Simple beam thresholding
</SectionTitle>
      <Paragraph position="0"> Many algorithms for improving the efficiency of PCFG parsing have been extensively investigated.</Paragraph>
      <Paragraph position="1"> They include grammar compilation (Tomita, 1986; Nederhof, 2000), the Viterbi algorithm, controlling search strategies without FOM such as left-corner parsing (Rosenkrantz and Lewis II, 1970) or head-corner parsing (Kay, 1989; van Noord, 1997), and with FOM such as the beam search, the best-first search or A* search (Chitrao and Grishman, 1990; Caraballo and Charniak, 1998; Collins, 1999; Ratnaparkhi, 1999; Charniak, 2000; Roark, 2001; Klein and Manning, 2003). The beam search and best-first search algorithms significantly reduce the time required for finding the best parse at the cost of losing the guarantee of finding the correct parse.</Paragraph>
      <Paragraph position="2"> The CYK algorithm, which is essentially a bottom-up parser, is a natural choice for non-probabilistic HPSG parsers. Many of the constraints are expressed as lexical entries in HPSG, and bottom-up parsers can use those constraints to reduce the search space in the early stages of parsing.</Paragraph>
      <Paragraph position="3"> For PCFG, extending the CYK algorithm to output the Viterbi parse is straightforward (Ney, 1991; Jurafsky and Martin, 2000). The parser can efficiently calculate the Viterbi parse by taking the maximum of the probabilities of the same nonterminal symbol in each cell. With the probabilistic model defined in Section 2, we can also define the Viterbi search for unification-based grammars (Geman and Johnson, 2002). Figure 2 shows the pseudo-code of Viterbi algorithm. The pi[i,j] represents the set of partial parse results that cover words wi+1,... ,wj, and r[i,j,F] stores the maximum FOM of partial parse result F at cell (i,j). Feature functions are defined over lexical entries and results of rule applications, which correspond to conjunctive nodes in a feature forest. The FOM of a newly created partial parse, F, is computed by summing the values of r of the daughters and an additional FOM of F.</Paragraph>
      <Paragraph position="4"> The Viterbi algorithm enables various pruning techniques to be used for efficient parsing. Beam thresholding (Goodman, 1997) is a simple and effective technique for pruning edges during parsing. In each cell of the chart, the method keeps only a portion of the edges which have higher FOMs compared to the other edges in the same cell.</Paragraph>
      <Paragraph position="6"> foreach Fs [?]pi[i, k], Ft [?]pi[k, j], r[?]R if F = r(Fs, Ft) has succeeded a = r[i, k, Fs] + r[k, j, Ft] +summationtexti lifi(F) pi[i, j]-pi[i, j][?]{F} if (a &gt; r[i, j, F]) then</Paragraph>
      <Paragraph position="8"> procedure IterativeBeamThresholding(w, G, k0, d0, th0, [?]k, [?]d, [?]th, klast, dlast, thlast) k-k0; d-d0; th-th0 loop while k[?]klast and d[?]dlast and th[?]thlast call BeamThresholding(w, G, k, d, th) if pi[1, n]negationslash=[?]then exit k-k + [?]k; d-d + [?]d; th-th + [?]th  We tested three selection schemes for deciding which edges to keep in each cell.</Paragraph>
      <Paragraph position="9"> Local thresholding by number of edges Each cell keeps the top k edges based on their FOMs. Local thresholding by beam width Each cell keeps the edges whose FOM is greater than amax [?] d, where amax is the highest FOM among the edges in the cell.</Paragraph>
      <Paragraph position="10"> Global thresholding by beam width Each cell keeps the edges whose global FOM is greater than amax[?]th, where amax is the highest global FOM in the chart.</Paragraph>
      <Paragraph position="11"> Figure 3 shows the pseudo-code of local beam search, and global beam search algorithms for probabilistic HPSG parsing. The code for local thresholding is inserted at the end of the computation for each cell. In Figure 3, pi[i,j]k denotes the k-th element in sorted set pi[i,j]. We first take the first k elements that have higher FOMs and then remove the elements with FOMs lower than amax [?] d.</Paragraph>
      <Paragraph position="12"> Global thresholding is also used for pruning edges, and was originally proposed for CFG parsing (Goodman, 1997). It prunes edges based on their global FOM and the best global FOM in the chart. The global FOM of an edge is defined as its FOM plus its forward and backward FOMs, where the forward and backward FOMs are rough estimations of the outside FOM of the edge. The global thresholding is performed immediately after each line of the CYK chart is completed. The forward FOM is calculated first, and then the backward FOM is calculated. Finally, all edges with a global FOM lower than amax [?] th are pruned. Figure 3 gives further details of the algorithm. null</Paragraph>
    </Section>
    <Section position="2" start_page="107" end_page="109" type="sub_section">
      <SectionTitle>
4.2 Iterative beam thresholding
</SectionTitle>
      <Paragraph position="0"> We tested the iterative beam thresholding proposed by Tsuruoka and Tsujii (2005b). We started the parsing with a narrow beam. If the parser output results, they were taken as the final parse results. If the parser did not output any results, we widened the  num local beam thresholding by number width local beam thresholding by width global global beam thresholding by width iterative iterative parsing with local beam thresholding by number and width chp parsing with CFG chunk parser beam, and reran the parsing. We continued widening the beam until the parser output results or the beam width reached some limit.</Paragraph>
      <Paragraph position="1"> The pseudo-code is presented in Figure 4. It calls the beam thresholding procedure shown in Figure 3 and increases parameters k and d until the parser outputs results, i.e., pi[1,n] negationslash= [?]. Preserved iterative parsing Our implemented CFG parser with iterative parsing cleared the chart and edges at every iteration although the parser regenerated the same edges using those generated in the previous iteration. This is because the computational cost of regenerating edges is smaller than that of reusing edges to which the rules have already been applied. For HPSG parsing, the regenerating cost is even greater than that for CFG parsing. In our implementation of HPSG parsing, the chart and edges were not cleared during the iterative parsing. Instead, the pruned edges were marked as thresholded ones. The parser counted the number of iterations, and when edges were generated, they were marked with the iteration number, which we call the generation. If edges were thresholded out, the generation was replaced with the current iteration number plus 1. Suppose we have two edges, e1 and e2. The grammar rules are applied iff both e1 and e2 are not thresholded out, and the generation of e1 or e2 is equal to the current iteration number.</Paragraph>
      <Paragraph position="2"> Figure 5 shows the pseudo-code of preserved iterative parsing.</Paragraph>
      <Paragraph position="3">  procedure BeamThresholding(&lt;w1, . . . , wn&gt; , &lt;Lprime, R&gt; , k, d, th, iternum)</Paragraph>
      <Paragraph position="5"> foreach Fs [?]ph[i, k], Ft [?]ph[k, j], r[?]R if gen[i, k, Fs] = iternum[?]gen[k, j, Ft] = iternum if F = r(Fs, Ft) has succeeded gen[i, j, F]-iternum a = r[i, k, Fs] + r[k, j, Ft] +summationtexti lifi(F) pi[i, j]-pi[i, j][?]{F} if (a &gt; r[i, j, F]) then r[i, j, F]-a LocalThresholding(k, d, iternum) GlobalThresholding(n, th, iternum) procedure LocalThresholding(k, d, iternum) sort pi[i, j] according to r[i, j, F]</Paragraph>
      <Paragraph position="7"> for j = i + 1 to n foreach F [?]pi[i, j] f[j]-max(f[j], f[i] + r[i, j, F]) #backward for i = n[?]1 to 0 for j = i + 1 to n foreach F [?]pi[i, j]</Paragraph>
      <Paragraph position="9"> for j = i + 1 to n foreach F [?]ph[i, j] if f[i] + r[i, j, F] + b[j] &lt; amax[?]th then ph[i, j]-ph[i, j]\{F} foreach F [?](pi[i, j][?]ph[i, j]) gen[i, j, F]-iternum + 1 procedure IterativeBeamThresholding(w, G, k0, d0, th0, [?]k, [?]d, [?]th, klast, dlast, thlast) k-k0; d-d0; th-th0; iternum = 0 loop while k[?]klast and d[?]dlast and th[?]thlast call BeamThresholding(w, G, k, d, th, iternum) if pi[1, n]negationslash=[?]then exit k-k + [?]k; d-d + [?]d; th-th + [?]th; iternum-iternum + 1</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML