<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1024">
  <Title>Efficient Parsing of Highly Ambiguous Context-Free Grammars with Bit Vectors</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Computation of the Chart
</SectionTitle>
    <Paragraph position="0"> In the first step, the parser computes the CKY-style recogniser chart with the algorithm shown in Figure 1. It uses the transformed grammar with grammar rules P and non-terminal symbol set N. The chart is conceptually a three-dimensional bit array containing one bit for each possible constituent. A bit is 1 if the respective constituent has been inserted into the chart and 0 otherwise. The chart is indexed by the start position, the end position and the label of a constituent (footnote 2). Initially all bits are 0. This chart representation is particularly efficient for highly ambiguous grammars, such as treebank grammars, where the chart is densely filled.</Paragraph>
    <Paragraph position="1"> Like other CKY-style parsers, the recogniser consists of several nested loops. The first loop (line 3 in Fig. 1) iterates over the end positions e of constituents, inserts the parts of speech of the next word (lines 4 and 5) into the chart, and then builds increasingly larger constituents ending at position e. To this end, it iterates over the start positions b from e-1 down to 1 (line 6) and over all non-terminals A (line 7). Inside the innermost loop, the function derivable is called to compute whether a constituent of category A covering words w_b through w_e is derivable from smaller constituents via some binary rule. (Footnote 2: The start and end positions of a constituent are the indices of the first and the last word covered by the constituent.)</Paragraph>
    <Paragraph position="2"> The function derivable loops over all rules A → B C with the symbol A on the left-hand side (line 11) and over all possible end positions m of the first symbol on the right-hand side of the rule (line 12). If the chart contains B from position b to m and C from position m+1 to e (line 13), the function returns true (line 14), indicating that w_b through w_e are reducible to the non-terminal A. Otherwise, the function returns false (line 15).</Paragraph>
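    <Paragraph> The nested loops described above can be sketched as follows. This is a minimal illustration reconstructed from the prose, not a copy of Figure 1: the names recognise, pos_tags, rules and chainvec, the use of a Python integer as the bit set stored in chart[b][e], and the numbering of symbols from 0 are all assumptions made for the example.
<![CDATA[
```python
def derivable(chart, rules, b, e, A):
    """Return True if some binary rule A -> B C covers words b..e."""
    for B, C in rules.get(A, ()):
        for m in range(b, e):  # m = end position of the first child
            if (chart[b][m] >> B) & 1 and (chart[m + 1][e] >> C) & 1:
                return True  # one analysis is enough for a recogniser
    return False


def recognise(n_symbols, pos_tags, rules, chainvec):
    """Fill chart[b][e], an int whose bit A is 1 if A spans words b..e.

    pos_tags[i] lists the part-of-speech symbols of word i+1 (assumed input
    format); chainvec[A] is the precomputed chain-rule closure of A.
    """
    n_words = len(pos_tags)
    chart = [[0] * (n_words + 1) for _ in range(n_words + 1)]
    for e in range(1, n_words + 1):
        for tag in pos_tags[e - 1]:
            chart[e][e] |= chainvec[tag]
        for b in range(e - 1, 0, -1):
            for A in range(n_symbols):
                if (chart[b][e] >> A) & 1:
                    continue  # already inserted through a chain rule
                if derivable(chart, rules, b, e, A):
                    chart[b][e] |= chainvec[A]
    return chart
```
]]>
 Positions are 1-based as in the text, so row and column 0 of the chart stay unused in this sketch.</Paragraph>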
    <Paragraph position="3"> In order to deal with chain rules, the parser precomputes for each category C the set of non-terminals D which are derivable from C by a sequence of chain rule reductions, i.e. for which D ⇒* C holds, and stores them in the bit vector chainvec[C]. The set includes C itself. Given the grammar rules NP → DT N1, NP → N1, N1 → JJ N1 and N1 → N, the bits for NP, N1 and N are set in chainvec[N]. When a new constituent of category A starting at position b and ending at position e has been recognised, all the constituents reachable from A by means of chain rules are simultaneously added to the chart by OR-ing the precomputed bit vector chainvec[A] into chart[b][e] (see lines 5 and 9 in Fig. 1).</Paragraph>
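    <Paragraph> One way to precompute the chain-rule closure is a simple fixed-point iteration. The sketch below is an assumption about how this could be done, not the procedure used in BitPar: the function name build_chainvec and the representation of chain rules as (parent, child) pairs of symbol numbers are hypothetical.
<![CDATA[
```python
def build_chainvec(n_symbols, chain_rules):
    """Compute chainvec[C] = bit set of all D with D =>* C via chain rules.

    chain_rules is a list of (parent, child) pairs, one per unary rule
    parent -> child (assumed input format).
    """
    # Every set contains the category itself.
    chainvec = [1 << c for c in range(n_symbols)]
    changed = True
    while changed:  # repeat until no set grows any more
        changed = False
        for parent, child in chain_rules:
            for c in range(n_symbols):
                # If child reduces from c, then parent does as well.
                if (chainvec[c] >> child) & 1:
                    extended = chainvec[c] | (1 << parent)
                    if extended != chainvec[c]:
                        chainvec[c] = extended
                        changed = True
    return chainvec
```
]]>
 With the example grammar from the text, only NP → N1 and N1 → N are chain rules, so the entry for N ends up containing N, N1 and NP.</Paragraph>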
    <Paragraph position="4"> The first parsing step is a pure recogniser which computes the set of constituents to which the input words can be reduced, but not their analyses.</Paragraph>
    <Paragraph position="5"> Therefore it is not necessary to look for further analyses once the first analysis of a constituent has been found. The function derivable therefore returns as soon as the first analysis is finished (lines 13 and 14), and derivable is not called if the respective constituent was previously derived by chain rules (line 8).</Paragraph>
    <Paragraph position="6"> Because only one analysis has to be found and some rules are more likely than others, the algorithm is optimised by trying the different rules for each category in order of decreasing frequency (line 11).</Paragraph>
    <Paragraph position="7"> The frequency information is collected online during parsing.</Paragraph>
    <Paragraph position="8"> Derivation of constituents by means of chain rules is much cheaper than derivation via binary rules.</Paragraph>
    <Paragraph position="9"> Therefore the categories in line 7 are ordered such that categories from which many other categories are derivable through chain rules come first.</Paragraph>
    <Paragraph position="10"> The chart is actually implemented as a single large bit vector with access functions translating index triples (start position, end position, and symbol number) to vector positions. The bits in the chart are ordered such that chart[b][e][n+1] directly follows chart[b][e][n], allowing the efficient insertion of a set of bits with an OR operation on bit vectors.</Paragraph>
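    <Paragraph> The index translation can be illustrated with a small sketch. The class name BitChart, its methods, and the row-major ordering of start and end positions are assumptions made for this example. The text only states that the symbol bits of one span are adjacent.
<![CDATA[
```python
class BitChart:
    """Chart stored in one bit vector (a Python int serves as the vector)."""

    def __init__(self, n_words, n_symbols):
        self.n_words = n_words
        self.n_symbols = n_symbols
        self.bits = 0

    def index(self, b, e, sym):
        """Map (start, end, symbol number) to a position in the vector.

        Positions b and e are 1-based; the symbol index varies fastest, so
        the bits of one span form a contiguous block.
        """
        return ((b - 1) * self.n_words + (e - 1)) * self.n_symbols + sym

    def test(self, b, e, sym):
        return (self.bits >> self.index(b, e, sym)) & 1 == 1

    def insert_set(self, b, e, vec):
        """OR a whole symbol set (e.g. chainvec[A]) into span b..e at once."""
        self.bits |= vec << self.index(b, e, 0)
```
]]>
 Because the symbol bits of a span are contiguous, insert_set needs only one shift and one OR regardless of how many symbols the set contains.</Paragraph>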
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Using Bit-Vector Operations
</SectionTitle>
    <Paragraph position="0"> The function derivable is the most time-consuming part of the recogniser, because it is the only part whose overall runtime grows cubically with sentence length. The inner loop of the function iterates over the possible end positions of the first child constituent and computes an AND operation for each position. This loop can be replaced by a single AND operation on two bit vectors, where the first bit vector contains the bits stored in chart[b][b][B], chart[b][b+1][B] ... chart[b][e-1][B] and the second bit vector contains the bits stored in chart[b+1][e][C], chart[b+2][e][C] ... chart[e][e][C].</Paragraph>
    <Paragraph position="1"> The bit-vector operation is overall more efficient than the solution shown in Figure 1 if the extraction of the two bit vectors from the chart is fast enough. If the bits in the chart are ordered such that chart[b][1][A] ... chart[b][n][A] are in sequence (n being the sentence length), the first bit vector can be efficiently extracted by block-wise copying. The same holds for the second bit vector if the bits are ordered such that chart[1][e][A] ... chart[n][e][A] are in sequence.</Paragraph>
    <Paragraph position="2"> Therefore, the chart of the parser which uses bit-vector operations internally consists of two bit vectors, one for each ordering. New bits are inserted into both vectors.</Paragraph>
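    <Paragraph> The replacement of the inner loop by one AND operation can be sketched as follows. The two dictionaries rows and cols stand in for the two copies of the chart. These names, the helper functions, and the choice of bit i for position i are assumptions for illustration only.
<![CDATA[
```python
def insert(rows, cols, b, e, sym):
    """Set the bit for constituent (b, e, sym) in both chart copies."""
    # rows[(b, sym)]: bit at position e is 1 if sym spans b..e
    rows[(b, sym)] = rows.get((b, sym), 0) | (1 << e)
    # cols[(e, sym)]: bit at position b is 1 if sym spans b..e
    cols[(e, sym)] = cols.get((e, sym), 0) | (1 << b)


def split_exists(rows, cols, b, e, B, C):
    """True if some m in b..e-1 has B spanning b..m and C spanning m+1..e."""
    width = e - b
    mask = (1 << width) - 1
    # Bit i of first: B spans b..b+i (end positions b..e-1).
    first = (rows.get((b, B), 0) >> b) & mask
    # Bit i of second: C spans b+i+1..e (start positions b+1..e).
    second = (cols.get((e, C), 0) >> (b + 1)) & mask
    # One AND replaces the loop over all split points m.
    return (first & second) != 0
```
]]>
 The two shifts align the vectors so that bit i of each refers to the same split point m = b+i.</Paragraph>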
    <Paragraph position="3"> Due to the new representation of the chart, the insertion of bits into the chart by means of the operation chart[b][e] ← chart[b][e] | vec can no longer be done with bit-vector operations. Instead, each 1-bit of the bit vector has to be set separately in both copies of the chart. Binary search is used to extract the 1-bits from each machine word of a bit vector. This is more efficient than checking all bits sequentially if the number of 1-bits is small. Figure 3 shows how the 1-bits would be extracted from a 4-bit word v and stored in the set s. The first line checks whether any bit is set in v. If so, the second line checks whether one of the first two bits is set.</Paragraph>
    <Paragraph position="4"> If so, the third line checks whether the first bit is 1 and, if true, adds 0 to s. Then it checks whether the second bit is 1 and so on.</Paragraph>
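    <Paragraph> The binary-search extraction generalises to any word width by halving recursively. The sketch below is not the code of Figure 3. The function name one_bits, the width parameter, and the convention that the least significant bit is bit 0 are assumptions.
<![CDATA[
```python
def one_bits(v, width=4, offset=0, out=None):
    """Collect the positions of the 1-bits of a width-bit word v.

    Halves that contain no 1-bit are skipped with a single zero test,
    which pays off when only few bits are set.
    """
    if out is None:
        out = []
    if v == 0:  # no bit set in this part of the word
        return out
    if width == 1:  # a single non-zero bit: record its position
        out.append(offset)
        return out
    half = width // 2
    low = v & ((1 << half) - 1)  # lower half of the word
    high = v >> half             # upper half of the word
    one_bits(low, half, offset, out)
    one_bits(high, width - half, offset + half, out)
    return out
```
]]>
 The caller is expected to pass a value that fits into width bits. A sparse 64-bit word with one set bit is resolved in about six zero tests instead of 64 single-bit tests.</Paragraph>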
    <Paragraph position="6"/>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Parse Forest Generation
</SectionTitle>
    <Paragraph position="0"> The chart only provides information about the constituents, but not about their analyses. In order to generate a parse forest representation of the set of all analyses, the chart is traversed top-down, reparsing all the constituents in the chart which are part of a complete analysis of the input sentence. The parse forest is stored by means of six arrays named catname, catnum, first-analysis, rule-number, first-child, and child. catnum[n] contains the number of the category of the nth constituent. first-analysis[n] is the index of the first analysis of the nth constituent, and first-analysis[n+1]-1 is the index of the last analysis. rule-number[a] returns the rule number of analysis a, and first-child[a] contains the index of its first child node number in the child array. The numbers of the other child nodes are stored at the following positions. child[d] is normally the number of the node which forms child d. However, if the child with number d is an input word, child[d] instead holds a negative value derived from the position of that word. A negative value in the child array therefore indicates a terminal node and allows decoding of the position of the respective word in the sentence. catname[catnum[child[first-child[first-analysis[n]]]]] is therefore the name of the category of the first child of the first analysis of the nth constituent. The rule-number array is not needed to represent the structure of the parse forest, but speeds up the retrieval of rule probabilities and similar information.</Paragraph>
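    <Paragraph> The array layout can be made concrete with a small hypothetical forest for a two-word input with the single analysis NP → DT N1, N1 → N. This is not the forest of Figure 4. The array contents, the rule numbers, the sentinel entry at the end of first_child, and the encoding of word i as the value -i are all assumptions made for this sketch.
<![CDATA[
```python
# Nodes: 0 = NP (words 1..2), 1 = DT (word 1), 2 = N1 (word 2), 3 = N (word 2)
catname = ["DT", "N", "N1", "NP"]   # category names by category number
catnum = [3, 0, 2, 1]               # category number of each node
first_analysis = [0, 1, 2, 3, 4]    # entry n+1 bounds the analyses of node n
rule_number = [0, 4, 1, 5]          # hypothetical rule number per analysis
first_child = [0, 2, 3, 4, 5]       # last entry is an assumed sentinel
child = [1, 2, -1, 3, -2]           # negative value = word (assumed: -i)


def analyses(n):
    """Return the child lists of all analyses of node n."""
    result = []
    for a in range(first_analysis[n], first_analysis[n + 1]):
        result.append(child[first_child[a]:first_child[a + 1]])
    return result


def first_child_category(n):
    """Category name of the first child of the first analysis of node n.

    Returns None if that child is an input word rather than a node.
    """
    c = child[first_child[first_analysis[n]]]
    if c < 0:
        return None
    return catname[catnum[c]]
```
]]>
 The function first_child_category spells out the nested lookup from the text and guards the terminal case, in which the child value is not a valid node number.</Paragraph>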
    <Paragraph position="1"> The parse forest shown in Figure 4 is represented by</Paragraph>
    <Paragraph position="3"> The parse forest is built by the function parse shown in Figure 5. The function newnode(b,e,A) adds the number of A at the end of the catnum array. It also adds the current largest index of the first-child array plus 1 to the first-analysis array. It returns the largest index of the catnum array as the node number. newnode also stores a mapping from the triple ⟨b,e,A⟩ to the respective node number n in a hash table.</Paragraph>
    <Paragraph position="4"> The hash table is used by get-node(b,e,A) to check whether a constituent has already been added to the parse forest and, if so, to return its number. add-analysis(n,r,m) increments the size of the child array by 2 and adds the index of the first new element to the first-child array. It further adds the number of rule r to the rule-number array and stores the pair ⟨r,m⟩ in a temporary array which is later accessed in lines 17, 19, and 22. add-analysis(n,r) is similar, but adds just one element to the child array. Finally, the function add-child inserts the child node indices returned by recursive calls of buildsubtree. The optimisation with bit-vector operations described in section 4 is also applicable in lines 14 and 15.</Paragraph>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6 Viterbi Parsing
</SectionTitle>
    <Paragraph position="0"> Viterbi parses for probabilistic context-free grammars (PCFGs) could be extracted from context-free parse forests, but BitPar computes them without building the parse forest in order to save space. After building the recogniser chart, the Viterbi version of BitPar filters the chart as shown in Figure 6 in order to eliminate constituents which are not part of a complete analysis.</Paragraph>
    <Paragraph position="1"> After filtering the chart, the Viterbi probabilities of the remaining constituents are computed by the algorithm in Figure 7. p[b][e][A] is implemented with a hash table. The value of prob(r) is 1 if the left-hand side of r is an auxiliary symbol inserted during the grammar transformation and otherwise the probability of the corresponding PCFG rule.</Paragraph>
    <Paragraph position="2"> Finally, the algorithm of Figure 8 prints the Viterbi parse.</Paragraph>
  </Section>
  <Section position="8" start_page="0" end_page="0" type="metho">
    <SectionTitle>
7 Discussion
</SectionTitle>
    <Paragraph position="0"> BitPar was developed for the generation of parse forests with large treebank grammars. It saves memory by splitting parsing into two steps: (1) the generation of a recogniser chart which is compactly stored in a bit vector, and (2) the generation of the parse forest. Parse forest nodes are only created for constituents which are part of a complete analysis, whereas standard 1-pass chart parsers create more nodes which are later abandoned.</Paragraph>
    <Paragraph position="1"> Viterbi parsing involves four steps. About 15 % of the parse time is needed for building the chart, 28 % for filtering, and 57 % for the computation of the Viterbi probabilities. The time required for the extraction of the best parse is negligible (0.04 %). The Viterbi step spends about 80 % of the time (45 % of the total time) on the computation of the probabilities and only about 20 % on the computation of the possible analyses. So, although Viterbi probabilities are only computed for nodes which are part of a valid analysis, it still takes almost half of the time to compute them, and the proportion increases with sentence length.</Paragraph>
    <Paragraph position="2"> In contrast to most beam search parsing strategies, BitPar is guaranteed to return the most probable analysis, and there is no need to optimise any scoring functions or parameters.</Paragraph>
  </Section>
</Paper>