<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-1517">
  <Title>Efficient extraction of grammatical relations</Title>
  <Section position="4" start_page="160" end_page="163" type="metho">
    <SectionTitle>
2 The RASP System
</SectionTitle>
    <Paragraph position="0"> RASP is based on a pipelined modular architecture in which text is pre-processed by a series of components including sentence boundary detection, tokenisation, part of speech tagging, named entity recognition and morphological analysis, before being passed to a statistical parser1. A brief overview of relevant aspects of syntactic processing in RASP is given below; for full details of system components, see Briscoe and Carroll (1995; 2002; 2005)2.</Paragraph>
    <Paragraph position="1"> 1Processing times given in this paper do not include these pre-processing stages, since they take negligible time compared with parsing.</Paragraph>
    <Section position="1" start_page="160" end_page="160" type="sub_section">
      <SectionTitle>
2.1 The Grammar
</SectionTitle>
      <Paragraph position="0"> Briscoe and Carroll (2005) describe the (manuallywritten) feature-based unification grammar and the rule-to-rule mapping from local trees to GRs. The mapping specifies for each grammar rule the semantic head(s) of the rule (henceforth, head), and one or more GRs that should be output (optionally depending on feature values instantiated at parse time). For example, Figure 1 shows a grammar rule analysing a verb phrase followed by a prepositional phrase modifier. The rule identifies the first daughter (1) as the semantic head, and specifies that one of five possible GRs is to be output, depending on the value of the PSUBCAT syntactic feature; so, for example, if the feature has the value NP, then the relation is ncmod (non-clausal modifier), with slots filled by the semantic heads of the first and second daughters (the 1 and 2 arguments).</Paragraph>
      <Paragraph position="1"> Before parsing, a context free backbone is derived automatically from the grammar, and an LALR(1) parse table is computed from this backbone (Carroll, 1993, describes the procedure in detail). Probabilities are associated with actions in the parse table, by training on around 4K sentences from the Susanne corpus (Sampson, 1995), each sentence having been semi-automatically converted from a tree-bank bracketing to a tree conforming to the unification grammar (Briscoe and Carroll, 1995).</Paragraph>
    </Section>
    <Section position="2" start_page="160" end_page="161" type="sub_section">
      <SectionTitle>
2.2 The Parse Forest
</SectionTitle>
      <Paragraph position="0"> When parsing, the LALR table action probabilities are used to assign a score to each newly derived (sub-)analysis. Additionally, on each reduce action (i.e. complete application of a rule), the rule's daughters are unified with the sequence of sub-analyses being consumed. If unification fails then the reduce action is aborted. Local ambiguity packing (packing, henceforth) is performed on the basis of feature structure subsumption. Thus, the parser builds and returns a compact structure that efficiently represents all parses licensed by the grammar: the parse forest. Since unification often fails it is not possible to apply beam or best first search strategies during construction of the parse forest; statistically high scoring paths often end up in unification failure. Hence, the parse forest represents all parses licensed by the grammar.</Paragraph>
      <Paragraph position="2"> identification of daughter 1 as the semantic head (second line), and possible GR outputs depending on the parse-time value of the PSUBCAT feature of daughter 2 (subsequent lines).</Paragraph>
      <Paragraph position="3"> Figure 2 shows a simplified parse forest containing three parses generated for the following pre-processed text3: I PPIS1 see+ed VVD the AT man NN1 in II the AT park NN1 The GR specifications shown are instantiated based on the values of syntactic features at daughter nodes, as discussed in Section 2.1 above. For example, the V1/vp pp sub-analysis (towards the left hand side of the Figure) contains the instantiated GR specification a0 1, (ncmod 1 2) a1 since its second daughter has the value NP for its PSUBCAT feature.</Paragraph>
      <Paragraph position="4"> Henceforth, we will use the term 'node' to refer to data structures in our parse forest corresponding to a rule instantiation: a sub-analysis resulting from application of a reduce action. Back pointers are stored in nodes, indicating which daughters were used to create the sub-analysis. These pointers provide a means to traverse the parse forest during subsequent processing stages. A 'packed node' is a node representing a sub-analysis that is subsumed by, and hence packed into, another node. Packing is considered for nodes only if they are produced in the same LR state and represent sub-analyses with the same word span. A parse forest can have a number of root nodes, each one dominating analyses spanning the whole sentence with the specified top category.</Paragraph>
    </Section>
    <Section position="3" start_page="161" end_page="163" type="sub_section">
      <SectionTitle>
2.3 Parser Output
</SectionTitle>
      <Paragraph position="0"> From the parse forest, RASP unpacks the 'n-best'4 syntactic trees using a depth-first beam search (Carroll, 1993). There are a number of types of analysis  cal relations (GRs) and robust minimal recursion semantics (RMRS). Each of these is computed from the n-best trees.</Paragraph>
      <Paragraph position="1"> Another output possibility is weighted GRs (Carroll and Briscoe, 2002); this is the unique set of GRs from the n-best GRs, each GR weighted according to the sum of the probabilities of the parses in which it occurs. Therefore, a number of processing stages determine this output: unpacking the n-best syntactic trees, determining the corresponding n-best GR sets and finding the unique set of GRs and corresponding weights.</Paragraph>
      <Paragraph position="2"> The GRs for each parse are computed from the set of GR specifications at each node, passing the (semantic) head of each sub-analysis up to the next higher level in the parse tree (beginning from word nodes). GR specifications for nodes (which, if required, have been instantiated based on the features of daughter nodes) are referred to as 'unfilled' until the slots containing numbers are 'filled' with the corresponding heads of daughter nodes. For example, the grammar rule named NP/det n has the unfilled GR specification a0 2, (det 2 1) a1 . Therefore, if an NP/det n local tree has two daughters with heads the and cat respectively, the resulting filled GR specification will be a0 cat, (det cat the) a1 , i.e. the head of the local tree is cat and the GR output is (det cat the). Figure 3 illustrates the n-best GRs and the corresponding (non-normalised and normalised) weighted GRs for the sentence I saw the man in the park. The corresponding parse forest for this example is shown in Figure 2. Weights on the GRs are normalised probabilities representing the weighted proportion of parses in which the GR occurs. This weighting is in practice calculated as the sum of parse probabilities for parses con- null taining the specific GR, normalised by the sum of all parse probabilities. For example, the GR (iobj see+ed in) is in one parse with probability</Paragraph>
      <Paragraph position="4"> all parse probabilities is a0 a1a3a2 a4a7a6 a1 a6 a6 a2a3a10 a6 . Therefore, the normalised probability (and final weight) of the</Paragraph>
      <Paragraph position="6"/>
    </Section>
  </Section>
  <Section position="5" start_page="163" end_page="163" type="metho">
    <SectionTitle>
3 Data and Methods
</SectionTitle>
    <Paragraph position="0"> King et al. (2003) outline the development of the PARC 700 Dependency Bank (henceforth, Dep-Bank), a gold-standard set of relational dependencies for 700 sentences (originally from the Wall Street Journal) drawn at random from Section 23 of the Penn Treebank. Briscoe and Carroll (2005) extended DepBank with a set of gold-standard RASP GRs that we use to measure parser accuracy.</Paragraph>
    <Paragraph position="1"> We use the same 560 sentence subset from the DepBank utilised by Kaplan et al. (2004) in their study of parser accuracy and efficiency. All experimental results are obtained using this test suite on an AMD Opteron 2.5GHz CPU with 1GB of Ram on a 64 bit version of Linux. The parser's output is evaluated using a relational dependency evaluation scheme (Carroll et al., 1998; Lin, 1998) and standard evaluation measures: precision, recall and Fa23 .</Paragraph>
  </Section>
  <Section position="6" start_page="163" end_page="163" type="metho">
    <SectionTitle>
4 Local Ambiguity Packing
</SectionTitle>
    <Paragraph position="0"> Oepen and Carroll (2000) note that when using subsumption-based packing with a unification-based grammar, the parse forest may implicitly represent some parses that are not actually licensed by the grammar; these will have values for one or more features that are locally but not globally consistent.</Paragraph>
    <Paragraph position="1"> This is not a problem when computing GRs from trees that have already been unpacked, since the relevant unifications will have been checked during the unpacking process, and will have caused the affected trees to be filtered out. Unification fails for at least one packed tree in approximately 10% of the sentences in the test suite. However, such inconsistent 5As we are dealing with log probabilities, summation and subtraction of these probabilities is not straightforward. Multiplication of probabilities X and Y, with log probabilities x and y respectively is determined using the formula a35a37a36a39a38a41a40  trees are a problem for any approach to probability computation over the parse forest that is based on the Inside-Outside algorithm (IOA). For our efficient weighted GR extraction technique we therefore modify the parsing algorithm so that packing is based on feature structure equality rather than subsumption. null Oepen and Carroll give definitions and implementation details for subsumption and equality operations, which we adopt. In the experiments below, we refer to versions of the parser with subsumption and equality based packing as SUB-PACKING and</Paragraph>
  </Section>
  <Section position="7" start_page="163" end_page="168" type="metho">
    <SectionTitle>
5 Extracting Weighted GRs
</SectionTitle>
    <Paragraph position="0"> Parse forest unpacking consumes larger amounts of CPU time and memory as the number of parses to unpack (n-best) increases. Carroll and Briscoe (2002) demonstrate that increasing the size of the n-best list increases the upper bound on precision (i.e. when low-weighted GRs are filtered out). Therefore, if practicable, it is preferable to include all possible parses when calculating weighted GRs. We describe below a dynamic programming approach (EWG) based on the IOA to efficiently extract weighted GRs directly from the parse forest. EWG calculates weighted GRs over all parses represented in the parse forest.</Paragraph>
    <Paragraph position="1"> Inside and outside probabilities are analogous to the forward and backward probabilities of markov model algorithms. The inside probability represents the probability of all possible sub-analyses of a node. Conversely, the outside probability represents the probability of all analyses for which the node is a sub-analysis.</Paragraph>
    <Paragraph position="2"> The IOA is ideal for our task, as the product of inside and outside probabilities for a sub-analysis constitutes part of the sum for the non-normalised weight of each GR (arising from the GR specification in the sub-analysis). Further, we can apply the sum of inside probabilities for each root-node, to normalise the weighted GRs.</Paragraph>
    <Section position="1" start_page="163" end_page="165" type="sub_section">
      <SectionTitle>
5.1 Implementation
</SectionTitle>
      <Paragraph position="0"> Three processing stages are required to determine weighted GRs over the parse forest, calculating  (1) filled GRs and corresponding inside probabili null the sentence I saw the man in the park. Parse probabilities and non-normalised weights are shown as log probabilities. Weights and parse probabilities are shown with differing precision, however RASP stores all probabilities in log (base 10) form with double float precision.  ties, (2) outside (and non-normalised) probabilities of weighted GRs, and (3) normalised probabilities of weighted GRs.6 The first two processing stages are covered in detail in the following sections, while the final stage simply entails normalising the probabilities by dividing each weight by the sum of all the parse probabilities (the sum of root-nodes' inside probabilities).</Paragraph>
      <Paragraph position="1">  To determine inside probabilities over the nodes in the parse forest, we need to propagate the head and corresponding inside probability upwards after filling the node's GR specification. The inside probability of node a0 is usually calculated over the parse forest by multiplying the inside probability of the node's daughters and the probability a1 a2 a0a4a3 of the node itself (i.e. the probability of the shift or reduce action that caused the node to be created). Therefore, if a node has daughters a5 a23 and a5 a14 , then the inside probability a6a8a7 is calculated using:</Paragraph>
      <Paragraph position="3"> However, packed nodes each correspond to an alternative filled GR specification. Inside probabilities for these GR specifications need to be combined. If packed analyses a0a16a15 occur in node a0 then the inside probability of node a0 is:</Paragraph>
      <Paragraph position="5"> Further, the alternative GR specifications may not necessarily specify the same head as the node's GR specification and multiple heads may be passed up by the node. Hence, the summation in equation 2 needs to be conditioned on the possible heads of a</Paragraph>
      <Paragraph position="7"> When multiple heads are passed up by daughter nodes, multiple filled GR specifications are found for the node. We create one filled GR specification for 6Note that the IOA is not applied iteratively; a single pass only is required.</Paragraph>
      <Paragraph position="8"> each possible combination of daughters' heads7. For example, consider the case where a node has daughters a5 a23 and a5 a14 with semantic heads a39 dog, cata40 and a39 ana40 respectively. Here, we need to fill the GR specification a0 2, (det 2 1) a1 with two sets of daughters' heads: a0 dog, (det dog an) a1 and a0 cat, (det cat an) a1 . As a node can have multiple filled GR specifications a41a43a42 a2 a0a27a26a44a28a45a30a31a3 , we alter equation 3 to:</Paragraph>
      <Paragraph position="10"> Here, a6 a47 (the inside probability of filled GR specification a52 ) is determined by multiplying the inside probabilities of daughters' heads (that filled the GR specification) and the reduce probability of the node itself, i.e. using a modification of equation 1. Returning to the previous example, the inside probabilities of a0 dog, (det dog an) a1 and a0 cat, (det cat an) a1 will be equal to the reduce probability of the node multiplied by (a) the inside probability of head an, and (b) the inside probabilities of the heads dog and cat, respectively.</Paragraph>
      <Paragraph position="11"> Hence, (a) calculation of inside probabilities takes into account multiple semantic heads, and (b) GR specifications are filled using every possible combination of daughters' heads. Each node a0 is processed in full as follows:</Paragraph>
    </Section>
    <Section position="2" start_page="165" end_page="166" type="sub_section">
      <SectionTitle>
* Process each of the node's packed nodes
</SectionTitle>
      <Paragraph position="0"> a0 a15 to determine the packed node's list of filled GR specifications and corresponding inside probabilities. null  a list of possible semantic heads and corresponding inside probabilities for each. - Fill the GR specification of a0 with each possible combination of daughters' heads. 7The same word can appear as a head for more than one daughter of a node. This occurs if competing analyses have daughters with different word spans and, therefore, particular words can be considered in the span of either daughter. As the grammar permits both pre- and post- modifiers, it is possible for words in the 'overlapping' span to be passed up as heads for both daughters. Therefore, semantic heads are not combined unless they are different words.</Paragraph>
      <Paragraph position="1">  Calculate the inside probability of each filled GR specification.</Paragraph>
      <Paragraph position="2"> a53 Combine the alternative filled GR specifications of a0 and a0 a15 , determining the list of unique semantic heads and corresponding inside probabilities using equation 4.</Paragraph>
      <Paragraph position="3"> For each node, we propagate up a set of data structures a39 a1  a40 that each contain one possible head a34 and corresponding inside probability. At word nodes, we simply return the word and the reduce score of the word as the semantic head and inside probability, respectively. Back pointers are also included to store the list of alternative filled GR specifications and corresponding inside probabilities, the reduce score for the node and the daughters' data structures (used to fill the GR specifications).</Paragraph>
      <Paragraph position="4">  After the inside probabilities have been computed (bottom-up) the resulting data structure at the root-node is traversed to compute outside probabilities. The data structure created is split into alternative semantic heads for each node and, therefore, traversal to determine outside probabilities is relatively trivial: the outside probability of a filled GR specification is equal to the outside probability of the corresponding unique head of the node. Therefore, once we have created the new data structure, outside probabilities for each node can be determined over this structure in the regular fashion.</Paragraph>
      <Paragraph position="5"> We calculate the outside probabilities (top-down) and, when we find filled GR specifications, we incrementally store the non-normalised weight of each  ity for a52 (in a hash table).</Paragraph>
      <Paragraph position="6"> - Process the data structure for each child head in a52 , a6 a2 a52a45a3 . That is, the daughters' heads that filled the GR specification (resulting in a52 ). For each a7a8a3a9a6 a2 a52a45a3 :</Paragraph>
    </Section>
    <Section position="3" start_page="166" end_page="168" type="sub_section">
      <SectionTitle>
6.1 Efficiency and Accuracy
</SectionTitle>
      <Paragraph position="0"> The dynamic programming algorithm outlined in Section 5, EWG, provides an efficient and accurate method of determining weighted GRs directly from the parse forest. Figures 5 and 6 compare the efficiency of EWG to the EQ-PACKING and SUB-PACKING methods in terms of CPU time and memory, respectively9. Note that EWG applies equality-based packing to ensure only parses licensed by the grammar are considered (see Section 4).</Paragraph>
      <Paragraph position="1"> As the maximum number of (n-best) parses increases, EQ-PACKING requires more time and memory than SUB-PACKING. However, if we compare these systems with an n-best value of 1, the difference in time and memory is negligible, suggesting that it is the unpacking stage which is responsible for the decreased throughput. For EWG we are forced to use equality-based packing, but these results suggest that using equality is not hurting the throughput of EWG.</Paragraph>
      <Paragraph position="2"> Both figures illustrate that the time and memory required by EWG are static because the algorithm considers all parses represented in the parse forest regardless of the value of n-best specified. Therefore, the 'cross-over points' are of particular interest: at which n-best value is EWG's efficiency the same as that of the current system? This value is 8We apply a breadth first search (FIFO queue) to minimise multiple processing of shared data structures. If an outside probability is determined for a data structure already queued, then the probability is appended to the queued item. The steps are modified to enable multiple outside probabilities, i.e. summation over each outside probability when calculating a19a17a20 and  approximately 580 and 100 for time and memory, respectively (comparing EWG to EQ-PACKING).</Paragraph>
      <Paragraph position="3"> Given that there are on average around 9000 parses per sentence in the test suite, these results indicate a substantial improvement in both efficiency and accuracy for weighted GR calculation. However, the median number of parses per sentence is around 50, suggesting that large parse numbers for a small sub-set of the test suite are skewing the arithmetic mean. Therefore, the complexity of this subset will significantly decrease throughput and EWG will improve efficiency for these sentences more so than for others. null The general relationship between sentence length and number of parses suggests that the EWG will be more beneficial for longer sentences. Figure 4 shows the distribution of number of parses over sentence length. The figure illustrates that the number of parses can not be reliably predicted from sentence length. Considering the cross-over points for time and memory, the number of sentences with more than 580 and 100 parses were 216 and 276, respectively. Thus, the EWG out-performs the current algorithm for around half of the sentences in the data set. The relative gain achieved reflects that a sub-set of sentences can significantly decrease throughput. Hence, the EWG is expected to improve the efficiency if a) longer sentences are present in the data set and b) n-best is set to a value greater than the cross-over point(s).</Paragraph>
      <Paragraph position="4"> Upper bounds on precision and recall can be determined using weight thresholds over the GRs of 1.0 and 0.0, respectively10. Upper bounds of precision and recall provided by EWG are 79.57 and 82.02, respectively, giving an Fa23 upper bound of 81.22%. However, considering the top 100 parses only, we achieve upper bounds on precision and recall of 78.77% and 81.18% respectively, resulting in an Fa23 upper bound of 79.96%. Therefore, using EWG, we are able to achieve a relative increase of 6.29% for the Fa23 upper bound on the task. Similarly, Carroll and Briscoe (2002) demonstrate (on an earlier, different test suite) that increasing the number of parses (n-best) from 100 to 1000 increases precision of weighted GR sets from 89.59% to 90.24%, 10In fact, in these experiments we use a threshold of a65 a51a1a0 (with a0 a40 a69a3a2a69a19a69a82a69a71a65 ) instead of a threshold of a65a4a2a69 to reduce the influence of very low ranked parses.</Paragraph>
      <Paragraph position="5">  by the different versions of the parsing system for calculation of weighted GRs over the n-best parses.</Paragraph>
      <Paragraph position="6">  the different versions of the system for calculation of weighted GRs over the n-best parses.</Paragraph>
      <Paragraph position="7"> a relative error reduction (RER) of 6.8%. Therefore, EWG achieves a substantial improvement in both efficiency and accuracy for weighted GR calculation; providing increased precision for thresholded GR sets and an increased Fa23 upper bound on the task.</Paragraph>
    </Section>
    <Section position="4" start_page="168" end_page="168" type="sub_section">
      <SectionTitle>
6.2 Parse Selection
</SectionTitle>
      <Paragraph position="0"> Section 6.1 illustrated the increased level of efficiency achieved by EWG compared to the current system's method for calculating weighted GRs. This section briefly considers a parse selection algorithm using EWG that would otherwise be too inefficient to apply.</Paragraph>
      <Paragraph position="1"> Clark and Curran (2004) determine weighted GRs directly from a packed chart using Miyao and Tsujii's (2002) dynamic programming algorithm. They outline a parse selection algorithm which maximises the expected recall of dependencies by selecting the n-best GR set with the highest average GR score based on the weights from the weighted GRs. We can apply this parse selection algorithm in two ways: either (a) re-rank the n-best GR sets based on the average weight of GRs and select the highest ranking set, or (b) apply a simple variant of the Viterbi algorithm to select the GR set with the highest average weighted score over the data structure built during EWG. The latter approach, based on the parse selection algorithm in Clark and Curran (2004), takes into account all possible parses and effectively re-ranks all parses using weights output by EWG. These approaches will be referred to as RE-RANK (over the top 1000 parses) and BEST-AVG, respectively.</Paragraph>
      <Paragraph position="2"> The GR set corresponding to the system's top parse achieves an Fa23 of 71.24%. By applying BEST-AVG and RE-RANK parse selection, we achieve a relative error reduction of 3.01% and 0.90%, respectively. Therefore, BEST-AVG achieves higher accuracy and is more efficient than RE-RANK. It is also worth noting that these parse selection schemes are able to output a consistent set of GRs unlike the set corresponding to high precision GR output.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>