<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-2033">
<Title>Parser Combination by Reparsing</Title>
<Section position="3" start_page="0" end_page="129" type="metho">
<SectionTitle>
2 Dependency Reparsing
</SectionTitle>
<Paragraph position="0"> In dependency reparsing we focus on unlabeled dependencies, as described by Eisner (1996). In this scheme, the syntactic structure for a sentence with n words is a dependency tree representing head-dependent relations between pairs of words.</Paragraph>
<Paragraph position="1"> When m parsers each output a set of dependencies (forming m dependency structures) for a given sentence containing n words, the dependencies can be combined in a simple word-by-word voting scheme: each parser votes for the head of each of the n words in the sentence, and each word is assigned the head with the most votes. This very simple scheme guarantees that the final set of dependencies will have as many votes as possible, but it does not guarantee that the final voted set of dependencies will be a well-formed dependency tree. In fact, the resulting graph may not even be connected. Zeman and Žabokrtský (2005) apply this dependency voting scheme to Czech with very strong results. However, when the constraint that structures must be well-formed is enforced, the accuracy of their results drops sharply.</Paragraph>
<Paragraph position="2"> Instead, if we reparse the sentence based on the output of the m parsers, we can maximize the number of votes for a well-formed dependency structure. Once we have obtained the m initial dependency structures to be combined, the first step is to build a graph where each word in the sentence is a node. We then create weighted directed edges between the nodes corresponding to words for which dependencies are obtained from each of the initial structures.1 In cases where more than one dependency structure indicates that an edge should be created, the corresponding weights are simply added. As long as at least one of the m initial structures is a well-formed dependency structure, the directed graph created this way will be connected.</Paragraph>
<Paragraph position="3"> Once this graph is created, we reparse the sentence using a dependency parsing algorithm such as one of those described by McDonald et al. (2005). Finding the optimal dependency structure given the set of weighted dependencies is simply a matter of finding the maximum spanning tree (MST) for the directed weighted graph, which can be done using the Chu-Liu/Edmonds directed MST algorithm (Chu & Liu, 1965; Edmonds, 1967). The maximum spanning tree maximizes the votes for dependencies given the constraint that the resulting structure must be a tree. If projectivity (no crossing branches) is desired, Eisner's (1996) dynamic programming algorithm (similar to CYK) for dependency parsing can be used instead.</Paragraph>
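As an illustration of why unconstrained voting can fail, here is a minimal Python sketch of the word-by-word scheme. The representation, in which each parse is a list of head indices for words w1..wn and 0 denotes the artificial root, is our own illustrative convention, not the paper's:

    from collections import Counter

    def vote_heads(parses):
        """Independently pick the most-voted head for each word."""
        n = len(parses[0])
        return [Counter(p[i] for p in parses).most_common(1)[0][0]
                for i in range(n)]

    # Three well-formed dependency trees for a three-word sentence.
    parses = [
        [2, 3, 0],  # w1 <- w2 <- w3, w3 is the root
        [2, 0, 1],  # w1 <- w2, w3 <- w1, w2 is the root
        [0, 3, 1],  # w2 <- w3 <- w1, w1 is the root
    ]
    print(vote_heads(parses))  # [2, 3, 1]: a cycle with no root

Although every input is a well-formed tree, the voted output here contains a cycle and no root, which is exactly the failure mode that reparsing is meant to rule out.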
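The reparsing step itself can be sketched in a few lines under the same illustrative head-index convention. The use of networkx is our choice for the sketch only; the paper prescribes the Chu-Liu/Edmonds algorithm, not an implementation, and networkx's maximum_spanning_arborescence happens to provide it:

    from collections import Counter
    import networkx as nx

    def reparse(parses):
        n = len(parses[0])
        graph = nx.DiGraph()
        graph.add_nodes_from(range(n + 1))  # node 0 is the artificial root
        # One directed edge head -> dependent per observed dependency;
        # votes for the same dependency from several parses are added.
        counts = Counter((p[i], i + 1) for p in parses for i in range(n))
        for (head, dep), votes in counts.items():
            graph.add_edge(head, dep, weight=votes)
        # Chu-Liu/Edmonds: the heaviest spanning structure that is a tree.
        mst = nx.maximum_spanning_arborescence(graph, attr="weight")
        heads = [0] * n
        for head, dep in mst.edges():
            heads[dep - 1] = head
        return heads

    parses = [[2, 3, 0], [2, 0, 1], [0, 3, 1]]
    print(reparse(parses))  # a well-formed tree; the cyclic vote
                            # [2, 3, 1] is ruled out by the tree constraint

Since node 0 never has incoming edges, any spanning arborescence of this graph is necessarily rooted at the artificial root, as required.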
</Section>
<Section position="4" start_page="129" end_page="129" type="metho">
<SectionTitle>
3 Constituent Reparsing
</SectionTitle>
<Paragraph position="0"> In constituent reparsing we deal with labeled constituent trees, or phrase structure trees, such as those in the Penn Treebank (after removing traces, empty nodes and function tags). The general idea is the same as with dependencies. First, m parsers each produce one parse tree for an input sentence.</Paragraph>
<Paragraph position="1"> We then use these m initial parse trees to guide the application of a parsing algorithm to the input.</Paragraph>
<Paragraph position="2"> Instead of building a graph out of words (nodes) and dependencies (edges), in constituent reparsing we use the m initial trees to build a weighted parse chart. We start by decomposing each tree into its constituents, with each constituent being a 4-tuple [label, begin, end, weight], where label is the phrase structure type, such as NP or VP, begin is the index of the word where the constituent starts, end is the index of the word where the constituent ends plus one, and weight is the weight of the constituent. As with dependencies, in the simplest case the weight of each constituent is simply 1.0, but different weighting schemes can be used.</Paragraph>
<Paragraph position="3"> Once the initial trees have been broken down into constituents, we put all the constituents from all of the m trees into a single list. We then look for each pair of constituents A and B where the label, begin, and end are identical, and merge A and B into a single constituent with the same label, begin, and end, and with weight equal to the weight of A plus the weight of B. Once no more constituent mergers are possible, the resulting constituents are placed on a standard parse chart; unlike in standard chart parsing, however, the constituents in the chart do not contain back-pointers indicating what smaller constituents they contain.</Paragraph>
<Paragraph position="4"> Building the final tree amounts to determining these back-pointers. This can be done by running a bottom-up chart parsing algorithm (Allen, 1995) for a weighted grammar, but instead of using a grammar to determine what constituents can be built and what their weights are, we simply constrain the building of constituents to what is already in the chart (adding the weights of constituents when they are combined). This way, we perform an exhaustive search for the tree that represents the heaviest combination of constituents that spans the entire sentence as a well-formed tree.</Paragraph>
<Paragraph position="5"> A problem with simply considering all constituents and picking the heaviest tree is that this favors recall over precision. Balancing precision and recall is accomplished by discarding every constituent with weight below a threshold t before the search for the final parse tree starts. In the simple case where each constituent starts out with weight 1.0 (before any merging), this means that a constituent is only considered for inclusion in the final parse tree if it appears in at least t of the m initial parse trees. Intuitively, this should increase precision, since we expect a constituent that appears in the output of more parsers to be more likely to be correct. By changing the threshold t we can control the precision/recall tradeoff.</Paragraph>
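The decomposition and merging steps are straightforward to sketch. In the sketch below, trees are assumed to be nested (label, children) tuples with words as string leaves; this representation and the function names are illustrative, not from the paper:

    from collections import defaultdict

    def add_constituents(tree, begin, chart):
        """Record a [label, begin, end, weight] entry for every
        constituent in the tree, each with initial weight 1.0."""
        label, children = tree
        end = begin
        for child in children:
            if isinstance(child, str):      # a word: advance one position
                end += 1
            else:                           # a subtree: recurse
                end = add_constituents(child, end, chart)
        chart[(label, begin, end)] += 1.0   # merging = adding weights
        return end

    def merged_chart(trees):
        chart = defaultdict(float)
        for tree in trees:
            add_constituents(tree, 0, chart)
        return chart

    t1 = ("S", [("NP", ["I"]), ("VP", ["saw", ("NP", ["her", "duck"])])])
    t2 = ("S", [("NP", ["I"]), ("VP", ["saw", "her", "duck"])])
    print(dict(merged_chart([t1, t2])))
    # {('NP', 0, 1): 2.0, ('NP', 2, 4): 1.0, ('VP', 1, 4): 2.0, ('S', 0, 4): 2.0}

Keying the chart by (label, begin, end) makes the pairwise merging implicit: identical constituents from different trees simply accumulate weight in the same cell.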
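To make the threshold and the exhaustive search concrete, the following simplified sketch computes the weight of the heaviest well-formed tree over a merged chart. Two simplifications are ours, not the paper's: the code tracks only weights (no back-pointers, so it scores the best tree rather than rebuilding it), and it pools all surviving labels over the same span into one cell, treating them as a unary chain. It also assumes some constituent spans the whole sentence, which holds whenever t does not exceed m, since every initial tree has a root constituent over the full span. Setting t to a majority of the m parsers recovers simple majority voting:

    from functools import lru_cache

    def heaviest_tree_weight(chart, n, t):
        # Threshold: discard constituents seen fewer than t times, then
        # pool surviving weights per span (same-span labels stack as a
        # unary chain).
        span_weight = {}
        for (label, begin, end), w in chart.items():
            if w >= t:
                span_weight[(begin, end)] = span_weight.get((begin, end), 0.0) + w

        @lru_cache(maxsize=None)
        def cover(begin, end):
            """Heaviest cover of [begin, end) by kept constituents and
            bare words; a constituent over the full span stacks on top."""
            if end - begin == 1:
                inner = 0.0                  # a bare word costs nothing
            else:
                inner = max(cover(begin, k) + cover(k, end)
                            for k in range(begin + 1, end))
            return inner + span_weight.get((begin, end), 0.0)

        return cover(0, n)

    chart = {("S", 0, 4): 2.0, ("NP", 0, 1): 2.0,
             ("VP", 1, 4): 2.0, ("NP", 2, 4): 1.0}
    print(heaviest_tree_weight(chart, 4, t=1.0))  # 7.0: the lone NP survives
    print(heaviest_tree_weight(chart, 4, t=2.0))  # 6.0: it is filtered out

Raising t from 1.0 to 2.0 drops the constituent proposed by only one parser, trading the weight it contributed (recall) for agreement among parsers (precision).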
<Paragraph position="6"> Henderson and Brill (1999) proposed two parser combination schemes: one that picks an entire tree from one of the parsers, and one that, like ours, builds a new tree from constituents of the initial trees. The latter scheme performed better, producing remarkable results despite its simplicity. The combination is done with a simple majority vote on whether or not each constituent should appear in the combined tree. In other words, if a constituent appears at least (m + 1)/2 times in the output of the m parsers, it is added to the final tree.</Paragraph>
<Paragraph position="7"> This simple vote resulted in trees with an f-score significantly higher than that of the best parser in the combination. However, the scheme heavily favors precision over recall. Their results on WSJ section 23 were 92.1 precision and 89.2 recall (90.61 f-score), well above the most accurate parser in their experiments (88.6 f-score).</Paragraph>
</Section>
</Paper>