File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/03/w03-1715_metho.xml
Size: 15,777 bytes
Last Modified: 2025-10-06 14:08:36
<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1715"> <Title>Abductive Explanation-based Learning Improves Parsing Accuracy and Efficiency</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 A Formal View on Parsing and Learning </SectionTitle> <Paragraph position="0"> We use the following notation throughout the paper: a0a2a1 a3a5a4a7a6a9a8a11a10 (function a0 applied to a4 yields x), a0a13a12a1a14a3a5a4a7a6a15a8a16a10 (relation a0 applied to a4 yields x). a17a19a18a20a18a20a18a22a21 and a18a20a18a20a18a25a24 represent tuples and sets respectively. The a26 prefix denotes the cardinality of a collection, e.g. a26a27a23a29a28a31a30a33a32a34a28a36a35 a24 a8a38a37 . Uppercase variables stand for collections and lowercase variables for elements. Collections may contain the anonymous variable a39 (the variable _ in PROLOG). Over-braces or under-braces should facilitate reading: a40a42a41a43a40</Paragraph> <Paragraph position="2"> A theory a52 is a17a54a53 a32a56a55a57a32a59a58 a21 where a58 is a set of rules a60 . a53 and a55 are two disjoint sets of attributes</Paragraph> <Paragraph position="4"> tion between an observable fact a28 and an attribute a61 assigned to it. a86 is the set of observable data with each a28a84a87a88a86 being a tuple a28 a8 a17 a61 a32a59a62 a21 .3 a89 is the set of data classified according to</Paragraph> <Paragraph position="6"> a61 may have an internal structure in the form of ordered or unordered collections of more elementary a28 , a62 and a61 respectively.</Paragraph> <Paragraph position="7"> Transferring this notation to the description of parsing, a52 is a syntactic formalism and a58 a grammar. a53 is the union of syntax trees and morpho-syntactic tags. a86 is a corpus tagged with a53 . a55 corresponds to a list of words, phrases or sentences (the surface strings). a89 is a treebank, a cache of parse trees, or a history of explanations.</Paragraph> <Paragraph position="9"> A parser defines a relation between a86 and a89 (c.f.</Paragraph> <Paragraph position="10"> 2). Parsing is a relation between a28 and a subset of a89 (c.f. 3).</Paragraph> <Paragraph position="12"/> <Paragraph position="14"> Simplifying, we can assume that a107 is defined as the set of rules, i.e. a107 a8 a17 a86a114a32 a89 a21 a8 a58 . A specific parser a107 is derived by the application of a115 to the training material (e.g. a89 ): a115 a12a1a116a3 a89 a6a117a8 a107 . The set of possible relations a115 is a118 . Elements of a118 are caching (no generalization), induction (hypothesis after data inspection) and abduction (hypothesis during classification). Equation (5) describes the cycle of grammar learning and grammar application.</Paragraph> <Paragraph position="16"> a124 in (6) is the trivial formalization of caching. Parsing proceeds via recalling a125 defined in (7). The cycle of grammar learning and parsing</Paragraph> <Paragraph position="18"> Let a132a106a71a33a81a56a71a33a133a92a71 be a function which replaces one or more elements of a collection by a named variable or a39 . a107 is a deductive inference if a60 is obtained from an induction (a reduction of a28 with the help of a132a106a71a29a81a56a71a47a133a59a71 ). The following expressions define induction a134 (9), deduction a135 (10) and the inductive-deductive cycle a135 a12a1a108a3a131a134a82a6 (11): 4We use subscripts to indicate the identity of variables. The same subscript of two variables implies the identity of both variables. Different subscripts imply nothing. 
<Paragraph position="19"> In memory-based parsing, learning material and parsing output are identical.</Paragraph> <Paragraph position="20"> Abduction, defined in (12), is a parse-time generalization which is triggered by a concrete $e$ to be classified. We separate $\alpha$ and $\beta$ for presentation purposes only.5 The relation $\sim$ may express a similarity, a temporal or a causal relation. (12) and the cycle $\beta\!\hookrightarrow\!(\alpha)$ (13) define abduction.</Paragraph> <Paragraph position="24"> Abduction subsumes reasoning by analogy. Abduction is an analogy if $\sim$ describes a similarity. Reasoning from rain to snow is a typical analogy; reasoning from a wet street to rain is an abductive inference. For a parsing approach based on analogy cf. (Lepage, 1999).</Paragraph> <Paragraph position="26"> Deduction and abduction may work conjointly whenever deductive inferences encounter gaps. A deductive inference stops in front of a gap between the premises and a possible conclusion. Abduction creates a new hypothesis which bridges the gap and allows the inference to continue.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Learning: $C \cup ((L\!\hookrightarrow\!(C))\!\hookrightarrow\!(e))$ </SectionTitle> <Paragraph position="0"> In this section, we formalize EBL. We mechanically substitute $L$ in the definition of EBL by $\kappa$, $\delta$ and $\beta$ to show their respective learning potentials.</Paragraph> <Paragraph position="1"> A learning system changes internal states which influence its performance. The internal states of $P$ are determined by $C$ and $\Lambda$. We assume that, for a given $P$, $\Lambda$ remains identical before and after learning. Therefore, the comparison of $C$ (before learning) with $C \cup \{c_{new}\}$ (after learning) reveals the acquired knowledge.</Paragraph> <Paragraph position="2"> We define EBL in (14). $(L\!\hookrightarrow\!(C))$ is the parser before learning. This parser applies to $e$ and yields $c_{new}$. Of two otherwise identical parsers, the parser with a $c = \langle\langle a_1, \_ \rangle, a_2 \rangle$ not present in the other has the greater deductive closure. The cardinality of the deductive closure measures the generalizing knowledge. The empirical knowledge does not allow one to conclude something new, but to resolve ambiguities in accordance with observed data, e.g. for a sub-language, as shown in (Rayner and Samuelsson, 1994). Both learning techniques have the potential of improving the accuracy.</Paragraph> <Paragraph position="5"> A substitution of $\Lambda$ with $\kappa$, $\delta$ and $\beta$ reveals the transformation of $c_{old}$ into $c_{new}$. We start with caching and recalling (Equation 15). Parsing $e_1$ with the cache of $c_1$ yields $c_1$. The deductive closure is not enlarged; only quantitative relations with respect to $e$ change in $C$. If $c_1$ is not cached twice, memory-based EBL is idempotent.6 6Idempotence is the property of an operation that results in the same state no matter how many times it is executed.</Paragraph> <Paragraph position="8"> EBL with induction and deduction (D-EBL) is shown in (16). Here the subscripts merit special attention: integrating $c_{new}$ into $C$ changes the empirical knowledge with respect to $a$ and $b$. If the empirical knowledge does not influence $\iota$, D-EBL is idempotent. Its deductive closure does not increase, as D-EBL acquires only empirical knowledge. A-EBL acquires empirical knowledge similarly to D-EBL; in addition, a new $\langle\langle a_3, \_ \rangle, a_4 \rangle$ is acquired. This $c_{new}$ may differ from $c_{old}$ with respect to $a_3$ and/or $a_4$. In the A-EBL experiments reported below, $a_3 \neq a_1$ and $a_4 = a_2$ holds.</Paragraph>
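The feedback loop of Eqs. (14)-(15) can be pictured as follows; a small sketch continuing the toy representation above, with hypothetical names (recall, ebl_step), showing that caching the same parse twice leaves $C$ unchanged as a set, i.e. that memory-based EBL is idempotent.

```python
# Toy EBL cycle (Eqs. 14-15): parse e with the current parser, then
# feed the resulting classification c_new back into the training set.

def recall(cache, e):
    """Memory-based parsing: look up e in the cache of parses."""
    return next((a for (e_c, a) in cache if e_c == e), None)

def ebl_step(cache, e):
    """One EBL step with kappa (caching): parse e, cache the result."""
    a = recall(cache, e)
    if a is not None:
        cache.add((e, a))           # re-caching c1 yields c1: no change
    return cache

C = {(("N", "street"), "NP")}
before = set(C)
after = ebl_step(set(C), ("N", "street"))
assert before == after              # memory-based EBL is idempotent
```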
<Paragraph position="12"> Parsing is a classification task in which an $a \in A$ is assigned to an $e \in O$. Differently from typical classification tasks in machine learning, natural language parsing requires an open set $A$. This is obtained via the recursive application of $R$, which, unlike non-recursive styles of analysis (Srinivas and Joshi, 1999), yields $A$ (syntax trees) of any complexity.</Paragraph> <Paragraph position="13"> Then $reduce$ is applied to $A$ so that $reduce\!\hookrightarrow\!(a)$ can be matched by further rules (cf. 18). Without this reduction, recursive parsing could not go beyond memory-based parsing.</Paragraph> <Paragraph position="14"> [Figure 1: Parsing via chunk substitutions. Abductive term identification bridges gaps in the deduction ($X \sim Y$). The marker '?' is a graphical shortcut for the set of lexemes $\{b\}$ in $c$.]</Paragraph> <Paragraph position="15"> The function $reduce$ defines an induction, and recursive parsing is thus a deduction. Combinations of memory-based and deduction-based parsing are deductions; combinations of abduction-based parsing with any other parsing are abductions.</Paragraph> <Paragraph position="16"> Macro learning is the common term for the combination of EBL with recursive deduction (Tadepalli, 1991). A macro $r_{macro}$ is a rule which yields the same result as the application of a set of rules $R' \subseteq R$ does. In terms of a grammar, such macros correspond to redundant phrases, i.e. phrases that are obtained by composing smaller phrases of $R$. Macros represent shortcuts for the parser and, possibly, improved likelihood estimates of the composed structure compared to the estimates under the independence assumption (Abney, 1996). When the usage of macros excludes certain types of analysis, e.g. by trying to find longest/best matches, we can speak of pruning. This is the contribution of D-EBL to parsing.</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Experiments in EBL </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Experimental purpose and setup </SectionTitle> <Paragraph position="0"> The aim of the experiments is to verify whether new knowledge is acquired in A-EBL and D-EBL. Secondly, we want to test the influence of the new knowledge on parsing accuracy and speed.</Paragraph> <Paragraph position="1"> The general setup of the experiments is the following. We use a section of a treebank as seed-corpus ($C_{seed}$) and train a corpus-based parser on it. Using a test-corpus, we establish the parsing accuracy and speed of the parser: $evaluate(P\!\hookrightarrow\!(O_{test})) =$ (recall, precision, f-score, time). Then we parse a large corpus ($P\!\hookrightarrow\!(O) = \{c_{new}\}$) and apply a filter criterion that works on the explanation. We train the parser on those trees which pass the filter ($L\!\hookrightarrow\!(C_{seed} \cup \{c_{new}\}) = P_{new}$). Then the parsing accuracy and speed are tested against the same test-corpus: $evaluate(P_{new}\!\hookrightarrow\!(O_{test})) =$ (recall, precision, f-score, time).</Paragraph> <Paragraph position="2"> Sections of the Chinese Sinica Treebank (Huang et al., 2000) are used as seed-treebank and as gold standard for the parsing evaluation. Seed-corpora range between 1,000 and 20,000 trees. We train the parser OCTOPUS (Streiter, 2002a) on them. This parser integrates memory-, deduction- and abduction-based parsing in a hierarchy of preferences, starting from (1) memory-based parsing, via (2) non-recursive deductive parsing and (3-4) recursive deductive parsing with and without lexemes, to (5) abductive parsing (Fig. 2). [Figure 2: The parser OCTOPUS interleaves memory-based, deductive and abductive parsing strategies in five steps: recalling, non-recursive deduction, deduction via chunk substitution (first with lexemes, then without lexemes) and finally abduction.]</Paragraph>
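A schematic rendering of this experimental protocol; the function parameters (train, parse, explain, passes_filter, evaluate) are placeholders standing in for the parser and treebank interfaces described above, not a real API.

```python
# Schematic experimental protocol of Section 3.1 (names are
# placeholders, not the OCTOPUS interface).

def run_experiment(seed_corpus, large_corpus, test_corpus,
                   train, parse, explain, passes_filter, evaluate):
    parser = train(seed_corpus)                  # L(C_seed) = P
    baseline = evaluate(parser, test_corpus)     # (recall, precision, f, time)

    # Parse the large raw corpus; keep only trees whose explanations
    # pass the filter (a single inference type, e.g. A-EBL or D-EBL).
    new_trees = [c for e in large_corpus
                 for c in [parse(parser, e)]
                 if passes_filter(explain(parser, e))]

    parser_new = train(seed_corpus + new_trees)  # L(C_seed + {c_new})
    retrained = evaluate(parser_new, test_corpus)
    return baseline, retrained
```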
<Paragraph position="3"> Learning the seed corpora ($L\!\hookrightarrow\!(C_{1000} \ldots C_{20000})$) results in the parsers $P_{1000} \ldots P_{20000}$. Each $P_i$ then parses the 5-million-word Sinica Corpus (Huang and Chen, 1992).</Paragraph> <Paragraph position="4"> For every $e \in O$ the parser produces one parse-tree $c = \langle e, a \rangle$ and an explanation. The explanation has the form of a derivation tree in TAGs, cf. (Joshi, 2003). The deduction and abduction steps are visible in the explanation. Filters apply to the explanation and create sub-corpora that belong to one inference type.</Paragraph> <Paragraph position="5"> The first filter requires the explanation to contain only one non-recursive deduction, i.e. only parsing step 2. As deductive parsing is attempted only after memory-based parsing (step 1), $b_1 \neq b_2$ holds.</Paragraph> <Paragraph position="6"> A second filter extracts those structures which are obtained by parsing step 4 or 5, where only one POS-label may differ, and only in its last character (e.g. two tags sharing all but their final character). The resulting corpora are $C^{D}_{1000} \ldots C^{D}_{20000}$ and $C^{A}_{1000} \ldots C^{A}_{20000}$.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 The Acquired Knowledge </SectionTitle> <Paragraph position="0"> We want to know whether or not new knowledge has been acquired and what the nature of this acquired knowledge is. As parsing was not recursive, we can approximate the closure by the types of POS-sequences from all trees and their subtrees in a corpus. We contrast this with the types of lexeme-sequences. The data show that only A-EBL increases the closure. But even when looking at lexemes, i.e. at the empirical knowledge, A-EBL acquires richer information than D-EBL does.</Paragraph>
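The closure approximation can be sketched as follows, under the assumption (ours, for illustration) that trees are nested (label, children) tuples whose leaves are (pos_tag, lexeme) pairs; the helper names are hypothetical.

```python
# Approximating the deductive closure (Section 3.2): count the types
# of POS-sequences spanned by all trees and their subtrees.

def subtrees(tree):
    """Yield the tree and all its internal-node subtrees."""
    yield tree
    _label, children = tree
    for child in children:
        if isinstance(child, tuple) and isinstance(child[1], list):
            yield from subtrees(child)

def pos_sequence(tree):
    """The left-to-right sequence of POS tags spanned by a tree."""
    _label, children = tree
    seq = []
    for child in children:
        if isinstance(child, tuple) and isinstance(child[1], list):
            seq.extend(pos_sequence(child))      # internal node
        else:
            seq.append(child[0])                 # (pos_tag, lexeme) leaf
    return seq

def closure_types(corpus):
    """Types (not tokens) of POS-sequences over all subtrees."""
    return {tuple(pos_sequence(t)) for tree in corpus
            for t in subtrees(tree)}

# A corpus enlarges the closure iff it adds new POS-sequence types:
# len(closure_types(seed + ebl_trees)) > len(closure_types(seed)).
```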
<Paragraph position="1"> [Figure: Approximation of the closure (types of POS-sequences) with $C_{seed}$, A-EBL and D-EBL; below, the number of types of LEXEME-sequences.]</Paragraph> <Paragraph position="2"> The representativeness of the cached parses is gauged by the percentage of NPs and VPs (including Ss) as top-nodes. Fig. 5 shows the bias of the cached parses, which is more pronounced with D-EBL than with A-EBL.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Evaluating Parsing </SectionTitle> <Paragraph position="0"> The experiments consist in evaluating the parsing accuracy and speed of parsers trained on $C_{seed}$ alone, on $C_{seed} \cup C_A$, and on $C_{seed} \cup C_D$. We test the parsing accuracy on 300 untrained and randomly selected sentences, using the f-score on unlabeled dependency relations. Fig. 6 shows the parsing accuracy depending on the size of the seed-corpus. The graphs show side branches where we introduce the EBL-derived training material; this allows comparing the effect of A-EBL, D-EBL and hand-coded trees (the baseline). Fig. 7 shows the parsing speed in words per second (processor: 1000 MHz, memory: 128 MB) for the same experiments. Rising lines indicate a speed-up in parsing. We have interpolated the parsing speed for $C_{seed}$, $C_{seed} \cup C_A$ and $C_{seed} \cup C_D$. The experimental results confirm the drop in parsing accuracy with D-EBL. This finding is consistent across all experiments. With A-EBL, the parsing accuracy increases beyond its initial level. The data also show a speed-up in parsing; this speed-up is more pronounced, and less data-hungry, with A-EBL. Improving accuracy and efficiency are thus not mutually exclusive, at least for A-EBL.</Paragraph> </Section> </Section> </Paper>