<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1715"> <Title>Abductive Explanation-based Learning Improves Parsing Accuracy and Efficiency</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The difficulties of natural language parsing in general, and of parsing Chinese in particular, stem from local ambiguities of words and phrases. Resolving these ambiguities requires extensive linguistic and non-linguistic knowledge (Chang, 1994; Chen, 1996). Different parsing approaches provide different types of knowledge. Example-based parsing approaches offer richer syntagmatic contexts for disambiguation than rule-based approaches do (Yuang et al., 1992). Statistical approaches to parsing acquire mainly paradigmatic knowledge and require larger corpora (cf. Carl and Langlais, 2003). Statisti- [...] project, which integrates NLP technologies into an Internet-based natural language learning platform (Streiter et al., 2003). Example-based parsing either generalizes examples at compilation time (e.g. Bod and Kaplan, 1998) or performs a similarity-based fuzzy match at runtime (Zavrel and Daelemans, 1997). Both techniques may be computationally demanding; their effects on parsing, however, are quite different (cf. Streiter, 2002a).</Paragraph> <Paragraph position="1"> Explanation-based learning (EBL) is a method for speeding up rule-based parsing by caching examples. EBL, however, trades accuracy for speed.</Paragraph> <Paragraph position="2"> For many systems, a small loss in accuracy is acceptable if computing time drops by an order of magnitude. Apart from speed, it is generally recognized that EBL acquires some kind of knowledge from texts. But what is this knowledge like if it does not help with parsing? Couldn't a system improve by learning from its own output? Can a system learn to parse Chinese by parsing Chinese? 
This paper sets out to tackle these questions in theory and in practice.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 1.1 Explanation-based Learning (EBL) </SectionTitle> <Paragraph position="0"> Explanation-based learning techniques transform a general problem solver (PS) into a specific and operational PS (Mitchell et al., 1986). This transformation is achieved by caching the output of the general PS. Besides its output, the PS generates a record of the reasoning steps involved (the explanation), and this explanation determines which outputs the system will cache.</Paragraph> <Paragraph position="1"> The utility problem calls into question the claim that EBL speeds up applications (Minton, 1990): retrieving cached solutions in addition to regular processing requires extra time. If retrieval is slow and cached solutions are rarely reused, the costs outweigh the benefits.</Paragraph> <Paragraph position="2"> The accuracy of the derived PS is generally below that of the general PS. This may be due either to the EBL framework as such or to the deductive base of the PS. Research in abductive EBL (A-EBL) seems to suggest the latter: A-EBL has the potential to acquire new knowledge (Dimopoulos and Kakas, 1996). The relation between knowledge and accuracy, however, is neither direct nor logical. The U-shaped language learning curves observed in children exemplify this indirect relation (Marcus et al., 1992): when rules are learned, incorrect regularized word forms (e.g. &quot;goed&quot;) temporarily supplant correct irregular forms (&quot;went&quot;). We therefore cannot simply equate automatic knowledge acquisition with accuracy improvement, in particular for complex language tasks.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 1.2 EBL and Natural Language Parsing </SectionTitle> <Paragraph position="0"> Previous research has applied EBL to speed up large, slow grammars. 
First, sentences are parsed.</Paragraph> <Paragraph position="1"> Then the parse trees are filtered and cached, and subsequent parsing uses the cached trees. A complex HPSG grammar, for example, is transformed into tree structures with instantiated values (Neumann, 1994), so that a single hash-table lookup of a POS sequence replaces typed-feature unification. Experiments with EBL-augmented parsing consistently report a speed-up of the parser and a drop in accuracy (Rayner and Samuelsson, 1994; Srinivas and Joshi, 1995).</Paragraph> <Paragraph position="2"> A loss of information may explain the drop in accuracy. Contextual information taken into account by the original parser may be unavailable in the new operational format (Sima'an, 1997), especially if partial, context-dependent solutions are retrieved.</Paragraph> <Paragraph position="3"> In addition, the set of cached parse trees judged to be &quot;sure to cache&quot; is necessarily biased (Streiter, 2002b): most cached tree structures are short noun phrases, and parsing from biased examples biases the parser's output.</Paragraph> <Paragraph position="4"> A further reason for the loss in accuracy is that incorrect parses leak into the cache. A stricter filter does not solve the problem: it increases the bias in the cache, reduces the size of the cache, and thereby aggravates the utility problem.</Paragraph> <Paragraph position="5"> EBL can, in fact, improve parsing accuracy (Streiter, 2002b) if the grammar derives the parses to be cached not by deduction but by abduction.</Paragraph> <Paragraph position="6"> The deductive closure, which cannot grow through EBL from deductive parsing, may grow with abductive parsing.</Paragraph> </Section> </Section> </Paper>