File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/03/p03-1054_abstr.xml
Size: 8,403 bytes
Last Modified: 2025-10-06 13:42:52
<?xml version="1.0" standalone="yes"?> <Paper uid="P03-1054"> <Title>Accurate Unlexicalized Parsing</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> We demonstrate that an unlexicalized PCFG can parse much more accurately than previously shown, by making use of simple, linguistically motivated state splits, which break down false independence assumptions latent in a vanilla treebank grammar.</Paragraph> <Paragraph position="1"> Indeed, its performance of 86.36% (LP/LR F1) is better than that of early lexicalized PCFG models, and surprisingly close to the current state-of-the-art. This result has potential uses beyond establishing a strong lower bound on the maximum possible accuracy of unlexicalized models: an unlexicalized PCFG is much more compact, easier to replicate, and easier to interpret than more complex lexical models, and the parsing algorithms are simpler, more widely understood, of lower asymptotic complexity, and easier to optimize.</Paragraph> <Paragraph position="2"> In the early 1990s, as probabilistic methods swept NLP, parsing work revived the investigation of probabilistic context-free grammars (PCFGs) (Booth and Thompson, 1973; Baker, 1979). However, early results on the utility of PCFGs for parse disambiguation and language modeling were somewhat disappointing. A conviction arose that lexicalized PCFGs (where head words annotate phrasal nodes) were the key tool for high-performance PCFG parsing.</Paragraph> <Paragraph position="3"> This approach was congruent with the great success of word n-gram models in speech recognition, and drew strength from a broader interest in lexicalized grammars, as well as demonstrations that lexical dependencies were a key tool for resolving ambiguities such as PP attachments (Ford et al., 1982; Hindle and Rooth, 1993). In the following decade, great success in terms of parse disambiguation and even language modeling was achieved by various lexicalized PCFG models (Magerman, 1995; Charniak, 1997; Collins, 1999; Charniak, 2000; Charniak, 2001).</Paragraph> <Paragraph position="4"> However, several results have brought into question how large a role lexicalization plays in such parsers. Johnson (1998) showed that the performance of an unlexicalized PCFG over the Penn treebank could be improved enormously simply by annotating each node by its parent category. The Penn treebank covering PCFG is a poor tool for parsing because the context-freedom assumptions it embodies are far too strong, and weakening them in this way makes the model much better. More recently, Gildea (2001) discusses how taking the bilexical probabilities out of a good current lexicalized PCFG parser hurts performance hardly at all: by at most 0.5% for test text from the same domain as the training data, and not at all for test text from a different domain.1 But it is precisely these bilexical dependencies that backed the intuition that lexicalized PCFGs should be very successful, for example in Hindle and Rooth's demonstration from PP attachment. We take this as a reflection of the fundamental sparseness of the lexical dependency information available in the Penn Treebank. As a speech person would say, one million words of training data just isn't enough. Even for topics central to the treebank's Wall Street Journal text, such as stocks, many very plausible dependencies occur only once, for example stocks stabilized, while many others occur not at all, for example stocks skyrocketed.2</Paragraph>
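To make the parent-annotation transform of Johnson (1998) mentioned above concrete, here is a minimal sketch. The nested (label, children) tree representation, the function name, and the choice to annotate every node (including preterminals) are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of Johnson (1998)-style parent annotation: each node's
# label is split by the category of its parent, weakening the treebank
# PCFG's context-freedom assumptions. Tree format here is an assumption:
# a node is (label, [children]); a leaf is a plain word string.

def parent_annotate(node, parent_label="ROOT"):
    if isinstance(node, str):   # leaf word: leave unchanged
        return node
    label, children = node
    return (f"{label}^{parent_label}",
            [parent_annotate(child, label) for child in children])

# Example: (S (NP (NNS stocks)) (VP (VBD stabilized)))
tree = ("S", [("NP", [("NNS", ["stocks"])]),
              ("VP", [("VBD", ["stabilized"])])])
print(parent_annotate(tree))
# ('S^ROOT', [('NP^S', [('NNS^NP', ['stocks'])]), ('VP^S', [('VBD^VP', ['stabilized'])])])
```

A treebank grammar read off the transformed trees then contains categories like NP^S and NP^VP, so that, for instance, subject and object NPs can receive different expansion probabilities.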
<Paragraph position="5"> Footnote 1: There are minor differences, but all the current best-known lexicalized PCFGs employ both monolexical statistics, which describe the phrasal categories of arguments and adjuncts that appear around a head lexical item, and bilexical statistics, or dependencies, which describe the likelihood of a head word taking as a dependent a phrase headed by a certain other word.</Paragraph>
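As a schematic rendering of the distinction drawn in this footnote (not the exact parameterization of any particular parser), the two kinds of statistics can be written as follows.

```latex
% Schematic only. h = head word, H = its phrasal category,
% D = phrasal category of a dependent (argument or adjunct),
% d = head word of that dependent phrase.
\[
  \underbrace{P(D \mid H, h)}_{\text{monolexical}}
  \qquad\text{vs.}\qquad
  \underbrace{P(d \mid D, H, h)}_{\text{bilexical}}
\]
```

The Gildea (2001) result cited above concerns removing only the second, bilexical kind of statistic.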
<Paragraph position="6"> Footnote 2: This observation motivates various class- or similarity-based approaches to combating sparseness, and this remains a promising avenue of work, but success in this area has proven somewhat elusive, and, at any rate, current lexicalized PCFGs do simply use exact word matches if available, and interpolate with syntactic category-based estimates when they are not.</Paragraph> <Paragraph position="7"> The best-performing lexicalized PCFGs have increasingly made use of subcategorization3 of the categories appearing in the Penn treebank. Charniak (2000) shows the value his parser gains from parent-annotation of nodes, suggesting that this information is at least partly complementary to information derivable from lexicalization, and Collins (1999) uses a range of linguistically motivated and carefully hand-engineered subcategorizations to break down wrong context-freedom assumptions of the naive Penn treebank covering PCFG, such as differentiating &quot;base NPs&quot; from noun phrases with phrasal modifiers, and distinguishing sentences with empty subjects from those where there is an overt subject NP. While he gives incomplete experimental results as to their efficacy, we can assume that these features were incorporated because of beneficial effects on parsing that were complementary to lexicalization.</Paragraph> <Paragraph position="8"> Footnote 3: In this paper we use the term subcategorization in the original general sense of Chomsky (1965), for where a syntactic category is divided into several subcategories, for example dividing verb phrases into finite and non-finite verb phrases, rather than in the modern restricted usage where the term refers only to the syntactic argument frames of predicators.</Paragraph> <Paragraph position="9"> In this paper, we show that the parsing performance that can be achieved by an unlexicalized PCFG is far higher than has previously been demonstrated, and is, indeed, much higher than community wisdom has thought possible. We describe several simple, linguistically motivated annotations which do much to close the gap between a vanilla PCFG and state-of-the-art lexicalized models. Specifically, we construct an unlexicalized PCFG which outperforms the lexicalized PCFGs of Magerman (1995) and Collins (1996) (though not more recent models, such as Charniak (1997) or Collins (1999)).</Paragraph> <Paragraph position="10"> One benefit of this result is a much-strengthened lower bound on the capacity of an unlexicalized PCFG. To the extent that no such strong baseline has been provided, the community has tended to greatly overestimate the beneficial effect of lexicalization in probabilistic parsing, rather than looking critically at where lexicalized probabilities are both needed to make the right decision and available in the training data. Secondly, this result affirms the value of linguistic analysis for feature discovery. The result has other uses and advantages: an unlexicalized PCFG is easier to interpret, reason about, and improve than the more complex lexicalized models. The grammar representation is much more compact, no longer requiring large structures that store lexicalized probabilities. The parsing algorithms have lower asymptotic complexity4 and have much smaller grammar constants. An unlexicalized PCFG parser is much simpler to build and optimize, including both standard code optimization techniques and the investigation of methods for search space pruning (Caraballo and Charniak, 1998; Charniak et al., 1998).</Paragraph> <Paragraph position="11"> Footnote 4: O(n^3) vs. O(n^5) for a naive implementation, or vs. O(n^4) if using the clever approach of Eisner and Satta (1999).</Paragraph> <Paragraph position="12"> It is not our goal to argue against the use of lexicalized probabilities in high-performance probabilistic parsing. It has been comprehensively demonstrated that lexical dependencies are useful in resolving major classes of sentence ambiguities, and a parser should make use of such information where possible. We focus here on using unlexicalized, structural context because we feel that this information has been underexploited and underappreciated. We see this investigation as only one part of the foundation for state-of-the-art parsing which employs both lexical and structural conditioning.</Paragraph> </Section> </Paper>