<?xml version="1.0" standalone="yes"?> <Paper uid="W04-3215"> <Title>Object-Extraction and Question-Parsing using CCG</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 The Parser </SectionTitle> <Paragraph position="0"> The parser used in this paper is described in Clark and Curran (2004b). It takes as input a POS-tagged sentence with a set of lexical categories assigned to each word. The CCG combinatory rules are used to combine the categories. A packed chart efficiently represents all of the possible analyses for a sentence, and the CKY chart parsing algorithm described in Steedman (2000) is used to build the chart.</Paragraph> <Paragraph position="1"> A Maximum Entropy CCG supertagger (Clark and Curran, 2004a) is used to assign the categories.</Paragraph> <Paragraph position="2"> The lexical category set is obtained from CCGbank (Hockenmaier, 2003a), a treebank of normal-form CCG derivations derived from the Penn Treebank.</Paragraph> <Paragraph position="3"> CCGbank is also used for learning the parameters of the supertagger and parsing models.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 The Supertagger </SectionTitle> <Paragraph position="0"> The supertagger uses a log-linear model to define a distribution for each word over the lexical category set. Model features are defined by the words and POS tags in the 5-word window surrounding the target word. The supertagger selects the most probable categories locally rather than maximising the sequence probability, assigning all categories whose probability is within some factor, β, of the highest probability category.
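The β-threshold selection just described can be sketched as follows (an illustrative sketch, not the C&C implementation; the category set, probabilities, and β value below are invented for the example):

```python
def assign_categories(probs, beta):
    """Multi-tagging with a beta threshold: keep every lexical category
    whose probability is within a factor `beta` of the most probable
    category for this word."""
    best = max(probs.values())
    return {cat for cat, p in probs.items() if p >= beta * best}

# Hypothetical distribution over lexical categories for one word.
word_probs = {"NP": 0.55, "N": 0.30, "(S\\NP)/NP": 0.10, "N/N": 0.05}
print(sorted(assign_categories(word_probs, beta=0.2)))  # → ['N', 'NP']
```

Lowering β admits more categories per word; raising it approaches single-best tagging.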
For a word seen frequently in the training data, the supertagger can only assign categories from the word's entry in the tag dictionary, which lists the categories each word has been seen with in the data.</Paragraph> <Paragraph position="1"> In Clark et al.'s (2002) parser, a supertagger is used as follows: first around 4 lexical categories are assigned to each word, on average; if the chart gets too big or parsing takes too long, the number of categories is reduced until the sentence can be parsed.</Paragraph> <Paragraph position="2"> In this paper we use our more recent approach (Clark and Curran, 2004a): first a small number of categories, around 1.5 on average, is assigned to each word, and the parser requests more categories if a spanning analysis cannot be found. This method relies on the grammar being constraining enough to decide whether the categories provided by the supertagger are likely to contain the correct sequence. Section 6 shows that this approach works well for parsing questions.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Parsing Model </SectionTitle> <Paragraph position="0"> In Clark and Curran (2004b) we investigate several log-linear parsing models for CCG. In this paper we use the following conditional model:</Paragraph> <Paragraph position="1"> p(y|x) = \frac{\exp\left(\sum_i \lambda_i f_i(y)\right)}{\sum_{y' \in \rho(x)} \exp\left(\sum_i \lambda_i f_i(y')\right)} </Paragraph> <Paragraph position="2"> where y is a normal-form derivation, x is a sentence, and \rho(x) is the set of possible derivations for x. (A normal-form derivation is one where composition and type-raising are used only when necessary.) There are various features, f_i, used by the model: rule instantiation features, which count the number of times a local tree occurs in a derivation; features defined by the root category of a derivation; and features defined by the lexical categories at the leaves.
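The conditional model above can be sketched in a few lines of Python (a minimal sketch: the feature names, counts, and λ weights below are invented for illustration, and real parsers score a packed chart rather than enumerating derivations):

```python
import math

def p_conditional(y_feats, weights, candidate_feats):
    """p(y|x) under a conditional log-linear model: exponentiate the
    weighted feature counts of derivation y, then normalise over all
    candidate derivations of the sentence (the denominator sum)."""
    def score(feats):
        return math.exp(sum(weights.get(f, 0.0) * n for f, n in feats.items()))
    return score(y_feats) / sum(score(f) for f in candidate_feats)

# Two hypothetical derivations of one sentence, with toy feature counts.
y1 = {"root=S[dcl]": 1, "rule=S[dcl] -> NP S[dcl]\\NP": 1}
y2 = {"root=NP": 1}
weights = {"root=S[dcl]": 1.2, "rule=S[dcl] -> NP S[dcl]\\NP": 0.4, "root=NP": -0.3}
print(p_conditional(y1, weights, [y1, y2]))
```

The probabilities of all candidate derivations sum to one, so the most probable derivation is simply the one with the highest weighted feature score.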
Each feature type has unlexicalised and head-lexicalised versions.</Paragraph> <Paragraph position="3"> The remaining features capture word-word dependencies, which significantly improve accuracy.</Paragraph> <Paragraph position="4"> The best-performing model encodes word-word dependencies in terms of the local rule instantiations, as in Hockenmaier and Steedman (2002). We have also tried predicate-argument dependencies, including long-range dependencies, but these have not improved performance. Note that we still recover long-range dependencies, even if modelling them does not improve performance.</Paragraph> <Paragraph position="5"> The parser returns a derived structure corresponding to the most probable derivation. For evaluation the parser returns dependency structures, but we have also developed a module which builds first-order semantic representations from the derivations, which can be used for inference (Bos et al., 2004).</Paragraph> </Section> </Section> </Paper>