Extremely Lexicalized Models for Accurate and Fast HPSG Parsing

2 HPSG and probabilistic models

HPSG (Pollard and Sag, 1994) is a syntactic theory based on a lexicalized grammar formalism. In HPSG, a small number of schemata describe general construction rules, and a large number of lexical entries express word-specific characteristics. The structures of sentences are explained by combinations of schemata and lexical entries. Both schemata and lexical entries are represented by typed feature structures, and constraints represented by feature structures are checked by unification.

An example of HPSG parsing of the sentence "Spring has come" is shown in Figure 1. First, each of the lexical entries for "has" and "come" is unified with a daughter feature structure of the Head-Complement Schema. Unification provides the phrasal sign of the mother. The sign of a larger constituent is obtained by repeatedly applying schemata to lexical/phrasal signs. Finally, the parse result is output as a phrasal sign that dominates the sentence.

Given a set W of words and a set F of feature structures, an HPSG is formulated as a tuple G = ⟨L, R⟩, where

  L = {l = ⟨w, F⟩ | w ∈ W, F ∈ F} is a set of lexical entries, and
  R is a set of schemata; i.e., each r ∈ R is a partial function F × F → F.

Given a sentence, an HPSG computes a set of phrasal signs, i.e., feature structures, as the result of parsing. Note that HPSG is one of the lexicalized grammar formalisms, in which lexical entries determine the dominant syntactic structures.

Previous studies (Abney, 1997; Johnson et al., 1999; Riezler et al., 2000; Malouf and van Noord, 2004; Kaplan et al., 2004; Miyao and Tsujii, 2005) defined a probabilistic model of unification-based grammars, including HPSG, as a log-linear or maximum entropy model (Berger et al., 1996). The probability that a parse result T is assigned to a given sentence w = ⟨w_1, ..., w_n⟩ is

  p(T|w) = (1/Z_w) exp( Σ_u λ_u f_u(T) ),
  Z_w = Σ_{T'} exp( Σ_u λ_u f_u(T') ),

where λ_u is a model parameter, f_u is a feature function that represents a characteristic of parse tree T, and Z_w is the sum over the set of all possible parse trees for the sentence. Intuitively, the probability is defined as the normalized product of the weights exp(λ_u) for each characteristic corresponding to f_u that appears in parse result T. The model parameters λ_u are estimated using numerical optimization methods (Malouf, 2002) so as to maximize the log-likelihood of the training data.
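To make the computation concrete, here is a minimal Python sketch of the log-linear parse ranking above. Everything in it is illustrative: the feature Counters stand in for the counts f_u(T), the weights dict for the parameters λ_u, and the hand-built candidate list for the (in reality exponentially many) parse trees a chart parser would enumerate.

```python
# Minimal sketch of log-linear (maximum entropy) parse ranking.
# All feature names, weights, and candidates below are invented for
# illustration; a real HPSG parser would enumerate candidates with a
# chart and estimate the weights from a treebank.
import math
from collections import Counter

def parse_probability(candidates, weights):
    """Return p(T|w) for each candidate parse T of a sentence w.

    candidates: list of feature Counters, one per parse tree T,
                where the counts play the role of f_u(T).
    weights:    dict mapping feature u to its parameter lambda_u.
    """
    scores = []
    for feats in candidates:
        # sum_u lambda_u * f_u(T)
        s = sum(weights.get(u, 0.0) * v for u, v in feats.items())
        scores.append(math.exp(s))
    z_w = sum(scores)  # Z_w: sum over all candidate parses
    return [s / z_w for s in scores]

# Two toy parse candidates for "Spring has come" (features invented).
t1 = Counter({("head_comp", "has", "VBZ"): 2, ("root", "has"): 1})
t2 = Counter({("head_comp", "come", "VBN"): 1, ("root", "come"): 1})
weights = {("head_comp", "has", "VBZ"): 0.8, ("root", "has"): 0.3,
           ("head_comp", "come", "VBN"): -0.2, ("root", "come"): 0.1}
print(parse_probability([t1, t2], weights))  # normalized probabilities
```

The normalization by Z_w is exactly the sum in the equation above, and it is precisely this brute-force enumeration of all candidates that does not scale, motivating the dynamic programming and reference-distribution techniques discussed next.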
However, the above model cannot be easily estimated, because the estimation requires the computation of p(T|w) for all parse candidates assigned to sentence w. Since the number of parse candidates grows exponentially with the length of the sentence, the estimation is intractable for long sentences. To make the estimation tractable, Geman and Johnson (2002) and Miyao and Tsujii (2002) proposed a dynamic programming algorithm for estimating p(T|w). Miyao and Tsujii (2005) also introduced a preliminary probabilistic model p_0(T|w) whose estimation does not require the parsing of a treebank. This model is introduced as a reference distribution of the probabilistic HPSG model; i.e., parse trees that are given low probabilities by the preliminary model are omitted from the estimation stage. We have

  (Previous probabilistic HPSG)
  p(T|w) = p_0(T|w) (1/Z_w) exp( Σ_u λ_u f_u(T) ),
  Z_w = Σ_{T'} p_0(T'|w) exp( Σ_u λ_u f_u(T') ),
  p_0(T|w) = Π_{i=1..n} p(l_i|w_i),

where l_i is the lexical entry assigned to word w_i in T and p(l_i|w_i) is the probability of selecting lexical entry l_i for w_i.

In the experiments, we compared our model with the probabilistic HPSG model of Miyao and Tsujii (2005). The features used in their model are combinations of the feature templates listed in Table 1. The feature templates f_binary and f_unary are defined for constituents at binary and unary branches, f_root is a feature template set for the root nodes of parse trees, and f_lex is a feature template set for calculating the preliminary probabilistic model. An example of the features applied to the parse tree for the sentence "Spring has come" is shown in Figure 2.

Table 1: Feature templates.

Combinations of feature templates for f_binary:
  ⟨r,d,c,hw,hp,hl⟩, ⟨r,d,c,hw,hp⟩, ⟨r,d,c,hw,hl⟩, ⟨r,d,c,sy,hw⟩,
  ⟨r,c,sp,hw,hp,hl⟩, ⟨r,c,sp,hw,hp⟩, ⟨r,c,sp,hw,hl⟩, ⟨r,c,sp,sy,hw⟩,
  ⟨r,d,c,hp,hl⟩, ⟨r,d,c,hp⟩, ⟨r,d,c,hl⟩, ⟨r,d,c,sy⟩,
  ⟨r,c,sp,hp,hl⟩, ⟨r,c,sp,hp⟩, ⟨r,c,sp,hl⟩, ⟨r,c,sp,sy⟩

Combinations of feature templates for f_unary:
  ⟨r,hw,hp,hl⟩, ⟨r,hw,hp⟩, ⟨r,hw,hl⟩, ⟨r,sy,hw⟩,
  ⟨r,hp,hl⟩, ⟨r,hp⟩, ⟨r,hl⟩, ⟨r,sy⟩

Combinations of feature templates for f_root:
  ⟨hw,hp,hl⟩, ⟨hw,hp⟩, ⟨hw,hl⟩, ⟨sy,hw⟩,
  ⟨hp,hl⟩, ⟨hp⟩, ⟨hl⟩, ⟨sy⟩

Combinations of feature templates for f_lex:
  ⟨w_i,p_i,l_i⟩, ⟨p_i,l_i⟩

  r    name of the applied schema
  d    distance between the head words of the daughters
  c    whether a comma exists between daughters and/or inside daughter phrases
  sp   number of words dominated by the phrase
  sy   symbol of the phrasal category
  hw   surface form of the head word
  hp   part-of-speech of the head word
  hl   lexical entry assigned to the head word
  w_i  the i-th word
  p_i  part-of-speech of w_i
  l_i  lexical entry for w_i
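As an illustration of how the templates in Table 1 are instantiated, the following Python sketch spells out the sixteen f_binary combinations and applies them to a single binary branch. The branch representation is entirely hypothetical; a real implementation would read the values of r, d, c, sp, sy, hw, hp, and hl off the parser's chart.

```python
# Sketch of instantiating the f_binary feature templates from Table 1
# on one binary branch. The branch dict and its values are hypothetical
# stand-ins for information read from an HPSG chart.

# The 16 f_binary template combinations listed in Table 1.
BINARY_TEMPLATES = [
    ("r", "d", "c", "hw", "hp", "hl"), ("r", "d", "c", "hw", "hp"),
    ("r", "d", "c", "hw", "hl"),       ("r", "d", "c", "sy", "hw"),
    ("r", "c", "sp", "hw", "hp", "hl"), ("r", "c", "sp", "hw", "hp"),
    ("r", "c", "sp", "hw", "hl"),      ("r", "c", "sp", "sy", "hw"),
    ("r", "d", "c", "hp", "hl"),       ("r", "d", "c", "hp"),
    ("r", "d", "c", "hl"),             ("r", "d", "c", "sy"),
    ("r", "c", "sp", "hp", "hl"),      ("r", "c", "sp", "hp"),
    ("r", "c", "sp", "hl"),            ("r", "c", "sp", "sy"),
]

def binary_features(branch):
    """Instantiate every f_binary template on one binary branch.

    Each feature is a tuple of (template-symbol, value) pairs, so
    features from different templates cannot collide.
    """
    return [tuple((sym, branch[sym]) for sym in template)
            for template in BINARY_TEMPLATES]

# Hypothetical values for the branch combining "has" and "come".
branch = {"r": "head_comp", "d": 1, "c": False, "sp": 2,
          "sy": "VP", "hw": "has", "hp": "VBZ", "hl": "aux_verb_le"}
for feature in binary_features(branch)[:3]:
    print(feature)
```

Backing off from the full template ⟨r,d,c,hw,hp,hl⟩ to coarser combinations such as ⟨r,d,c,hp⟩ is what lets the model assign sensible weights even when a specific head word was rare or unseen in training.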