<?xml version="1.0" standalone="yes"?>
<Paper uid="P94-1018">
  <Title>A Psycholinguistically Motivated Parser for CCG</Title>
  <Section position="4" start_page="125" end_page="126" type="metho">
    <SectionTitle>
3 The Simplest Parser
</SectionTitle>
    <Paragraph position="0"> Let us consider the simplest conceivable parser. Its specification is &quot;find all analyses of the string so far.&quot; It has a collection of slots for maintaining one analysis each, in parallel. Each slot maintains an analysis of the string seen so far -- a sequence of one or more derivations. The parser has two operations, as shown in figure 1.</Paragraph>
    <Paragraph position="1"> This parser succeeds in constructing the incremental analysis (2) necessary for solving the problem in (1).</Paragraph>
    <Paragraph position="2"> 1 Two common combinatory rules, type-raising and substitution, are not listed here. The substitution rule (Steedman 1987) is orthogonal to the present discussion and can be added without modification. The rule for type-raising (see e.g. Dowty 1988) can cause difficulties for the parsing scheme advocated here (Hepple 1987) and is therefore assumed to apply in the lexicon. So a proper name, for example, would have two categories: np and s/(s\np).</Paragraph>
    <Paragraph position="3">
      X/Y  Y|Z1...|Zn  →  X|Z1...|Zn    &gt;n
      Y  X\Y  →  X                      &lt;0
      Y|Z  X\Y  →  X|Z                  &lt;1
      Y|Z1|Z2  X\Y  →  X|Z1|Z2          &lt;2
      Y|Z1...|Zn  X\Y  →  X|Z1...|Zn    &lt;n
      |Z stands for either /Z or \Z. Underlined regions in a rule must match.</Paragraph>
  </Section>
  <Section position="5" start_page="126" end_page="126" type="metho">
    <SectionTitle>
* scan
</SectionTitle>
    <Paragraph position="0"> get the next word from the input stream
  for each analysis a in the parser's memory
    empty the slot containing a
    for each lexical entry e of the word
      make a copy a' of a
      add the leaf derivation e to the right of a'
      add a' as a new analysis
* combine
  for each analysis a in the parser's memory
    if a contains more than one constituent
       and some rule can combine the rightmost two constituents in a
    then make a copy a' of a
         replace the two constituents of a' by their combination
         add a' as a new analysis

(2) the          flowers   sent
    s/(s\np)/n   n         s\np/pp
    ---------------------- &gt;0
    s/(s\np)
    -------------------------------- &gt;1
    s/pp

But this parser is just an unconstrained shift-reduce parser that simulates non-determinism via parallelism. It suffers from a standard problem of simple bottom-up parsers: it can only know when a certain substring has a derivation, but in case a substring does not have a derivation, the parser cannot yet know whether or not a larger string containing the substring will have a derivation. This means that when faced with a string such as
(3) The insults the new students shouted at the teacher were appalling.</Paragraph>
    <Paragraph position="1"> the parser will note the noun-verb ambiguity of 'insults', but will be unable to use the information that 'insults' is preceded by a determiner to rule out the verb analysis in a timely fashion. It would only notice the difficulty with the verb analysis after it had come to the end of the string and failed to find a derivation for it. This delay in ruling out doomed analyses means that the parser and the interpreter are burdened with a quickly proliferating collection of irrelevant analyses.</Paragraph>
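The scan and combine operations of figure 1 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the author's implementation: the tuple encoding of categories, the function names, the bound max_n on composition order, and the three-word toy lexicon are all introduced here for the example, and no restriction on the rules beyond the shape of the categories is modeled.

```python
# Categories: an atomic category is a str; a complex category is a tuple
# (result, slash, argument) with slash "/" or "\\". Illustrative encoding only.

def _compose(x, y, other, max_n):
    """Match `other` against Y|Z1...|Zn (0 to max_n arguments) and return
    X|Z1...|Zn, or None when `other` does not have that shape."""
    peeled = []
    current = other
    for _ in range(max_n + 1):
        if current == y:
            result = x
            for slash, z in reversed(peeled):
                result = (result, slash, z)
            return result
        if isinstance(current, str):
            return None
        peeled.append((current[1], current[2]))
        current = current[0]
    return None


def forward(left, right, max_n=2):
    """X/Y  Y|Z1...|Zn  gives  X|Z1...|Zn (application when n is 0)."""
    if isinstance(left, str) or left[1] != "/":
        return None
    return _compose(left[0], left[2], right, max_n)


def backward(left, right, max_n=2):
    """Y|Z1...|Zn  X\\Y  gives  X|Z1...|Zn (application when n is 0)."""
    if isinstance(right, str) or right[1] != "\\":
        return None
    return _compose(right[0], right[2], left, max_n)


def scan(analyses, entries):
    """Extend every analysis with every lexical entry of the next word."""
    return [a + (e,) for a in analyses for e in entries]


def combine(analyses):
    """Add, for each analysis, the copy whose rightmost two constituents
    are combined; originals are kept, and new copies are revisited."""
    result = list(analyses)
    seen = set(result)
    i = 0
    while i != len(result):
        a = result[i]
        i += 1
        if len(a) > 1:
            for rule in (forward, backward):
                c = rule(a[-2], a[-1])
                if c is not None and a[:-2] + (c,) not in seen:
                    seen.add(a[:-2] + (c,))
                    result.append(a[:-2] + (c,))
    return result


def parse(words, lexicon):
    analyses = [()]  # one empty analysis before any input
    for word in words:
        analyses = combine(scan(analyses, lexicon[word]))
    return analyses


S_BACK_NP = ("s", "\\", "np")
LEXICON = {  # toy lexicon, assumed for this example
    "the": [(("s", "/", S_BACK_NP), "/", "n")],   # s/(s\np)/n
    "flowers": ["n"],
    "sent": [(S_BACK_NP, "/", "pp")],             # s\np/pp
}

analyses = parse(["the", "flowers", "sent"], LEXICON)
```

With this toy lexicon, the parser holds three analyses after the third word. One of them is the single constituent s/pp of derivation (2); the other two are the partially combined and uncombined sequences, which the unconstrained parser has no grounds to discard.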
    <Paragraph position="2"> Standard solutions to this problem (e.g. Earley's 1970 parser; LR parsing, Aho and Johnson 1974) consider global properties of the competence grammar to infer that no grammatical string will begin with a determiner followed by a verb. These solutions exact a cost in complicating the design of the parser: new data structures such as dotted rules or an LR table must be added to the parser. The parser is no longer a generic search algorithm for the competence grammar. Given the flexibility of CCG derivations, one may consider imposing a very simple constraint on the parser: every prefix of a grammatical string must have a derivation. But such a move is too heavy-handed. Indeed, CCG often gives left-branching derivations, but it is not purely left-branching. For example, the derivation of a WH-dependency requires leaving the WH-filler constituent uncombined until the entire gap-containing constituent is completed, as in (4).</Paragraph>
    <Paragraph position="4"/>
  </Section>
  <Section position="6" start_page="126" end_page="127" type="metho">
    <SectionTitle>
4 The Viable Analysis Criterion
</SectionTitle>
    <Paragraph position="0"> Given the desideratum to minimize the complexity of the biologically specified parser, I propose that the human parser is indeed as simple as the scan-combine algorithm presented above, and that the ability to rule out analyses such as determiner+verb is not innate, but is an acquired skill. This 'skill' is implemented as a criterion which an analysis must meet in order to survive. An infant starts out with this criterion completely permissive. Consequently it cannot process any utterances longer than a few words without requiring excessively many parser slots. But as the infant observes the various analyses in the parser memory and tracks their respective outcomes, it notices that certain sequences of categories never lead to a grammatical overall analysis. After observing an analysis failing a certain number of times and never succeeding, the child concludes that it is not a viable analysis and learns to discard it. The more spurious analyses are discarded, the better able the child is to cope with longer strings.</Paragraph>
    <Paragraph position="1"> The collection of analyses that are maintained by the parser is therefore filtered by two independent processes: The Viable Analysis Criterion is a purely syntactic filter which rules out analyses independently of ambiguity. The interpreter considers the semantic information of the remaining analyses in parallel and occasionally deems certain analyses more sensible than their competitors, and discards the latter.</Paragraph>
    <Paragraph position="2"> Given that English sentences rarely require more than two or three CCG constituents at any point in their parse, and given the limited range of categories that arise in English, the problem of learning the viable analysis criterion from data promises to be comparable to other n-gram learning tasks. The empirical validation of this proposal awaits the availability of a broad-coverage CCG for English and other languages. 2</Paragraph>
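One way to picture the acquisition of the criterion is as a pair of counters over category sequences. The sketch below is a hypothetical illustration and not a procedure specified in the text: the class name, the threshold of three failures, the use of the full category sequence as the key, and the simplified categories np/n and s\np/np are all assumptions made for the example.

```python
from collections import Counter


class ViableAnalysisCriterion:
    """Learned filter over category sequences (hypothetical names throughout)."""

    def __init__(self, threshold=3):
        self.threshold = threshold   # failures needed before discarding
        self.failures = Counter()
        self.successes = Counter()

    def observe(self, categories, succeeded):
        """Record the eventual outcome of an analysis with these categories."""
        key = tuple(categories)
        if succeeded:
            self.successes[key] += 1
        else:
            self.failures[key] += 1

    def is_viable(self, categories):
        """Non-viable only if it failed `threshold` times and never succeeded."""
        key = tuple(categories)
        if self.successes[key] != 0:
            return True
        return self.threshold > self.failures[key]

    def filter(self, analyses):
        """Keep only the analyses that the criterion still considers viable."""
        return [a for a in analyses if self.is_viable(a)]


criterion = ViableAnalysisCriterion(threshold=3)
det_verb = ("np/n", "s\\np/np")   # determiner followed by a transitive verb
det_noun = ("np/n", "n")          # determiner followed by a noun

# The criterion starts out completely permissive.
permissive_at_first = criterion.is_viable(det_verb)

# Determiner+verb is seen to fail three times and never to succeed.
for _ in range(3):
    criterion.observe(det_verb, succeeded=False)

# Determiner+noun fails once but also succeeds once, so it stays viable.
criterion.observe(det_noun, succeeded=False)
criterion.observe(det_noun, succeeded=True)

surviving = criterion.filter([det_verb, det_noun])
```

In this toy run the determiner+verb sequence is accepted at first and discarded once it has failed three times without a success, while the determiner+noun sequence survives because a single success is enough to keep it.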
  </Section>
</Paper>