<?xml version="1.0" standalone="yes"?> <Paper uid="E91-1032"> <Title>Multiple Interpreters in a Principle-Based Model of Sentence Processing</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 The Processing Model </SectionTitle> <Paragraph position="0"> In the proposed model, we assume that the sentence processor strives to optimise local comprehension and integration of incoming material into the current context. That is, decisions about the current syntactic analysis are made incrementally (for each input item) on the basis of principles which are intended to maximise the overall interpretation. We have dubbed this the Principle of Incremental Comprehension (PIC), stated roughly as follows: (1) Principle of Incremental Comprehension: The sentence processor operates in such a way as to maximise comprehension of the sentence at each stage of processing.</Paragraph> <Paragraph position="1"> The PIC demands that the language comprehension system (LCS), and any sub-system contained within it (such as the syntactic and semantic processors), apply maximally to any input, thereby constructing a maximal, partial interpretation for a given partial input signal. This entails that each module within the LCS apply concurrently.</Paragraph> <Paragraph position="2"> The performance model is taken to be directly compatible with the modular, language-universal, principle-based theory of current transformational grammar \[3\]. We further suggest a highly modular organisation of the sentence processor wherein modules are determined by the syntactic representations they recover. This is motivated more generally by Fodor's Modularity Hypothesis \[6\], which argues that the various perceptual/input systems consist of fast, dumb, informationally encapsulated modules. Specifically, we posit four modules within the syntactic processor, each affiliated with a &quot;representational&quot; or &quot;informational&quot; aspect of the grammar. 
These are outlined below in conjunction with the grammatical subsystems to which they are related1: 1 This grouping of grammatical principles with representations is both partial and provisional, and is intended only to give the reader a feel for the &quot;natural classes&quot; exploited by the model.</Paragraph> <Paragraph position="3"> In Figure 1, we illustrate one possible instance of the organisation within the Syntactic Processor. We assume that the phrase structure module drives processing based on lexical input, and that the thematic structure co-ordinates the relevant aspects of each processor for output to the semantic processor.</Paragraph> <Paragraph position="4"> Just as the PIC applies to the main modules of the LCS as discussed above, it also entails that all modules within the syntactic processor act concurrently so as to apply maximally for a partial input, as illustrated by the operation shown in Figure 2. For the partial input &quot;What did John put ...&quot;, we can recover the partial phrase structure, including the trace in Infl 2. In addition, we can recover the chain linking did to its deep structure position in Infl (e-i), and also the θ-grid for the relation 'put', including the saturated Agent role 'John'. We might also go one step further and postulate a trace as the direct object of put, which could be the θ-position of What, but this action might be incorrect if the sentence turned out to be What did John put the book on, for example.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Principles and Representations </SectionTitle> <Paragraph position="0"> Before proceeding to a discussion of the model's implementation, we will first examine more closely the representational paradigm which is employed, and discuss some aspects of the principles and parameters theory we have adopted, restricting our attention primarily to phrase structure and Chains. 
2 We assume here a head movement analysis, where the head of Infl moves to the head of Comp, to account for Subject-Aux inversion.</Paragraph> <Paragraph position="2"> In general, a particular representation can be broken down into two fundamental components: 1) units of information, i.e.</Paragraph> <Paragraph position="3"> the 'nodes' or feature-bundles which are fundamental to the representation, and 2) units of structure, the minimal structural 'schema' for relating 'nodes' with each other. With these two notions defined, the representation can be viewed as some recursive instance of its particular schema over a collection of nodes.</Paragraph> <Paragraph position="4"> The representation of phrase structure (PS), as determined principally by X-bar theory, encodes the local, sisterhood relations and defines constituency. The bar-level notation is used to distinguish the status of satellites:</Paragraph> <Paragraph position="6"> The linear precedence of satellites with respect to their sister X-nodes is taken to be parametrised for each category and language. The final rule (d) above is simply the rule for lexical insertion. 
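The parametrised X-bar schema just described can be given a compact executable rendering. The following Python sketch is our own illustration, not part of the paper's implementation: the `Node` fields, the `HEAD_INITIAL` direction table, and the placeholder specifier/complement daughters are all assumptions, used only to show how linear precedence can be a per-category parameter over otherwise fixed projection rules.

```python
# Illustrative (hypothetical) encoding of the X-bar projection rules:
# XP -> Spec X', and X' -> head + complement, with the head/complement
# order parametrised per category (a stand-in for the per-language setting).
from dataclasses import dataclass

@dataclass
class Node:
    cat: str   # syntactic category, e.g. "N", "V", "I", "C"
    bar: int   # bar level: 0 = head (X), 1 = X', 2 = XP

# Assumed direction parameters: True means the head precedes its complement.
HEAD_INITIAL = {"V": True, "P": True, "N": True, "I": True, "C": True}

def project(cat: str):
    """Return the two X-bar branches for `cat` as (mother, daughters) pairs."""
    xp, xbar, x = Node(cat, 2), Node(cat, 1), Node(cat, 0)
    spec = Node("spec", 2)   # placeholder specifier (itself a maximal projection)
    comp = Node("comp", 2)   # placeholder complement
    head_branch = (xbar, [x, comp] if HEAD_INITIAL[cat] else [comp, x])
    return [(xp, [spec, xbar]), head_branch]

branches = project("V")
```

Changing a single entry in the direction table would yield head-final branches, which is the sense in which precedence is "parametrised for each category and language".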
In addition to the canonical structures defined by X-bar theory, we require additional rules to permit Chomsky-adjunction of phrases to maximal projections, via Move-α, and the rules for inserting traces (or more generally, empty categories) -- a consequence of the Projection Principle -- for moved heads and maximal projections.</Paragraph> <Paragraph position="7"> Given the rules above, we can see that possible phrase structures are limited to some combination of binary (non-terminal) and unary (terminal) branches.</Paragraph> <Paragraph position="8"> As discussed above, we can characterise the representational framework in terms of nodes and schemas: We allow two types of nodes: 1) non-terminals (N-Nodes), which are the nodes projected by X-bar theory, consisting of category, bar level, a unique ID, and the features projected from the head, and 2) terminals (T-Nodes), which are either lexical items or empty categories, which lack bar level, but possess phonological features (although these may be 'nil' for some empty categories). The schema defines the unit of structure, using the '/' to represent immediate dominance, and square brackets '\[... \]' to encode sisterhood and linear precedence. Using this notation we define the two possible types of branches, binary and unary, where the latter is applicable just in case the daughter is a terminal node. The full PS representation (or Tree) is defined by allowing non-terminal daughters to dominate a recursive instance of the schema. It is interesting to note that, for phrase structure at least, the relevant principles of grammar can be stated purely as conditions on branches, rather than trees. More generally, we will assume that the schema of a particular representation provides a formal characterisation of locality. 
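The claim that phrase-structure principles can be stated purely as conditions on branches can be made concrete with a small sketch. The Python below is our own illustration (the tuple encoding of trees and the toy binary-branching condition are assumptions, not the paper's Prolog): a tree checker that only ever inspects one schematic unit, a single mother with its daughters, at a time.

```python
# Branch-local well-formedness: a tree is a recursive instance of the branch
# schema, and the grammar's conditions are checked branch by branch, never
# over whole trees. Encoding and condition are illustrative assumptions.

def well_formed(tree, branch_ok):
    """tree = (mother, daughters); daughters is a list of sub-trees for a
    binary (non-terminal) branch, or a bare string (a T-Node) for a unary
    (terminal) branch."""
    mother, daughters = tree
    if isinstance(daughters, str):                    # unary branch
        return branch_ok(mother, [daughters])
    return (branch_ok(mother, [d[0] for d in daughters]) and
            all(well_formed(d, branch_ok) for d in daughters))

# Toy branch condition: at most binary branching, as the text observes.
branch_ok = lambda mother, ds: len(ds) <= 2

tree = ("VP", [("NP", "John"), ("V'", [("V", "put"), ("NP", "it")])])
```

Any grammatical principle expressible as a predicate over a single mother/daughters unit can be slotted in as `branch_ok` without changing the traversal, which is the locality property the schema is meant to capture.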
Just as phrase structure is defined in terms of branches, we can define Chains as a sequence of links.</Paragraph> <Paragraph position="9"> More specifically, each position contained by the chain is a node, which represents its category and level (a phrase or a head), the status of that position (either A or Ā), its ID (or location), and relevant features (such as L-marking, Case, and θ). If we adhere to the representational paradigm used above, we can define Chains in the following manner: If we let 'co' denote the linking of two C-Nodes, then we can define a Chain to be an ordered list of C-Nodes, such that successive C-Nodes satisfy the link relation. In the above definition we have used the '|' operator and list notation in the standard Prolog sense. The 'head' function returns the first C-Node in a (sub)Chain (possibly \[ \]), for purposes of satisfying the link relation. Furthermore, <C-Node co \[ \]> is a well-formed link denoting the tail, Deep-Structure position, of a Chain. Indeed, if this is the only link in the Chain we refer to it as a 'unit' Chain, representing an unmoved element.</Paragraph> <Paragraph position="10"> We noted above that each representation's schema provides a natural locality constraint. That is, we should be able to state relevant principles and constraints locally, in terms of the schematic unit. This clearly holds for Subjacency, a well-formedness condition which holds between two nodes of a link: (4) <C-Nodei co C-Nodej> → subjacent(C-Nodei, C-Nodej) Other Chain conditions include the Case filter and θ-Criterion. The former stipulates that each NP Chain receive Case at the 'highest' A-position, while the latter entails that each argument Chain receive exactly one θ-role, assigned to the uniquely identifiable <C-Nodeθ co \[ \]> link in a Chain. 
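The link-based locality of condition (4) can be sketched in the same style as the branch checker above. This Python fragment is purely illustrative (the `CNode` fields and the integer "barriers" count are our stand-ins for the paper's positions and bounding nodes): a Chain is well-formed just in case every adjacent pair of C-Nodes satisfies the link relation.

```python
# Illustrative re-coding of the Chain representation: an ordered list of
# C-Nodes in which each adjacent pair must satisfy the link relation --
# here, a stubbed Subjacency test as in condition (4). All names assumed.
from dataclasses import dataclass

@dataclass
class CNode:
    ident: str     # position ID / location in the tree
    a_bar: bool    # True for an A-bar position
    barriers: int  # toy stand-in for bounding nodes crossed to this position

def subjacent(hi: CNode, lo: CNode) -> bool:
    # A link may cross at most one bounding node (schematic Subjacency).
    return abs(hi.barriers - lo.barriers) <= 1

def well_formed_chain(chain):
    # Only adjacent C-Nodes are ever compared: the check is link-local.
    return all(subjacent(chain[i], chain[i + 1]) for i in range(len(chain) - 1))

what_chain = [CNode("spec-CP", True, 0), CNode("obj-of-put", False, 1)]
```

A unit Chain (length one) is trivially well-formed, matching the definition of an unmoved element.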
It is therefore possible to enforce both of these conditions on locally identifiable links of a Chain: (5) In an argument (NP) Chain, i) <C-NodeĀ co C-NodeA> → case-mark(C-NodeA), or ii) C-NodeA = head(Chain) → case-mark(C-NodeA). In an argument Chain, <C-Nodeθ co \[\]> → theta-mark(C-Nodeθ). In describing the representation of a Chain, we have drawn upon Prolog's list notation. To carry this further, we can consider the link operator 'co' to be equivalent to the ',' separator in a list, such that for all \[... C-Nodei,C-Nodej ... \], C-Nodei co C-Nodej holds. In this way, we ensure that each node is well-formed with respect to adjacent nodes (i.e. in accordance with principles such as those identified in (4) & (5)).</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 The Computational Model </SectionTitle> <Paragraph position="0"> In the same manner that linguistic theory makes the distinction between competence (the grammar) and performance (the parser), logic programming distinguishes the declarative specification of a program from its execution. A program specification consists of a set of axioms from which solution(s) can be proved as derived theorems. Within this paradigm, the nature of computation is determined by the inferencing strategy employed by the theorem prover. This aspect of logic programming has often been exploited for parsing: the so-called Parsing as Deduction (PAD) hypothesis.</Paragraph> <Paragraph position="1"> In particular, it has been shown that meta-interpreters or program transformations can be used to affect the manner in which a logic grammar is parsed \[10\] \[11\]. Recently, there has been an attempt to extend the PAD hypothesis beyond its application to simple logic grammars \[14\], \[13\] and \[8\]. In particular, Johnson has developed a prototype parser for a fragment of a GB grammar \[9\]. 
The system consists of a declarative specification of the GB model, which incorporates the various principles of grammar and multiple levels of representation. Johnson then illustrates how the fold/unfold transformation and goal freezing, when applied to various components of the grammar, can be used to render more or less efficient implementations.</Paragraph> <Paragraph position="2"> Unsurprisingly, this deductive approach to parsing inherits a number of problems with automated deduction in general. Real (implemented) theorem provers are, at least in the general case, incomplete. Indeed, we can imagine that a true, deductive implementation of GB would present a problem. Unlike traditional, homogeneous phrase structure grammars, GB makes use of abstract, modular principles, each of which may be relevant to only a particular type or level of representation. This modular, heterogeneous organisation therefore makes the task of deriving some single, specialised interpreter with adequate coverage and efficiency a very difficult one.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Deduction in a Modular System </SectionTitle> <Paragraph position="0"> In contrast with the single-processor model employed by Johnson, the system we propose consists of a number of processors over subsets of the grammar. Central to the model is a declarative specification of the principles of grammar, defined in terms of the representations listed in (2), as described in §3. If we take this specification of the grammar to be the &quot;competence component&quot;, then the &quot;performance component&quot; can be stated as a parse relation which maps the input string to a well-formed &quot;State&quot;, where State = { PS,TS,ChS,CiS }, the 4-tuple constituting all aspects of syntactic analysis. The highest level of the parser specifies how each module may communicate with the others. 
Specifically, the output of the PS processor acts as input to the other processors, which construct their representations based on the PS representation and their own &quot;representation specific&quot; knowledge. In a weaker model, it may be possible for processors to inspect the current State (i.e. the other representations) but crucially, no processor ever actually &quot;constructs&quot; another processor's representation. The communication between modules is made explicit by the Prolog specification shown below:</Paragraph> <Paragraph position="2"> ps_module(LexInput,PS).</Paragraph> <Paragraph position="3"> The parse relation defines the organisation of the processors as shown in Figure 1. The Prolog specification above appears to suffer from the traditional depth-first, left-to-right computation strategy used by Prolog. That is, parse seems to imply the sequential execution of each processor. As Stabler has illustrated, however, it is possible to alter the computation rule used \[12\], so as to permit incremental interpretation by each module: effectively coroutining the various modules. Specifically, Prolog's freeze directive allows processing of a predicate to be delayed temporarily until a particular argument variable is (partially) instantiated. In accord with the input dependencies shown in (7), each module is frozen (or 'waits') on its input: Using this feature we may effectively &quot;coroutine&quot; the four modules, by freezing the PS processor on Input, and freezing the remaining processors on PS.</Paragraph> <Paragraph position="4"> The result is that each representation is constructed incrementally, at each stage of processing. To illustrate this, consider once again the partial input string &quot;What did John put ...&quot;. 
The result of the call parse(\[what,did,john,put|_\], State) would yield the following instantiation of State (with the representations simplified for readability):</Paragraph> <Paragraph position="6"> The PS representation has been constructed as much as possible, including the trace of the moved head of Infl. The ChS represents a partial chain for what and the entire chain for did, which moved from its deep structure position to the head of CP, and TS contains a partial θ-grid for the relation 'put', in which the Agent role has been saturated.</Paragraph> <Paragraph position="7"> This is reminiscent of Johnson's approach \[9\], but differs crucially in a number of respects. Firstly, we posit several processors which logically exploit the grammar, and it is these processors which are coroutined, not the principles of grammar themselves. Each interpreter is responsible for recovering only one, homogeneous representation, with respect to one input representation. This makes reasoning about the computational behaviour of individual processors much easier. At each stage of processing, the individual processors increment their representations if and only if, for the current input, there is a &quot;theorem&quot; provable from the grammar which permits the new structure to be added. This meta-level &quot;parsing as deduction&quot; approach permits more finely tuned control of the parser as a whole, and allows us to specify distinct inferencing strategies for each interpreter, tailoring it to the particular representation.</Paragraph> <Paragraph position="8"> We have illustrated in §3 that the various representations and grammatical principles may be defined in terms of their respective schematic units. Given this, the task of recovering representations (roughly, parsing) is simply a matter of proving well-formed representations, as recursive instances of 'schematic axioms', i.e. 
those instantiations of a schema which are considered well-formed by the grammar. The form of the PS-Module can be depicted as in Figure 3.</Paragraph> <Paragraph position="9"> The PS interpreter incorporates lexical input into the phrase structure tree based on possible structures allowed by the grammar. Possible structures are determined by the ps_view relation, which returns those possible instantiations of the PS schema (as described in §3) which are well-formed with respect to the relevant principles of grammar. In general, ps_view will return any possible branch structure licensed by the grammar, but is usually constrained by partial instantiation of the query. In cases where multiple analyses are possible, the ps_view predicate may use some selection rule to choose between them 3. The following is a specification of the PS interpreter:</Paragraph> <Paragraph position="11"> ps_lex_eval(X-X0, Node/Daughters).</Paragraph> <Paragraph position="12"> As we have discussed above, the ps_module is frozen on lexical input, represented here as a difference-list. 3 This is one way in which we might implement attachment principles to account for human preferences; see Crocker \[4\] for discussion.</Paragraph> <Paragraph position="13"> The top-level of the PS interpreter is broken down into three possible cases. The first handles non-lexical nodes, i.e. those of category C or I, since phrase structure for these categories is not lexically determined, and can be derived 'top-down', strictly from the grammar. We can, for example, automatically hypothesize a Subject specifier and VP complement for Infl. The second clause deals with the postulation of empty categories, while the third can be considered the 'base' case which simply attaches lexical material. 
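The three-clause case analysis of the PS interpreter can be mimicked schematically. The Python below is only a re-rendering of the control structure, not the authors' Prolog: the clause names, the category set, the stubbed empty-category case, and the in-order (non-backtracking) dispatch are all our assumptions.

```python
# Schematic rendering of the PS interpreter's three cases: (1) non-lexical
# C/I structure posited top-down from the grammar, (2) postulation of an
# empty category, (3) the base case attaching the next lexical item.
# A real implementation would backtrack among the clauses.

NON_LEXICAL = {"C", "I"}   # categories whose structure is grammar-driven

def ps_eval(words, node):
    """Try the three clauses in order; each returns (structure, remaining
    words) on success or None on failure."""
    for clause in (nonlex_eval, ec_eval, lex_eval):
        result = clause(words, node)
        if result is not None:
            return result
    return None

def nonlex_eval(words, node):
    # Clause 1: C/I structure is derived top-down, consuming no input.
    return (node + "-projected", words) if node in NON_LEXICAL else None

def ec_eval(words, node):
    # Clause 2: postulate a trace; here a stub, since in the model the
    # concurrent Chain processor must license it.
    return None

def lex_eval(words, node):
    # Clause 3: the 'base' case, attaching the next lexical item.
    return ((node, words[0]), words[1:]) if words else None

structure, rest = ps_eval(["put", "it"], "V")
```

The dispatch makes the division of labour explicit: only the base case consumes input, which is why the module can be frozen on the lexical input stream.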
Roughly, ps_ec_eval attempts to derive a leaf which is a trace.</Paragraph> <Paragraph position="14"> This action is then verified by the concurrent Chain processor, which determines whether or not the trace is licensed (see the following section). This implements an approximation of the filler-driven strategy for identifying traces, a strategy widely accepted in the psycholinguistic literature 4.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3 The Chain-Module Specification </SectionTitle> <Paragraph position="0"> Just as the phrase structure processor is delayed on lexical input, the chain processor is frozen with respect to phrase structure. The organisation of the Chain Module is shown in Figure 4, and is virtually identical to that of the PS Module (in Figure 3). However, rather than recovering branches of phrase structure, it recovers links of chains, determining their well-formedness with respect to the relevant grammatical axioms.</Paragraph> <Paragraph position="1"> For this module, incremental processing is implemented by 'freezing' with respect to the input tree representation. The following code illustrates how the top-level of the Chain interpreter can traverse the PS tree, such that it is coroutined with the recovery of phrase structure: it will only execute if the daughter(s) of the current sub-tree is instantiated. Given this, chain_module will perform a top-down traversal of the PS tree, delaying when the daughters are uninstantiated, thus coroutined with the PS-module. 
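The effect of freezing the Chain module on the PS representation can be approximated in Python with generators: the consumer advances only when the producer has supplied the next branch, so both representations grow in lock-step. This sketch is purely illustrative (toy branches, a toy visibility test keyed on literal strings, and a simplified antecedent match are all our assumptions) and is not the authors' Prolog.

```python
# Generator-based analogue of freeze-driven coroutining: chain_module
# "waits" on each branch that ps_module produces, extending its chains
# incrementally rather than after the whole tree is built.

def ps_module(words):
    # Toy PS producer: one (mother, satellite) branch per input word.
    for w in words:
        yield ("X'", w)

def chain_module(branches):
    chains = []
    for mother, satellite in branches:        # runs only as branches appear
        if satellite.startswith("t-"):        # a trace: link to its antecedent
            antecedent = satellite[2:]
            for c in chains:
                if c[0] == antecedent:
                    c.append(satellite)
        elif satellite in {"what", "did"}:    # an antecedent: open a new chain
            chains.append([satellite])
        yield [list(c) for c in chains]       # snapshot after each branch

states = list(chain_module(ps_module(["what", "did", "john", "t-what"])))
final = states[-1]
```

Each yielded snapshot corresponds to one stage of processing, mirroring the incremental instantiation of ChS alongside PS in the worked "What did John put ..." example.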
The chain_input predicate then determines if any action is to be taken, for the current branch, by the chain interpreter:</Paragraph> <Paragraph position="3"> visible(X/\[Left,Satellite\],C-Node), chain_member(C-Node,CS).</Paragraph> <Paragraph position="4"> The chain_input predicate decides whether or not the satellite of the current branch is relevant, or 'visible', to the Chain interpreter, and if so returns an appropriate C-Node for that element. The two visible entities are antecedents, i.e. arguments (if we assume that all arguments form chains, possibly of unit length) or elements in Ā-positions (such as \[Spec,CP\] or a Chomsky-adjoined position), and traces. If a trace or an antecedent is identified, then it must be a member of a well-formed chain. The chain_member predicate postulates new chains for lexical antecedents, and attempts to append traces to existing chains. This operation must in turn satisfy the chain_view relation, to ensure the added link obeys the relevant grammatical constraints.</Paragraph> </Section> </Section> </Paper>