<?xml version="1.0" standalone="yes"?>
<Paper uid="P97-1044">
  <Title>Maximal Incrementality in Linear Categorial Deduction</Title>
  <Section position="5" start_page="344" end_page="345" type="metho">
    <SectionTitle>
3 First-order Compilation
</SectionTitle>
    <Paragraph position="0"> The first-order formulae are those with only atomic argument types (i.e. ~&amp;quot; ::= A I .~o-A).</Paragraph>
    <Paragraph position="1"> Hepple (1996) shows how deductions in implicational linear logic can be recast as deductions involving only first-order formulae. 3 The method involves compiling the original formulae to indexed first-order formulae, where a higher-order initial formula yields multiple compiled formulae, e.g. (omitting indices) Xo-(yo--Z) would yield Xo-Y and Z, i.e. with the subformula relevant to hypothetical reasoning (Z) effectively excised from the initial formulae, to be treated as a separate assumption, leaving a first-order residue. Indexing is used in ensuring general linear use of resources, but also notably to ensure proper use of excised subformulae, i.e. so that Z, in our example, must be used in deriving the argument of Xo-Y, and not elsewhere (otherwise invalid deductions would be derivable).</Paragraph>
    <Paragraph position="2"> The approach is best explained by example. In proving Xo-(Yo--Z), Yo-W, Wo--Z =~ X, compilation of the premise formulae yields the indexed formulae that form the assumptions of (3), where formulae (i) and (iv) both derive from Xo--(Yo-Z).</Paragraph>
    <Paragraph position="3"> (Note in (3) that the lambda terms of assumptions are written below their indexed types, simply to help the proof fit in the column.) Combination is allowed by the single inference rule (4).</Paragraph>
    <Paragraph position="5"> {i, j, k, l}: X: x()tz.y(wz)) (4) C/: Ao--(B:~) : Av.a C/ : B : b lr = C/t~C/ r: A: a\[b//vl  Each assumption in (3) is associated with a set containing a single index, which serves as the unique 3The point of this manoeuvre (i.e. compiling to first-order formulae) is to create a deduction method which, like chart parsing for phrase-structure grammar, avoids the need to recompute intermediate results when searching exhaustively for all possible analyses, i.e. where any combination of types contributes to more than one over-all analysis, it need only be computed once. The incremental system to be developed in this paper is similarly compatible with a 'chart-like' processing approach, although this issue will not be further addressed within this paper. For earlier work on chart-parsing type-logical formalisms, specifically the associative Lambek calculus, see KSnig (1990), Hepple (1992), K5nig (1994).</Paragraph>
    <Paragraph position="6">  identifier for that assumption. The index sets of a derived formula identify precisely those assumptions from which it is derived. The rule (4) ensures appropriate indexation, i.e. via the condition rr = C/~C/, where t~ stands for disjoint union (ensuring linear usage). The common origin of assumptions (i) and (iv) (i.e. from Xo--(Yo-Z)) is recorded by the fact that (i)'s argument is marked with (iv)'s index (j).</Paragraph>
    <Paragraph position="7"> The condition a C ~b of (4) ensures that (iv) must contribute to the derivation of (i)'s argument (which is needed to ensure correct inferencing). Finally, observe that the semantics of (4) is handled not by simple application, but rather by direct substitution for the variable of a lambda expression, employing a special variant of substitution, notated _\[_//_\] (e.g. t\[s//v\] to indicate substitution of s for v in t), which specifically does not act to avoid accidental binding.</Paragraph>
    <Paragraph position="8"> In the final inference of (3), this method allows the variable z to fall within the scope of an abstraction over z, and so become bound. Recall that introduction inferences of the original formulation are associated with abstraction steps. In this approach, these inferences are no longer required, their effects having been compiled into the semantics. See (Hepple, 1996) for more details, including a precise statement of the compilation procedure.</Paragraph>
  </Section>
  <Section position="6" start_page="345" end_page="346" type="metho">
    <SectionTitle>
4 Flexible Deduction
</SectionTitle>
    <Paragraph position="0"> The approach just outlined is unsuited to incremental processing. Its single inference rule allows only a rigid style of combining formulae, where order of combination is completely determined by the argument order of functors. The formulae of (3), for example, must combine precisely as shown. It is not possible, say, to combine assumptions (i) and (if) together first as part of a derivation. To overcome this limitation, we might generalise the combination rule to allow composition of functions, i.e. combinations akin to e.g. Xo-Y, Yo--W ==&gt; Xo-W. However, the treatment of indexation in the above system is one that does not readily adapt to flexible combination.</Paragraph>
    <Paragraph position="1"> We will transform these indexed formulae to another form which better suits our needs, using the compilation procedure (5). This procedure returns a modified formula plus a set of equations that specify constraints on its indexation. For example, the assumptions (i-iv) of (3) yield the results (6) (ignoring semantic terms, which remain unchanged). Each atomic formula is partnered with an index set (or typically a variable over such), which corresponds to the full set of indices to be associated with the complete object of that category, e.g. in (i) we have (X+C/), plus the equation C/ = {i}Wrr which tells us that X's index set C/ includes the argument formula Y's index set rr plus its own index i. The further constraint equation C/ = {i}t~rr indicates that the argument's index set should include j (c.f. the conditions for using the original indexed formula).</Paragraph>
    <Paragraph position="3"> (6) i. old formula: {i}: Xo--(Y:{j}) new formula: (X+C)o-(Y+Tr) constraints: {C/ = {i}~rr, {j} C 7r} if. old formula: {k}:Yo-(W:O) new formula: (V+a)o-(W%3) constraints: {a = {k}~/~} iii. old formula: {l} :Wo-(Z:O) new formula: (W+7)o-(Z+~) constraints: {7 = {l}t~} iv. old formula: {j} :Z new formula: (Z+{j}) constraints: 0 (7) Ac--B : Av.a B : b  A: a\[bllv\] The previous inference rule (4) modifies to (7), which is simpler since indexation constraints are now handled by the separate constraint equations. We leave implicit the fact that use of the rule involves unification of the index variables associated with the two occurrences of &amp;quot;B&amp;quot; (in the standard manner). The constraint equations for the result of the combination are simply the sum of those for the formulae combined (as affected by the unification step). For example, combination of the formulae from (iii) and (iv) of (6) requires unification of the index set expressions 6 and {j}, yielding the result formula (W+7) plus the single constraint equation V = {l}tg{j}, which is obviously satisfiable (with 3' = {j,l}). A combination is not allowed if it results in an unsatisfiable set of constraints. The modified approach so neatly moves indexation requirements off into the constraint equation domain that we shall henceforth drop all consideration of them, assuming them to be appropriately managed in the background.</Paragraph>
    <Paragraph position="4">  We can now state a generalised composition rule as in (8). The inference is marked as \[m, n\], where m is the argument position of the 'functor' (always the lefthand premise) that is involved in the combination, and n indicates the number of arguments inherited from the 'argument' (righthand premise). The notation &amp;quot;o--Zn...o--Zl&amp;quot; indicates a sequence of n arguments, where n may be zero, e.g. the case \[1,0\] corresponds precisely to the rule (7). Rule (8) allows the non-applicative derivation (9) over the formulae from (6) (c.f. the earlier derivation (3)).</Paragraph>
  </Section>
  <Section position="7" start_page="346" end_page="346" type="metho">
    <SectionTitle>
5 Incremental Derivation
</SectionTitle>
    <Paragraph position="0"> As noted earlier, the relevance of flexible CGs to incremental processing relates to their ability to assign highly left-branching analyses to sentences, so that many initial substrings are treated as interpretable constituents. Although we have adapted the (Hepple, 1996) approach to allow flexibility in deduction, the applicability of the notion 'leftbranching' is not clear since it describes the form of structures built in proof systems where formulae are placed in a linear order, with combination dependent on adjacency. Linear deduction methods, on the other hand, work with unordered collections of formulae. Of course, the system of labelling that is in use -- where the constraints of the 'real' grammatical logic reside -- may well import word order information that limits combination possibilities, but in designing a general parsing method for linear categorial formalisms, these constraints must remain with the labelling system.</Paragraph>
    <Paragraph position="1"> This is not to say that there is no order information available to be considered in distinguishing incremental and non-incremental analyses. In an incremental processing context, the words of a sentence are delivered to the parser one-by-one, in 'leftto-right' order. Given lexical look-up, there will then be an 'order of delivery' of lexical formulae to the parser. Consequently, we can characterise an incremental analysis as being one that at any stage includes the maximal amount of 'contentful' combination of the formulae (and hence also lexical meanings) so far delivered, within the limits of possible combination that the proof system allows. Note that we have not in these comments reintroduced an ordered proof system of the familiar kind by the back door. In particular, we do not require formulae to combine under any notion of 'adjacency', but simply 'as soon as possible'.</Paragraph>
    <Paragraph position="2"> For example, if the order of arrival of the formulae in (9) were (i,iv)-&lt;(ii)-&lt;(iii) (recall that (i,iv) originate from the same initial formula, and so must arrive together), then the proof (9) would be an incremental analysis. However, if the order instead was (ii)-&lt;(iii)-&lt;(i,iv), then (9) would not be incremental, since at the stage when only (ii) and (iii) had arrived, they could combine (as part of an equivalent alternative analysis), but are not so combined in (9).</Paragraph>
  </Section>
  <Section position="8" start_page="346" end_page="349" type="metho">
    <SectionTitle>
6 Derivational Equivalence,
</SectionTitle>
    <Paragraph position="0"> It seems we have achieved our aim of a linear deduction method that allows incremental analysis quite easily, i.e. simply by generalising the combination rule as in (8), having modified indexed formulae using (5). However, without further work, this 'achievement' is of little value, because the resulting system will be very computationally expensive due to the problem of 'derivational equivalence' or 'spurious ambiguity', i.e. the existence of multiple distinct proofs which assign the same reading. For example, in addition to the proof (9), we have also the equivalent proof (10).</Paragraph>
    <Paragraph position="2"> The solution to this problem involves specifying a normal form for deductions, and allowing that only normal form proofs are constructed) Our route to specifying a normal form for proofs exploits a correspondence between proofs and dependency structures.</Paragraph>
    <Paragraph position="3"> Dependency grammar (DG) takes as fundamental ~This approach of 'normal form parsing' has been applied to the associative Lambek calculus in (K6nig, 1989), (Hepple, 1990), (Hendriks, 1992), and to Combinatory Categorial Grammar in (Hepple &amp; Morrill, 1989), (Eisner, 1996).</Paragraph>
    <Paragraph position="4">  the notions of head and dependent. An analogy is often drawn between CG and DG based on equating categorial functors with heads, whereby the arguments sought by a functor are seen as its dependents. The two approaches have some obvious differences.</Paragraph>
    <Paragraph position="5"> Firstly, the argument requirements of a categorial functor are ordered. Secondly, arguments in CG are phrasal, whereas in DG dependencies are between words. However, to identify the dependency relations entailed by a proof, we may simply ignore argument ordering, and we can trace through the proof to identify those initial assumptions ('words') that are related as head and dependent by each combination of the proof. This simple idea unfortunately runs into complications, due to the presence of higher order functions. For example, in the proof (2), since the higher order functor's argument category (i.e.</Paragraph>
    <Paragraph position="6"> Yo--Z) has subformuiae corresponding to components of both of the other two assumptions, Yo-W and Wo--Z, it is not clear whether we should view the higher order functor as having a dependency relation only to the 'functionally dominant' assumption Yo-W, i.e. with dependencies as in (lla), or to both the assumptions Yo-W and Wo-Z, i.e. with dependencies as perhaps in either (llb) or (llc).</Paragraph>
    <Paragraph position="7"> The compilation approach, however, lacks this problem, since we have only first order formulae, amongst which the dependencies are clear, e.g. as in (12).</Paragraph>
    <Paragraph position="9"> Some preliminaries. We assume that proof assumptions explicitly record 'order of delivery' information, marked by a natural number, and so take the form: n x N Further, we require the ordering to go beyond simple 'order of delivery' in relatively ordering first order assumptions that derive from the same original higher-order formula. (This move simply introduces some extra arbitrary bias as a basis for distinguishing proofs.) It is convenient to have a 'linear' notation for writing proofs. We will write (n/X \[a\]) for an assumption (such as that just shown), and (X Y / Z \[m, n\]) for a combination of subproofs X and Y to give result formula Z by inference \[m, n\].</Paragraph>
    <Paragraph position="11"> where 5 = dep((X Y / Z \[m, n\])) The procedure dep, defined in (13), identifies the dependency relation established by any combination, i.e. for any subproof P = (X Y / Z \[m,n\]), dep(P) returns a triple (i,j,k), where i,j identify the head and dependent assumptions for the combination, and k indicates the argument position of the head assumption that is involved (which has now been inherited to be argument m of the functor of the combination). The procedure dep*, defined in (14), returns the set of dependencies established within a subproof. Note that dep employs the procedures gov (which traces the relevant argument back to its source assumption -- the head) and fun (which finds the functionally dominant assumption within the argument subproof-- the dependent).</Paragraph>
    <Paragraph position="13"> From earlier discussion, it should be clear that an 'incremental analysis' is one in which any dependency to be established is established as soon as possible in terms of the order of delivery of assumptions.</Paragraph>
    <Paragraph position="14"> The relation &lt;&lt; of (17) orders dependencies in terms of which can be established earlier on, i.e. 6 &lt;&lt; 7 if the later-arriving assumption of 6 arrives before the later-arriving assumption of 7- Note however that 6,7 may have the same later arriving assumption (i.e. if this assumption is involved in more than one dependency). In this case, &lt;&lt; arbitrarily gives precedence to the dependency whose two assumptions occur closer together in delivery order.</Paragraph>
    <Paragraph position="16"> min(i, \]1 &gt; rain(x, y))) We can use &lt;&lt; to define an incremental normal form for proofs, i.e. an incremental proof is one that is well-ordered with respect to &lt;&lt; in the sense that every combination (X Y / Z \[m, n\]) within it establishes a dependency 5 which follows under &lt;&lt; every dependency 5' established within the subproofs X and Y it combines, i.e. 5' &lt;&lt; 5 for each 5' 6 dep*(X) tJ dep*(Y). This normal form is useful only if we can show that every proof has an equivalent normal form. For present purposes, we can take two proofs to be equivalent if\] they establish identical sets of dependency relations. 5 (18) trace(/,j, (i/X \[a\])) = j trace(/,j, (X Y / Z \[m,n\])) = (m + k- 1) where i 6 assure(Y)</Paragraph>
    <Paragraph position="18"> We can specify a method such that given a set of dependency relations :D we can construct a corresponding proof. The process works with a set of subproofs 7 ), which are initially just the set of assumptions (i.e. each of the form (n/F \[a\])), and proceeds by combining pairs of subproofs together, until finally just a single proof remains. Each step involves selecting a dependency 5 (5 = (i, j, k)) from /) (setting D := D - {5} for subsequent purposes), removing the subproofs P, Q from 7) which contain the assumptions i,j (respectively), combining P, Q (with P as functor) to give a new subproof R which 5This criterion turns out to be equivalent to one stated in terms of the lambda terms that proofs generate, i.e. two proofs will yield identical sets of dependency relations iff they yield proof terms that are fly-equivalent. This observation should not be surprising, since the set of 'dependency relations' returned for a proof is in essence just a rather unstructured summary of its functional relations.</Paragraph>
    <Paragraph position="19"> is added to 7) (i.e. P := (7) - {P, Q}) u {R}). It is important to get the right value for m in the combination fro, n\] used to combine P, Q, so that the correct argument of the assumption i (as now inherited to the end-type of P) is involved. This value is given by m = trace(i, k, P) (with trace as defined in (18)).</Paragraph>
    <Paragraph position="20"> The process of proof construction is nondeterministic, in the order of selection of dependencies for incorporation, and so a single set of dependences can yield multiple distinct, but equivalent, proofs (as we would expect).</Paragraph>
    <Paragraph position="21"> To build normal form proofs, we only need to limit the order of selection of dependencies using &lt;&lt;, i.e.</Paragraph>
    <Paragraph position="22"> requiring that the minimal element under &lt;&lt; is selected at each stage. Note that this ordering restriction makes the selection process deterministic, from which it follows that normal forms are unique. Putting the above methods together, we have a complete normal form method for proofs of the first-order linear deduction system, i.e. for any proof P, we can extract its dependency relations and use these to construct a unique, maximally incremental, alternative proof -- the normal form of P.</Paragraph>
    <Section position="1" start_page="348" end_page="349" type="sub_section">
      <SectionTitle>
7 Proof Reduction and
Normalisation
</SectionTitle>
      <Paragraph position="0"> The above normalisation approach is somewhat nonstandard. We shall next briefly sketch how normalisation could instead be handled via the standard method of proof reduction. This method involves defining a contraction relation (t&gt;l) between proofs, which is typically stated as a number of contraction rules of the form X t&gt;l Y, where X is termed a redex and Y its contractum. Each rule allows that a proof containing a redex be transformed into one where that occurrence is replaced by its contractum. A proof is in normal form if\] it contains no redexes.</Paragraph>
      <Paragraph position="1"> The contraction relation generates a reduction relation (t&gt;) such that X reduces to Y (X \[&gt; Y) if\] Y is obtained from X by a finite series (possibly zero) of contractions. A term Y is a normal form of X iff Y= is a normal form and X \[&gt; Y.</Paragraph>
      <Paragraph position="2"> We again require the ordering relation &lt;&lt; defined in (17). A redex is any subproof whose final step is a combination of two well-ordered subproofs, which establishes a dependency that undermines well-orderedness. A contraction step modifies the proof to swap this final combination with the final one of an immediate subproof, so that the dependencies the two combinations establish are now appropriately ordered with respect to each other. The possibilities for reordering combination steps divide into four cases, which are shown in Figure 1. This re- null duction system can be shown to exhibit the property (called strong normalisation) that every reduction is finite, from which it follows that every proof has a normal form. 6</Paragraph>
    </Section>
  </Section>
  <Section position="9" start_page="349" end_page="349" type="metho">
    <SectionTitle>
8 Normal form parsing
</SectionTitle>
    <Paragraph position="0"> The technique of normal form parsing involves ensuring that only normal form proofs are constructed by the parser, avoiding the unnecessary work of building all the non-normal form proofs. At any stage, all subproofs so far constructed are in normal form, and the result of any combination is admitted only provided it is in normal form, otherwise it is discarded. The result of a combination is recognised as non-normal form if it establishes a dependency that is out of order with respect to that of the final combination of at least one of the two subproofs combined (which is an adequate criterion since the subproofs are well-ordered). The procedures defined above can be used to identify these dependencies.</Paragraph>
  </Section>
  <Section position="10" start_page="349" end_page="350" type="metho">
    <SectionTitle>
9 The Degree of Incrementality
</SectionTitle>
    <Paragraph position="0"> Let us next consider the degree of incrementality that the above system allows, and the sense in which 6To prove strong normalisation, it is sufficient to give a metric which assigns to each proof a finite non-negative integer score, and under which every contraction reduces a proof's score by a non-zero amount. The following metric tt can be shown to suffice: (a) for P = (nIX \[a\]), #(P) = 0, (b) for P=(XY / Z \[m,n\]), whose final step establishes a dependency a, #(P) = it(X) + ~u(Y) + D, where D is the number of dependencies 5' such that &lt;&lt; a', which are established in X and Y, i.e. D = \[A\]</Paragraph>
    <Paragraph position="2"> it might be considered maximal. Clearly, the system does not allow full 'word-by-word' incrementality, i.e. where the words that have been delivered at any stage in incremental processing are combined to give a single result formula, with combinations to incorporate each new lexical formula as it arrives/ For example, in incremental processing of Today John sang, the first two words might yield (after compilation) the first-order formulae so-s and np, which will not combine under the rule (8). s Instead, the above system will allow precisely those combinations that establish functional relations that are marked out in lexical type structure (i.e. subcategorisation), which, given the parMlelism of syntax and semantics, corresponds to allowing those combinations that establish semantically relevant functional relations amongst lexical meanings. Thus, we believe the above system to exhibit maximal incrementality in relation to allowing 'semantically contentful' combinations. In dependency terms, the system allows any set of initial formulae to combine to a single result iff they form a connected graph under the dependency relations that obtain amongst them.</Paragraph>
    <Paragraph position="3"> Note that the extent of incrementality allowed by using 'generalised composition' in the compiled first-order system should not be equated with that which 7For an example of a system allowing word-by-word incrementality, see (Milward, 1995).</Paragraph>
    <Paragraph position="4"> SNote that this is not to say that the system is unable to combine these two types, e.g. a combination so--s, np =~ so-(so-np) is derivable, with appropriate compilation. The point rather is that such a combination will typically not happen as a component in a proof of some other overall deduction.</Paragraph>
    <Paragraph position="5">  would be allowed by such a rule in the original (noncompiled) system. We can illustrate this point using the following type combination, which is not an instance of even 'generalised' composition.</Paragraph>
    <Paragraph position="6"> Xo-(Yo-Z), Yo--W =~ Xo-(Wo-Z) Compilation of the higher-order assumption would yield Xo--Y plus Z, of which the first formula can compose with the second assumption Yo-W to give Xo-W, thereby achieving some semantically contentful combination of their associated meanings, which would not be allowed by composition over the original formulae. 9</Paragraph>
  </Section>
</Paper>