<?xml version="1.0" standalone="yes"?>
<Paper uid="C88-2143">
  <Title>Predictive Combinators: A Method for Efficient Processing of Combinatory Categorial Grammars</Title>
  <Section position="1" start_page="0" end_page="697" type="abstr">
    <SectionTitle>
MASSIVE DISAMBIGUATION OF LARGE TEXT CORPORA
WITH FLEXIBLE CATEGORIAL GRAMMAR
</SectionTitle>
    <Paragraph position="0"> [...] is to build a [...] for lexicological research. This database is a potentially [...] set of [...]. [...] is to supply a representative and, if possible, complete overview of contemporary (since 1970) standard Dutch. In order to access this material, an efficient database [...] and application software are [...]. At this moment (February 1988) the INL [...] is the [...] computational corpus of the Dutch language; it contains over 45 million [...].</Paragraph>
    <Paragraph position="1"> The INL lexical database is not an end in itself: it is meant to be a basis for specific projects, [...] of the database is [...] for a new generation of dictionaries. The database is [...], and it is not realistic [...] it will be possible to attach such a rich [...] of information. Therefore, a [...] is to make all the words in the database available for [...], on the one hand by means of efficient [...] application software, on the other hand by enriching the material. Automatic morphological analysis has [...] been carried out [...] lemmas will [...] be incorporated into the database. One level higher, we are interested in the [...] of an on-line parser, [...] for the process of lemmatization an effective disambiguation [...] is necessary as well. [...] As was the case for the morphological analysis, the syntactic [...] is an application of a categorial calculus [...]. The Lambek categorial parser we use for disambiguation by syntactic analysis is the topic of this paper¹. 1. A note on ambiguity in Categorial Grammar Each linguistic theory or framework [...] is confronted with the problem of ambiguity. Whatever way one deals with it, and whatever neat solution [...], the fact remains that (1) ambiguity will not disappear, but (2) the explosions it gives rise to will cause (often irreparable) damage to (otherwise) neatly conceived syntactic parsers or analyzers. Categorial grammars, [...] by the central role of the lexicon, may seem by nature to be the first victims of this [...]. Some categorialists try to circumvent the problems by [...] inherently [...], others [...] rigidly defined flexible [...]. [...] take a closer look at [...] of the restrictions [...], i.e. at [...] of the invariants that come along naturally, but remain unnoticed at first sight³. Interesting invariants may act as greedy scissors, pruning away many of the useless branches of the search tree. 
Categorial grammars encode all syntactic information in the lexicon. The effect of this strategy on the occurrence of ambiguities can be gathered if one takes an ordinary phrase structure grammar and turns it into a categorial one. What happens is that for every category in the PS grammar one gets a set of categories in the categorial grammar. On average, the number of new categories equals the number of occurrences of the old category in the PS rules. A lexical element that is not at all ambiguous in PSG as far as syntactic category assignment is concerned will almost certainly become ambiguous in CG. Still, we claim that effective, i.e. fast, disambiguation is possible with CG. The rationale behind this claim is that effective disambiguation depends not so much on the degree of ambiguity as, first and foremost, on the nature of the disambiguation procedure [...]. Whereas ambiguity is damaging to [...] procedures because there are no intrinsic properties of the system that can deal with it, almost the reverse is true of categorial grammars, provided full benefit is made of their defining characteristics. In order to appreciate these statements, the best thing to do is look at a specific implementation of this idea.</Paragraph>
    <Paragraph position="2"> 2. The Lambek calculus In this section we would like to present a categorial reduction system which is analogous to the implicational fragment of propositional logic. We will present it as a calculus and limit ourselves to the formal description, thus ignoring semantic interpretation (which is not immediately relevant to the task at hand).</Paragraph>
    <Paragraph position="3"> Some definitions Let BASCAT be a finite set of atomic categories and CONN a finite set of category-forming connectives. Then CAT (the set of all categories) is the inductive closure of BASCAT under CONN, i.e. the smallest set such that (i) BASCAT is a subset of CAT, and (ii) if X, Y are members of CAT and | is a member of CONN, then (X|Y) is a member of CAT.</Paragraph>
    <Paragraph position="4"> So one could take BASCAT to be {S, N, A, T, P} and CONN {/, \, *} (these are called right division, left division and product, respectively). Some of the members of CAT are: {N, (N\S), ((N/N)*T), (S/(P\(N/S))), ...}. A complex category (X|Y) consists of three immediate subcomponents: X and Y, which are themselves categories, and the connective. When the connective is '/' or '\', the complex category is a functor. Functor categories are associated with incomplete expressions: they will form an expression of category Y (result) with an expression of category X (argument)⁴. In the case of right division, the argument has to be found to the right of the functor category, whereas in the case of left division, the argument has to be found to the left⁵. The product connective '*' is to be interpreted as a concatenation operator, i.e. a product category (X*Y) is to be associated with an expression which is the concatenation of an expression of category X and an expression of category Y, in that order.</Paragraph>
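The inductive definition of CAT maps directly onto a recursive data type. The following is a minimal Python sketch under our own assumptions: a basic category is a string, and a complex category is a triple (X, connective, Y) read in the argument-connective-result notation used here. The helper names `parse`, `show` and `is_functor` are hypothetical, not taken from the paper.

```python
from typing import Tuple, Union

# A basic category is a string ("S", "N", ...); a complex category is a
# triple (X, connective, Y). With "/" or "\" X is the argument and Y the
# result (argument-connective-result notation); with "*" the triple
# stands for an X followed by a Y.
Category = Union[str, Tuple["Category", str, "Category"]]

CONNECTIVES = ("/", "\\", "*")


def parse(text: str) -> Category:
    """Parse strings such as "(S/(P\\(N/S)))" or "a/b" into categories.

    Assumption: at most one connective per bracket level, so an input
    like "a/b/c" must be bracketed explicitly.
    """
    tokens = [c for c in text if not c.isspace()]
    pos = 0

    def peek() -> str:
        # Current token, or "" once the input is exhausted.
        return tokens[pos] if len(tokens) > pos else ""

    def atom() -> Category:
        nonlocal pos
        if peek() == "(":
            pos += 1
            inner = expr()
            if peek() != ")":
                raise ValueError("missing ')'")
            pos += 1
            return inner
        start = pos
        while peek().isalnum():  # basic category names are alphanumeric
            pos += 1
        if start == pos:
            raise ValueError(f"expected a category at token {pos}")
        return "".join(tokens[start:pos])

    def expr() -> Category:
        nonlocal pos
        left = atom()
        if peek() not in CONNECTIVES:
            return left
        conn = peek()
        pos += 1
        right = atom()
        if peek() in CONNECTIVES:
            raise ValueError("ambiguous: bracket nested connectives")
        return (left, conn, right)

    result = expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing input")
    return result


def show(cat: Category) -> str:
    """Print a category fully bracketed, e.g. ((N/N)*T)."""
    if isinstance(cat, str):
        return cat
    x, conn, y = cat
    return f"({show(x)}{conn}{show(y)})"


def is_functor(cat: Category) -> bool:
    """Complex categories built with "/" or "\\" are functors."""
    return not isinstance(cat, str) and cat[1] in ("/", "\\")
```

For example, `parse("(S/(P\\(N/S)))")` yields the nested triple for the last member listed above, and `is_functor` is false for the product category ((N/N)*T).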
    <Paragraph position="5"> Reduction rules A specific categorial grammar is characterized by the choice of basic categories and connectives on the one hand, and by the set of reduction rules on the other. The system of reduction rules says how categories can be combined to form larger constituents. The application rule, which combines a functor with domain X and range Y with a suitable argument of category X to give a Y, is only one of the possible reduction rules.</Paragraph>
    <Paragraph position="6"> Instead of taking a set of reduction laws as primitive axioms, we will investigate the categorial reduction system as a calculus, where the reduction laws can be considered theorems that follow from a set of axioms and a set of inference rules. Next we will see that parsing a syntagm is really the same thing as attempting a proof of a theorem.</Paragraph>
    <Paragraph position="7"> Sequents Before we define the axioms and inference rules of the calculus, we need to define the notion of sequent⁶.</Paragraph>
    <Paragraph position="8"> A sequent is a pair (G, D) of finite (possibly empty) sequences [A1, ..., Am], [B1, ..., Bn] of categories. For categorial L-sequents, we require G to be non-empty and n = 1. For the sequent (G, D) we write G =&gt; D. The sequence G is called the antecedent, D the succedent. For simplicity, square brackets and commas are often left out.</Paragraph>
    <Paragraph position="9"> (1) Axioms of L are sequents of the form X =&gt; X.</Paragraph>
    <Paragraph position="10"> (2) Inference rules of L: X, Y and Z are categories; P, T, Q, U, V are sequences of categories, where P, T and Q are non-empty.
[/R] T =&gt; Y/X if T,Y =&gt; X
[\R] T =&gt; Y\X if Y,T =&gt; X
[/L] U,Y/X,T,V =&gt; Z if T =&gt; Y and U,X,V =&gt; Z
[\L] U,T,Y\X,V =&gt; Z if T =&gt; Y and U,X,V =&gt; Z
[*L] U,X*Y,V =&gt; Z if U,X,Y,V =&gt; Z
[*R] P,Q =&gt; X*Y if P =&gt; X and Q =&gt; Y
Together, axioms and inference rules define the theorems of a categorial calculus. Suppose we have a sequent S; to find out whether it is a theorem or not, we have to apply the inference rules above, possibly several times, until nothing but axioms remains. As the reader may have noticed, each of these rules involves the removal of a connective in some category. Let us paraphrase the [/L] rule by way of example. It says: to find out whether a sequent with some functor category Y/X is a theorem, identify a sequence of categories that follows this category, and check (i) whether the identified sequence =&gt; Y is a theorem, and (ii) whether what preceded the category + X + what followed the identified sequence =&gt; the old succedent is a theorem.</Paragraph>
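The paraphrase of [/L], together with its mirror image [\L], can be made concrete as a function that lists every way of identifying the sequence T. This is a minimal Python sketch, assuming categories encoded as strings or (X, connective, Y) triples (argument first for '/' and '\') and a sequent encoded as a pair (antecedent tuple, succedent). The name `left_division_premises` is hypothetical. Candidates for T are produced shortest first.

```python
from typing import List, Tuple, Union

# Basic category: a string. Complex category: (X, connective, Y); for "/"
# and "\" X is the argument and Y the result (argument-connective-result).
Category = Union[str, Tuple["Category", str, "Category"]]
Sequent = Tuple[Tuple[Category, ...], Category]  # (antecedent, succedent)


def left_division_premises(seq: Sequent, i: int) -> List[List[Sequent]]:
    """All premise pairs for applying [/L] or [\\L] to antecedent[i].

    Each alternative corresponds to one choice of the non-empty
    sequence T, tried from one category upwards.
    """
    ant, succ = seq
    cat = ant[i]
    if isinstance(cat, str) or cat[1] not in ("/", "\\"):
        raise ValueError("antecedent[i] must be a functor category")
    arg, conn, res = cat
    u, v = ant[:i], ant[i + 1:]
    alternatives = []
    if conn == "/":
        # [/L]  U, Y/X, T, V => Z  if  T => Y  and  U, X, V => Z
        for k in range(1, len(v) + 1):
            alternatives.append([(v[:k], arg), (u + (res,) + v[k:], succ)])
    else:
        # [\L]  U, T, Y\X, V => Z  if  T => Y  and  U, X, V => Z
        for k in range(1, len(u) + 1):
            t, rest = u[len(u) - k:], u[:len(u) - k]
            alternatives.append([(t, arg), (rest + (res,) + v, succ)])
    return alternatives
```

Applied to position 1 of the first sequent in the example proof of this section (the category d/(e/(f/a))), the function yields three alternatives; the shortest gives exactly the premises d =&gt; d and a/b, e/(f/a), e, f =&gt; b.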
    <Paragraph position="11"> In the following example we present a proof with the relevant category printed in bold and the identified sequence underlined.</Paragraph>
    <Paragraph position="12"> a/b, d/(e/(f/a)), d, e, f =&gt; b [/L]
   d =&gt; d [AXIOM]
   a/b, e/(f/a), e, f =&gt; b [/L]
      e =&gt; e [AXIOM]
      a/b, f/a, f =&gt; b [/L]
         f =&gt; f [AXIOM]
         a/b, a =&gt; b [/L]
            a =&gt; a [AXIOM]
            b =&gt; b [AXIOM]
If we could find an efficient automatic decision procedure that tells us whether a given sequent is a theorem or not, then we would have an efficient parser as well. The idea is that the antecedent represents something like a sentence (the categories of the words that make it up) and the succedent the S (sentence) category. In the next section we will discuss an implementation of the decision procedure.</Paragraph>
    <Paragraph position="13"> 3. The theorem prover, alias parser An algorithm to prove a theorem could go as follows.</Paragraph>
    <Paragraph position="14"> Given: a sequent with n categories: n-1 in the antecedent, 1 in the succedent.</Paragraph>
    <Paragraph position="15"> Start at the first category of the antecedent. If this is a functor, pick the relevant inference rule that will eliminate the connective. If the rule tells you to identify a part of the sequent on one of the sides of the category, then first take this part to be one category. See whether you can prove the resulting sequent(s) (the sequent(s) in the if-part of the inference rule). If the identification does not yield a proof (i.e. if, calling the procedure recursively, the point where only axioms remain is not reached), then take two categories and see if this does the trick. Continue adding categories until you have a proof or there are no categories left. In the latter case, nothing is lost yet, because one could also have taken the second or third functor to start the proof with. If in the end there are no functors left to start the elimination with, then the theorem cannot be proven, and one can even say that it is false.</Paragraph>
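The search procedure just described can be sketched as a recursive backward proof search over the rules of L. This is a minimal Python sketch, not the authors' implementation: it assumes the string/triple category encoding (argument first for '/' and '\'), and the names `prove` and `_prove` are hypothetical. Besides the left rules that the procedure walks through, it also tries [/R], [\R] and [*R] on the succedent, so that theorems such as type raising are found, and it memoises subproofs.

```python
from functools import lru_cache
from typing import Tuple, Union

# Basic category: a string. Complex category: (X, connective, Y). For "/"
# and "\" X is the argument and Y the result; for "*" X is followed by Y.
Category = Union[str, Tuple["Category", str, "Category"]]


def prove(antecedent: Tuple[Category, ...], succedent: Category) -> bool:
    """Decide whether antecedent => succedent is a theorem of L."""
    return _prove(tuple(antecedent), succedent)


@lru_cache(maxsize=None)  # identical subproofs recur often, so memoise them
def _prove(ant: Tuple[Category, ...], succ: Category) -> bool:
    if not ant:
        return False  # L-sequents have a non-empty antecedent
    if len(ant) == 1 and ant[0] == succ:
        return True  # axiom X => X
    # Left rules: try each complex category of the antecedent in turn.
    for i, cat in enumerate(ant):
        if isinstance(cat, str):
            continue
        x, conn, y = cat
        u, v = ant[:i], ant[i + 1:]
        if conn == "*":  # [*L]
            if _prove(u + (x, y) + v, succ):
                return True
        elif conn == "/":  # [/L]: T is the first k categories after the functor
            for k in range(1, len(v) + 1):
                if _prove(v[:k], x) and _prove(u + (y,) + v[k:], succ):
                    return True
        else:  # [\L]: T is the last k categories before the functor
            for k in range(1, len(u) + 1):
                cut = len(u) - k
                if _prove(u[cut:], x) and _prove(u[:cut] + (y,) + v, succ):
                    return True
    # Right rules on the succedent.
    if not isinstance(succ, str):
        x, conn, y = succ
        if conn == "/":  # [/R]
            return _prove(ant + (x,), y)
        if conn == "\\":  # [\R]
            return _prove((x,) + ant, y)
        for k in range(1, len(ant)):  # [*R]: split into non-empty P, Q
            if _prove(ant[:k], x) and _prove(ant[k:], y):
                return True
    return False
```

Because every rule removes one connective, the search always terminates. In the worst case, however, it still explores many splits of the antecedent, which is what motivates the pruning discussed next.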
    <Paragraph position="16"> Clearly, this procedure might take some time to decide on the validity of a sequent. One might hope that theorems are proven rapidly, but when the sequents are false, a lot of work has to be done. Fortunately, there is a simple way to prune away some branches of the search tree that are guaranteed to lead to failure. There is a necessary formal condition that holds of valid theorems and is easy to detect. If a sequent does not have this formal characteristic, it cannot be a theorem. Even if the input sequent does have the required characteristic, in the process of proving there will be many subproofs that need not be carried out because they would fail immediately. This formal characteristic, or invariant, is known as van Benthem's Count, or Count for short. It counts the number of positive (range) and negative (domain) occurrences of a basic category X in an arbitrary category, basic or complex. It may be defined as follows.</Paragraph>
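In the (argument connective result) notation used here, so that in (Y/Z) and (Y\Z) the argument is Y and the result is Z, the count is standardly given by the following clauses, for X a basic category and Y, Z arbitrary categories. This is a reconstruction from the usual formulation of van Benthem's count, not a quotation of the authors' definition.

```latex
\begin{gathered}
\mathrm{count}(X, A) = 1 \quad \text{if } A = X, \qquad
\mathrm{count}(X, A) = 0 \quad \text{if } A \in \mathrm{BASCAT},\ A \neq X, \\
\mathrm{count}(X, (Y/Z)) = \mathrm{count}(X, (Y\backslash Z)) = \mathrm{count}(X, Z) - \mathrm{count}(X, Y), \\
\mathrm{count}(X, (Y * Z)) = \mathrm{count}(X, Y) + \mathrm{count}(X, Z).
\end{gathered}
```

Result positions thus contribute positively and argument positions negatively, matching the "positive (range)" and "negative (domain)" occurrences mentioned above.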
    <Paragraph position="18"> Generalized to sequences of categories, the X-count of a sequence, X being a category, is the sum of the X-counts of the elements in the sequence.</Paragraph>
    <Paragraph position="19"> count(X, [Y1, ..., Yn]) = count(X, Y1) + ... + count(X, Yn)
[...] To figure out whether this phrase is a noun phrase, one would have to try to build an (NP) parse tree for each of these twelve possible combinations of category assignments. Using the Count invariant, however, one knows beforehand that only one of these combinations (given in bold face) could possibly be parsed as a noun phrase, so that parsing itself becomes superfluous in this case. The following figure shows the Count values for the correct assignment.</Paragraph>
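The count and its generalization to sequences translate directly into code. What makes them useful is van Benthem's result that every theorem G =&gt; D of L satisfies count(X, G) = count(X, D) for every basic category X, so a sequent that violates this equation for some X can be rejected without any proof search. A minimal Python sketch, assuming the string/triple encoding of categories; the names `count`, `count_seq`, `basic_categories` and `satisfies_count` are hypothetical.

```python
from typing import Iterable, Set, Tuple, Union

# Basic category: a string. Complex category: (X, connective, Y); for "/"
# and "\" X is the argument (negative) and Y the result (positive).
Category = Union[str, Tuple["Category", str, "Category"]]


def count(x: str, cat: Category) -> int:
    """The X-count of a single category, x being a basic category."""
    if isinstance(cat, str):
        return 1 if cat == x else 0
    left, conn, right = cat
    if conn == "*":
        return count(x, left) + count(x, right)
    return count(x, right) - count(x, left)  # result minus argument


def count_seq(x: str, cats: Iterable[Category]) -> int:
    """The X-count of a sequence: the sum of the X-counts of its elements."""
    return sum(count(x, c) for c in cats)


def basic_categories(cat: Category) -> Set[str]:
    """All basic categories occurring in a category."""
    if isinstance(cat, str):
        return {cat}
    return basic_categories(cat[0]) | basic_categories(cat[2])


def satisfies_count(antecedent: Tuple[Category, ...], succedent: Category) -> bool:
    """Necessary, not sufficient, condition for antecedent => succedent."""
    basics = set().union(*(basic_categories(c) for c in antecedent + (succedent,)))
    return all(count_seq(x, antecedent) == count(x, succedent) for x in basics)
```

The condition is only necessary: the sequent a, a/b =&gt; b passes the check although it is not a theorem (the argument a must appear to the right of a/b), which is why sequents that survive the count still have to be handed to a prover.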
    <Paragraph position="21"> [Figure: Count values for the correct assignment] The reader can verify that none of the other combinations satisfies the count invariant. It is clear that the procedure just presented is a perfect means of obtaining the ratios of the frequencies of lexically ambiguous words, given a corpus and a lexicon with categorial information. In order to derive these figures for the words in the [...] database, sentences of the INL corpus are fed into a cascade of disambiguation modules. The implementation of this Lambek-Gentzen disambiguator is straightforward, as it involves only simple matchings and list manipulations. The role of the disambiguator in the process of disambiguating the INL corpus can be read off from the following figure. [Figure: the role of the disambiguator in disambiguating the INL corpus] For each sentence, the categories of all the words it contains are looked up in a parsing lexicon derived from the lexical database. When all combinations of categories have been computed, each is tested by the Count module to reduce the number of possible combinations of initial category assignments. In the most successful case, this reduction leaves only one possible combination, implying that all lexical material in the sentence is disambiguated. In most other cases, only a small percentage of the original number of possible combinations of lexical assignments is left over; these are handed over to the Gentzen Proof Machine, which will find out which of the remaining assignments fail to combine into a grammatical sentence.</Paragraph>
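The cascade can be sketched as follows: lexicon lookup, enumeration of all category assignments, the Count filter, and finally an optional proof stage. This is a minimal Python sketch under stated assumptions: the parsing lexicon is a plain dictionary from words to candidate categories, the proof stage is any callable taking (antecedent, succedent) and returning a bool, and the names `disambiguate`, `Prover` and the toy lexicon in the usage note are hypothetical, not the INL implementation.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple, Union

Category = Union[str, Tuple["Category", str, "Category"]]
Prover = Callable[[Tuple[Category, ...], Category], bool]


def count(x: str, cat: Category) -> int:
    """van Benthem's X-count (argument-connective-result notation)."""
    if isinstance(cat, str):
        return 1 if cat == x else 0
    left, conn, right = cat
    if conn == "*":
        return count(x, left) + count(x, right)
    return count(x, right) - count(x, left)


def atoms(cat: Category) -> Set[str]:
    """All basic categories occurring in a category."""
    if isinstance(cat, str):
        return {cat}
    return atoms(cat[0]) | atoms(cat[2])


def disambiguate(words: Sequence[str], lexicon: Dict[str, List[Category]],
                 goal: Category = "S",
                 prover: Optional[Prover] = None) -> List[Tuple[Category, ...]]:
    """Lexicon lookup, all combinations, Count filter, then (optionally) proof."""
    survivors = []
    # Every combination of initial category assignments for the sentence.
    for assignment in product(*(lexicon[w] for w in words)):
        basics = set().union(atoms(goal), *(atoms(c) for c in assignment))
        if all(sum(count(x, c) for c in assignment) == count(x, goal) for x in basics):
            survivors.append(assignment)
    if prover is not None:
        # Only the Count survivors reach the (expensive) proof stage.
        survivors = [a for a in survivors if prover(a, goal)]
    return survivors
```

With a toy lexicon in which "John" is N and "walks" is either N\S or N, the Count stage alone reduces "John walks" to the single assignment (N, N\S). Because the count ignores word order, the reversed string "walks John" also survives this stage; rejecting it is the job of the proof stage.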
    <Section position="1" start_page="697" end_page="697" type="sub_section">
      <SectionTitle>
Notes
</SectionTitle>
      <Paragraph position="0"> 1. Much of the work described here is based on research by Michael Moortgat. See e.g. his (1987a, 1987b, 1988).</Paragraph>
      <Paragraph position="1"> 2. E.g. Wittenburg (1987), Steedman (1987).
3. Instead of theorems deducible from the calculus, they are often facts that can be proven of the calculus as such, outside the calculus (metatheorems, in other words).</Paragraph>
      <Paragraph position="2"> 4. This combination is called application.
5. Notice that we will use the (argument connective result) notation, no matter what the directionality of the functor.</Paragraph>
      <Paragraph position="3"> 6. We will present the sequent calculus, which Lambek adapted from Gentzen's work on logic. See Lambek (1958).</Paragraph>
      <Paragraph position="4"> 7. Because of space limitations we will not attempt to show the validity of this procedure.</Paragraph>
    </Section>
  </Section>
</Paper>