<?xml version="1.0" standalone="yes"?> <Paper uid="C94-2177"> <Title>Reverse Queries in DATR*</Title> <Section position="2" start_page="0" end_page="1089" type="metho"> <SectionTitle> 1 The Reverse Query Problem </SectionTitle> <Paragraph position="0"> DATR (Evans & Gazdar 1989a) has become one of the most widely used formal languages for the representation of lexical information. DATR applications have been developed for a wide variety of languages (including English, Japanese, Kikuyu, Arabic, Latin, and others) and many different subdomains of lexical representation, including inflectional morphology, underspecification phonology, non-concatenative morphophonology, lexical semantics, and tone systems (footnote 1).</Paragraph> <Paragraph position="1"> We presuppose that the reader of the present paper is [1992] for recent DATR applications in these areas. An informal introduction to DATR is given in Gazdar [1990]. The standard syntax and semantics of DATR is defined in Evans & Gazdar [1989a, 1989b]. Implementation issues are discussed in Gibbon & Ahoua [1991], Jenkins [1990], and in Gibbon [1993]. Moser [1992a, 1992b, 1992c, 1992d] provides interesting insights into the formal properties of DATR (see also the DATR representations of finite state automata, different kinds of logics, register operations etc. in Evans & Gazdar [1990], and Langer [1993]). Andry et al. [1993] describe how DATR can be used in speech-oriented applications.</Paragraph> <Paragraph position="2"> question, and does it allow for an explicit treatment of generalisations, subgeneralisations, and exceptions? * its range of accessing strategies: are there accessing strategies for all applications which presuppose a lexicon (e.g.
parsing, generation, ...), and do they support the development, maintenance, and evaluation of lexica in an adequate manner? Most of the previous work on DATR has focussed on the former set of criteria, i.e. the declarative features of the language, its expressive capabilities, and its adequacy for the reformulation of pre-theoretic informal linguistic concepts. This paper is mainly concerned with the latter set of criteria of adequacy. However, in the case of DATR, the limited access in only one direction has led to a somewhat procedural view of the language which, in particular cases, has also had an impact on the declarative representations themselves.</Paragraph> <Paragraph position="3"> DATR has often been characterised as a functional and deterministic language. These features are, of course, not properties of the language itself, but rather of the language together with a particular procedural interpretation. Actually, the term deterministic is not applicable to a declarative language, but only makes sense if applied to a procedural language or a particular procedural interpretation of a language. The DATR interpreter/compiler systems developed so far (footnote 2) have in common that they support only one way of accessing the information represented in a DATR theory.
This access strategy, which we will refer to as the standard procedural interpretation of DATR, closely resembles the inference rules defined in Evans & Gazdar [1989a].</Paragraph> <Paragraph position="4"> Even if one considers DATR neither as a tool for parsing nor for generation tasks, but rather as a purely representational device, the one-way-only access to DATR theories turns out to be one of the major drawbacks of the model.</Paragraph> <Paragraph position="5"> One of the claims stated for DATR in Evans & Gazdar [1989] is that it is computationally tractable. But for many practical purposes, including lexicon development and evaluation, it is not sufficient that there is any arbitrary accessing strategy at all; there should be an appropriate way of accessing whatever information is necessary for the purpose in question. This is a strong motivation for investigating alternative strategies for processing DATR representations. Footnote 2: DATR implementations have been developed by R. Evans (DATR90), D. Gibbon (DDATR, ODE), A. Sikorski (TPDATR), J. Kilbury (QDATR), G. Drexel (YADP), M. Duda (HUB-DATR), and others.</Paragraph> <Paragraph position="6"> This paper is concerned with the reverse query problem, i.e. the problem of how a given DATR value can be mapped onto the queries that evaluate to it. A standard query consists of a node and a path, e.g. Sheep:<orth plur>, and evaluates to a sequence of atoms (value), e.g. sheep. A reverse query, on the other hand, starts with the value, e.g. sheep, and queries the set of node-path pairs which evaluate to it, for instance, Sheep:<orth sing> and Sheep:<orth plur>. Our solution can be regarded as an inversion of the parsing-as-deduction approach of the logic programming tradition, since we treat reverse-query theorem proving as a parsing problem.
We adopt a well-known strategy from parsing technology: we isolate the context-free &quot;backbone&quot; of DATR and use a modified chart-parsing algorithm for CF-PSG as a theorem prover for reverse queries.</Paragraph> <Paragraph position="7"> For the purposes of the present paper we will introduce a DATR notation that slightly differs from the standard notation given in Evans & Gazdar [1989] in the following respects: * the usual DATR abbreviation conventions are spelled out * the global environment of a DATR descriptor is explicitly represented (even if it is uninstantiated) * each node-path pair N:P is associated with the set of extensional suffixes of N:P that are defined within the DATR theory. In standard DATR notation, what one might call a non-terminal symbol is a node-path pair (or an abbreviation for a node-path pair). In our notation a DATR nonterminal symbol is an ordered set [N, P, C, N', P'].</Paragraph> <Paragraph position="8"> N and N' are nodes or variables ranging over nodes.</Paragraph> <Paragraph position="9"> P and P' are paths or variables ranging over paths. C is the set of path suffixes of N:P.</Paragraph> <Paragraph position="10"> A DATR terminal symbol of a theory $\theta$ is an atom that has at least one occurrence in a sentence in $\theta$ where it is not an attribute, i.e. where it does not occur in a path.</Paragraph> <Paragraph position="11"> The suffix-set w.r.t. a prefix $p$ and a set of sequences $S$ (written as $\sigma(p, S)$) is the set of the remaining suffixes of strings in $S$ which contain the prefix $p$: $\sigma(p, S) = \{ s \mid p^{\frown} s \in S \}$.</Paragraph> <Paragraph position="12"> Let N:P be the left hand side of a DATR sentence of some DATR theory $\theta$. Let $\Pi$ be the set of paths occurring under node N in $\theta$. The path extension constraint of P w.r.t.
N and $\theta$ (written as $C(P, N, \theta)$, or simply $C$) is defined as: $C(P, N, \theta) = \sigma(P, \Pi)$.</Paragraph> <Paragraph position="13"> Thus, the constraint of a path P is the set of path suffixes extending P of those paths that have P as a prefix. Example: Consider the DATR theory $\theta$:</Paragraph> <Paragraph position="15"> The constraint of <> (w.r.t. N and $\theta$) is {<a>, <a b>}, the constraint of <a> is {<b>}, and the constraint of <a b> is $\emptyset$.</Paragraph> <Paragraph position="16"> We say that a sequence $S = s_1 \ldots s_n$ ($1 \leq n$) satisfies a constraint $C$ iff $\{ x \in C \mid \exists y : x^{\frown} y = S \} = \emptyset$ (i.e. a sequence S satisfies a constraint C iff there is no prefix of S in C).</Paragraph> <Paragraph position="17"> Now having defined some basic notions, we can give the rules that map standard DATR notation onto our representation: Mapping rules</Paragraph> <Paragraph position="19"> How these mapping principles work can perhaps best be clarified by a larger example. Consider the small DATR theory, below, which we will use as an example case throughout this paper: <root plur> == feet.</Paragraph> <Paragraph position="20"> Noun: <orth> == &quot;<root>&quot; &quot;<affix>&quot; <affix sing> == <affix sing gen> == s <affix plur> == s.</Paragraph> <Paragraph position="21"> The application of the mapping rules to the DATR theory above yields the following result (uninstantiated variables are indicated by bold letters): The general aim of this (somewhat redundant) notation is to put everything that is needed for drawing inferences from a sentence (especially its global environment and possibly competing clauses at the same node) into the representation of the sentence itself. Similar internal representations are used in several DATR implementations.
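The suffix-set, the path extension constraint and the satisfaction test defined in this section can be sketched in Python as follows. This is a minimal illustration rather than the author's implementation: paths are modelled as tuples of attribute names, the function names are hypothetical, and the exclusion of the empty suffix from a constraint is an assumption read off the worked example above, where the constraint of the path a b is empty.

```python
def suffix_set(prefix, sequences):
    """Return every suffix s such that prefix + s is a member of sequences."""
    n = len(prefix)
    return {seq[n:] for seq in sequences if seq[:n] == prefix}


def constraint(path, node_paths):
    """Path extension constraint of path w.r.t. the paths defined at one node.

    Assumption: only proper (non-empty) extensions count, as in the
    worked example of this section.
    """
    return {s for s in suffix_set(path, node_paths) if s}


def satisfies(sequence, cons):
    """True iff no member of the constraint cons is a prefix of sequence."""
    return not any(sequence[:len(c)] == c for c in cons)
```

With the paths (), ("a",) and ("a", "b") defined under one node, constraint(("a",), paths) yields {("b",)}, in line with the example given earlier in this section.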
</Paragraph> </Section> <Section position="3" start_page="1089" end_page="1091" type="metho"> <SectionTitle> 2 Inference in DATR </SectionTitle> <Paragraph position="0"> Both standard inference and reverse query inference can be regarded as complex substitution operations defined for sequences of DATR terminal and non-terminal symbols which apply if particular matching criteria are satisfied. In case of DATR standard procedural semantics, a step of inference is the substitution of a DATR nonterminal by a sequence of DATR terminal and non-terminal symbols. The matching criterion applies to a given DATR query and the left hand sides of the sentences of the DATR theory. If the LHS of a DATR sentence satisfies the matching criterion, a modified version of the right hand side is substituted for the LHS. Since the matching criterion is such that there is at most one sentence in a DATR theory with a matching LHS, DATR standard inference is deterministic and functional. The starting point of DATR standard inference is a single nonterminal and the derivation process terminates if a sequence of terminals is obtained (or if there is no LHS in the theory that satisfies the matching criterion, in which case the process of inference terminates with a failure).</Paragraph> <Paragraph position="1"> In terms of DATR reverse query procedural semantics, a step of inference is the substitution of a subsequence of a given sequence of DATR terminal and non-terminal symbols by a DATR non-terminal. The matching criterion applies to the subsequence and the</Paragraph> <Paragraph position="2"> right hand sides of the sentences of the DATR theory.</Paragraph> <Paragraph position="3"> If the matching criterion is satisfied, a modified version of the LHS of the DATR sentence is substituted for the matching subsequence.
In contrast to DATR standard inference, the matching criterion is such that there might be several DATR sentences in a given theory which satisfy it. DATR reverse query inference is hence neither functional nor deterministic. The starting point of a reverse query is a sequence of terminals (a value). A derivation terminates if the substitutions finally yield a single nonterminal with identical local and global environment (or if there are no matching sentences in the theory, in which case the derivation fails).</Paragraph> <Paragraph position="4"> We now define the matching criteria for DATR terminal symbols, DATR nonterminal symbols and sequences of DATR symbols. These matching criteria relate extensional lemmata (i.e. already derived partial analyses) to DATR definitional sentences (i.e. &quot;rules&quot; that may yield a further reduction) w.r.t. a given DATR theory $\theta$.</Paragraph> <Paragraph position="5"> A terminal symbol $t_1$ matches another terminal symbol $t_2$ iff $t_1 = t_2$. We also say that $t_1$ matches $t_2$ with an arbitrary suffix and an empty constraint in order to provide compatibility with the definitions for nonterminals, below.</Paragraph> <Paragraph position="6"> 1. A nonterminal $[N, P_1, C_1, N', P']$ matches another nonterminal $[N, P_2, C_2, N', P']$ with a suffix $E$ and a constraint $C_2$ if (a) $P_2 = P_1^{\frown} E$, and (b) $E$ satisfies $C_1$. 2. A nonterminal $[N, P_1, C_1, N', P']$ matches another nonterminal $[N, P_2, C_2, N', P']$ with an empty suffix and a constraint $\sigma(E, C_2)$ if (a) $P_1 = P_2^{\frown} E$, and (b) $E$ satisfies $C_2$.</Paragraph> <Paragraph position="7"> Example: The non-terminal symbol [Node, <a b>, {<c d e>}, N', P'] matches [Node, <a b c d>, $\emptyset$, N', P'] with suffix $E$ = <c d> and constraint $\emptyset$.</Paragraph> <Paragraph position="8"> From the definitions given above, we can derive the matching criterion for sequences: 1.
The empty sequence matches the empty sequence with an empty suffix and constraint $\emptyset$.</Paragraph> <Paragraph position="9"> 2. A non-empty sequence of (terminal and nonterminal) symbols $s'_1 \ldots s'_n$ ($1 \leq n$) matches another sequence of (terminal and non-terminal) symbols $s_1 \ldots s_n$ with suffix $E$ and constraint $C$ if (a) for each symbol $s'_i$ ($1 \leq i \leq n$): $s'_i$ matches $s_i$ with suffix $E$ and constraint $C_i$, and (b) $C = C_1 \cup C_2 \cup \ldots \cup C_n$.</Paragraph> <Paragraph position="10"> To put it roughly, this definition requires that the symbols of the sequences match one another with the same (possibly empty) suffix. The resulting constraint of the sequence is the union of the constraints of the symbols. Example: The string of nonterminal symbols</Paragraph> <Paragraph position="12"> <d>, <e>}. Footnote 3: The matching criteria, defined above, do not cover nonterminals with evaluable paths, i.e. paths that include (an arbitrary number of possibly recursively embedded) nonterminals. The matching criterion for nonterminals has to be extended in order to account for statements with evaluable paths: Let eval($\alpha$, e, $\theta$) be a function that maps a string of DATR terminal and nonterminal symbols $\alpha = A_1 \ldots A_n$ onto a string of DATR terminals $\alpha'$ such that (a) each terminal symbol $A_i$ ($1 \leq i \leq n$) in $\alpha$ is mapped onto itself in $\alpha'$, and (b) each nonterminal $A_j = [N_j, P_j, C_j, N'_j, P'_j]$ ($1 \leq j \leq n$) in $\alpha$ is mapped onto the sequence $a_j^1 \ldots a_j^m$ in $\alpha'$ such that $N_j : P_j^{\frown} e = a_j^1 \ldots a_j^m$ in $\theta$. $^{\frown}$ refers to (recur-</Paragraph> </Section> <Section position="4" start_page="1091" end_page="1092" type="metho"> <SectionTitle> 3 The Algorithm </SectionTitle> <Paragraph position="0"> Metaphorically, DATR can be regarded as a formalism that exhibits a context-free backbone (footnote 4). In analogy to a context-free phrase structure rule, a DATR sentence has a left hand side that consists of exactly one non-terminal symbol (i.e.
a node-path pair) and a right hand side that consists of an arbitrary number of non-terminal and terminal symbols (i.e. DATR atoms). In contrast to context-free phrase structure grammar, DATR nonterminals are not atomic symbols, but highly structured complex objects. Additionally, DATR differs from CF-PSG in that there is not a unique start symbol but a possibly infinite set of them (i.e. the set of node-path pairs that, taken as the starting point of a query, yield a value).</Paragraph> <Paragraph position="1"> Despite these differences, the basic similarity of DATR sentences and CF-PSG rules suggests that, in principle, any parsing algorithm for CF-PSGs could be a suitable starting point for constructing a reverse query algorithm for DATR. The algorithm adopted here is a bottom-up chart parser.</Paragraph> <Paragraph position="2"> A chart parser is an abstract machine that performs exactly one action. This action is monotonically adding items to an abstract data-structure called chart, which might be thought of as a graph with annotated arcs (which are also often referred to as edges) or a matrix.</Paragraph> <Paragraph position="3"> There are basically two different kinds of items: * inactive items (which represent completed analyses of substrings of the input string) * active items (which represent incomplete analyses of substrings of the input string). If one thinks of a chart in terms of a graph structure consisting of vertices connected by arcs, then an item can be defined as a triple (START, END, LABEL), where START and END are vertices connected by an arc labeled with LABEL. Active and inactive items differ with respect to the structure of the label. Inactive items are labeled with a category representing the analysis of the substring given by the START and END position.
An active item is labeled with a category representing the analysis for a substring starting at START and ending at some yet unknown position X (END < X) and a list of categories that still have to be proven to be proper analyses of a sequence of connected substrings starting at END and ending at X.</Paragraph> <Paragraph position="4"> Footnote 3 (continued): sive) DATR path extension (cf. Evans & Gazdar 1989a). Notice that e has no index and thus has to be the same for all nonterminals $A_j$. Let $X_1 = [N, P_1, C_1, N', P']$ be a nonterminal symbol including an evaluable path $P_1$. $X_1$ matches $[N, P_2, C_2, N', P']$ with a suffix $E$ and a constraint $C_2$ if (a) eval($P_1$, $E$, $\theta$) = $\pi$, and (b) $[N, \pi, C_1, N', P']$ matches $[N, P_2, C_2, N', P']$ with suffix $E$ and constraint $C_2$ (according to the matching criteria, defined above).</Paragraph> <Paragraph position="5"> Footnote 4: The similarity of certain DATR sentences and context-free phrase structure rules was first mentioned in Gibbon [1992].</Paragraph> <Paragraph position="6"> For the purpose of processing DATR rather than CF-PSGs, each active item is additionally associated with a path suffix. Thus an active item has the structure: (START, END, CAT0, CAT1 ... CATn, SUFFIX)</Paragraph> <Paragraph position="7"> Consider the following examples: the inactive item (0, 1, [House, <orth sing>, {<gen>}, House, P']) represents the information that the substring of the input string consisting of the first symbol is the value of the query House:<orth sing> (with any extensional path suffix, but not gen) in the global environment that consists of the node House and some still uninstantiated path P'. The active item (0, 1, [Noun, <orth>, $\emptyset$, House, P'], [House, <affix>, $\emptyset$, House, P'], $\varepsilon$) represents the information that there is a partial analysis for a substring of the input string that starts with the first symbol and ends somewhere to the right.
This substring is the value of the query Noun:<orth> within the global environment consisting of the node House and some uninstantiated global path P', if there is a substring starting from vertex 1 that turns out to be the value of the query House:<affix> in the same global environment House:P'.</Paragraph> <Paragraph position="8"> The general aim is to get all inactive items labeled</Paragraph> <Paragraph position="9"> with a start symbol (i.e. a DATR nonterminal with identical local and global environment) for the whole string which are derivable from the given grammar. There are different strategies to achieve this. The one we have adopted here is based on a chart-parsing algorithm proposed in Kay [1980].</Paragraph> <Paragraph position="10"> Here is a brief description of the procedures: * parse is the main procedure that scans the input, increments the pointer to the current chart position, and invokes the other procedures * reduce searches the DATR theory for appropriate rules in order to achieve further reductions of inactive items * add-epsilon applies epsilon productions * complete combines inactive and active items * add-item adds items to the chart. We will now give a more detailed description of the procedures in a pseudo-code notation (the input arguments of a procedure are given in parentheses after the procedure name). Since the only chart-modifying operation is carried out as a side effect of the procedure add-item, there are no output values at all.</Paragraph> <Paragraph position="11"> The procedure parse takes as input arguments a vertex that indicates the current chart position (in the initial state this position is 0) and the suffix of the input string starting at this position. As long as the remaining suffix of the input string is non-empty, parse calls the procedures add-epsilon, reduce, and complete, increments the
pointer to the current chart position, and starts again with the new current vertex.</Paragraph> <Paragraph position="12"> procedure parse(VERTEX, S1 ... Sn) variables: The procedure reduce takes an inactive item as the input argument and searches the DATR theory for rules that have a matching left-corner category. For each such rule found, reduce invokes the procedure add-item. The procedure add-item is the chart-modifying operation. It takes an active item as an input argument. If this active item has no pending categories, it is regarded as an inactive item. In this case add-item inserts a new chart entry for the item, provided it is not already included in the chart, and calls the procedures reduce and complete. If the item is an active item, then it is inserted into the chart, provided it is not already inside.</Paragraph> </Section> <Section position="5" start_page="1092" end_page="1093" type="metho"> <SectionTitle> 4 Cycles </SectionTitle> <Paragraph position="0"> A hard problem for DATR interpreters is posed by cycles, i.e.</Paragraph> <Paragraph position="1"> DATR statements and sets of DATR statements which involve recursive definitions such that standard inference or reverse-query inference does not necessarily terminate after a finite number of steps of inference. Here While simple cycles have to be considered as semantically ill-formed and thus typically occur as typing errors only, both path lengthening and path shortening cycles occur quite frequently in many DATR representations. Note that path lengthening cycles turn out to be path shortening cycles in the reverse query direction, and vice versa. The DATR inference engine can be prevented from getting lost in path-lengthening and path-shortening cycles by a limit on path length.
This finite bound on path length can be integrated into our algorithm by modifying the add-item procedure such that only items with a path shorter than the permitted maximum path length are added to the chart.</Paragraph> </Section> <Section position="6" start_page="1093" end_page="1093" type="metho"> <SectionTitle> 5 Complexity </SectionTitle> <Paragraph position="0"> CF-PSG parsing is known to have a cubic complexity w.r.t. the length of the input string. Though it is crucial for our approach that we exploit the CF-backbone of DATR for computing reverse queries, this result is of no significance here. DATR is Turing-equivalent (Moser 1992d), and Turing-equivalence has also been shown for a proper subset of DATR (Langer 1993).</Paragraph> <Paragraph position="1"> These theoretical results may a priori rule out DATR as an implementation language for large scale real time applications, but not as a development environment for prototype lexica which can be transformed into efficient task-specific on-line lexica (Andry et al. 1992). With a finite bound on path length our algorithm works, in practice (footnote 5), fast enough to be regarded as a useful tool for the development of small and medium scale lexica in DATR.</Paragraph> </Section> </Paper>