File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/94/c94-1079_intro.xml
Size: 8,928 bytes
Last Modified: 2025-10-06 14:05:34
<?xml version="1.0" standalone="yes"?> <Paper uid="C94-1079"> <Title>PRINCIPAR--An Efficient, Broad-coverage, Principle-based Parser</Title> <Section position="2" start_page="0" end_page="482" type="intro"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> Principle-based grammars, such as Government-Binding (GB) theory (Chomsky, 1981; Haegeman, 1991), offer many advantages over rule-based and unification-based grammars, such as the universality of principles and modularity of components in the grammar. Principles are constraints over X-bar structures. Most previous principle-based parsers, e.g., (Dorr, 1991; Fong, 1991; Johnson, 1991), essentially generate all possible X-bar structures of a sentence and then use the principles to filter out the illicit ones. The drawback of this approach is the inefficiency due to the large number of candidate structures to be filtered out. The problem persists even when various techniques such as optimal ordering of principles (Fong, 1991) and coroutining (Dorr, 1991; Johnson, 1991) are used. This problem may also account for the fact that these parsers are experimental and have limited coverage.</Paragraph> <Paragraph position="1"> This paper describes an efficient, broad-coverage, principle-based parser, called PRINCIPAR. The main innovation in PRINCIPAR is that it applies principles to descriptions of X-bar structures rather than the structures themselves. X-bar structures of a sentence are only built when their descriptions have satisfied all the principles.</Paragraph> <Paragraph position="2"> [...] PRINCIPAR. Sentence analysis is divided into three steps. The lexical analyser first converts the input sentence into a set of lexical items. Then, a message passing algorithm for GB-parsing is used to construct a shared parse forest. Finally, a parse tree retriever is used to enumerate the parse trees.</Paragraph> <Paragraph position="3"> The key idea of the parsing algorithm was presented in (Lin, 1993). 
This paper presents some implementation details and experimental results.</Paragraph> <Paragraph position="4"> 2. Parsing by Message Passing The parser in PRINCIPAR is based on a message-passing framework proposed by Lin (1993) and Lin and Goebel (1993), which uses a network to encode the grammar. The nodes in the grammar network represent grammatical categories (e.g., NP, Nbar, N) or subcategories, such as V:NP (transitive verbs that take NPs as complements). The links in the network represent relationships between the categories. GB-principles are implemented as local constraints attached to the nodes and percolation constraints attached to links in the network. Figure 2 depicts a portion of the grammar network for English.</Paragraph> <Paragraph position="5"> [Figure 2: a portion of the grammar network for English. Legend: adjunct dominance, complement dominance, specialization, specifier dominance, head dominance, barrier.] There are two types of links in the network: subsumption links and dominance links.</Paragraph> <Paragraph position="6"> * There is a subsumption link from α to β if α subsumes β. For example, since V subsumes V:NP and V:CP, there is a subsumption link from V to each one of them.</Paragraph> <Paragraph position="7"> * There is a dominance link from node α to β if β can be immediately dominated by α. For example, since an Nbar may immediately dominate a PP adjunct, there is a dominance link from Nbar to PP.</Paragraph> <Paragraph position="8"> A dominance link from α to β is associated with an integer id that determines the linear order between β and other categories dominated by α, and a binary attribute to specify whether β is optional or obligatory. 
(Footnote 1: In order to simplify the diagram, we did not label the links with their ids in Figure 2. Instead, the precedence between dominance links is indicated by their starting points, e.g., C precedes IP under Cbar since the link leading to C is to the left of the link leading to IP.) Input sentences are parsed by passing messages in the grammar network. The nodes in the network are computing agents that communicate with each other by sending messages in the reverse direction of the links in the network. Each node has a local memory that stores a set of items. An item is a triplet that represents a (possibly incomplete) X-bar structure: &lt;str, att, src&gt;, where str is an integer interval [i,j] denoting the i'th to j'th word in the input sentence; att is the attribute values of the root node of the X-bar structure; and src is a set of source messages from which this item is combined. The source messages represent immediate constituents of the root node. Each node in the grammar network has a completion predicate that determines whether an item at the node is &quot;complete,&quot; in which case the item is sent as a message to other nodes in the reverse direction of the links.</Paragraph> <Paragraph position="9"> When a node receives an item, it attempts to combine the item with items from other nodes to form new items. Two items &lt;[i1,j1], A1, S1&gt; and &lt;[i2,j2], A2, S2&gt; can be combined if: 1. their surface strings are adjacent to each other: i2 = j1+1.</Paragraph> <Paragraph position="10"> 2. their attribute values A1 and A2 are unifiable.</Paragraph> <Paragraph position="11"> 3. the source messages come via different links [...]</Paragraph> <Paragraph position="13"> [...] of messages, returns the set of links via which the messages arrived.</Paragraph> <Paragraph position="14"> The result of the combination is a
new item: &lt;[i1,j2], unify(A1, A2), S1 ∪ S2&gt;.</Paragraph> <Paragraph position="15"> The new item represents a larger X-bar structure resulting from the combination of the two smaller ones. If the new item satisfies the local constraint of the node, it is considered valid and saved into the local memory. Otherwise, it is discarded.</Paragraph> <Paragraph position="16"> A valid item satisfying the completion predicate of the node is sent further as messages to other nodes.</Paragraph> <Paragraph position="17"> The input sentence is parsed in the following steps.</Paragraph> <Paragraph position="18"> Step 1: Lexical Look-up: Retrieve the lexical entries for all the words in the sentence and create a lexical item for each word sense.</Paragraph> <Paragraph position="19"> A lexical item is a triple: &lt;[i,j], av_self, av_comp&gt;, where [i,j] is an interval denoting the position of the word in the sentence; av_self is the attribute values of the word sense; and av_comp is the attribute values of the complements of the word sense.</Paragraph> <Paragraph position="20"> Step 2: Message Passing: For each lexical item &lt;[i,j], av_self, av_comp&gt;, create an initial message &lt;[i,j], av_self, ∅&gt; and send this message to the grammar network node that represents the category or subcategory of the word sense.</Paragraph> <Paragraph position="21"> When the node receives the initial message, it may forward the message to other nodes or it may combine the message with other messages and send the resulting combination to other nodes. This initiates a message passing process which stops when there are no more messages to be passed around. 
At that point, the initial message for the next lexical item is fed into the network.</Paragraph> <Paragraph position="22"> Step 3: Build a Shared Parse Forest: When all lexical items have been processed, a shared parse forest for the input sentence can be built by tracing the origins of the messages at the highest node (CP or IP), whose str component is the whole sentence. The parse forest consists of the links of the grammar network that are traversed during the tracing process.</Paragraph> <Paragraph position="23"> The structure of the parse forest is similar to (Billot and Lang, 1989) and (Tomita, 1986), but extended to include attribute values.</Paragraph> <Paragraph position="24"> The parse trees of the input sentence can be retrieved from the parse forest one by one.</Paragraph> <Paragraph position="25"> The next section explains how the constraints attached to the nodes and links in the network ensure that the parse trees satisfy all the principles.</Paragraph> </Section></Paper>
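The item combination rule of Section 2 can be illustrated with a minimal Python sketch. It is not PRINCIPAR's implementation: the names Item, unify, links and combine, the flat dictionary unification, and the encoding of a source message as a (link label, interval) pair are all assumptions made for illustration. The third condition is read here as requiring that the source messages of the two items arrived via disjoint sets of links.

```python
from dataclasses import dataclass


@dataclass
class Item:
    span: tuple         # str: (i, j), the i'th to j'th word of the input
    attrs: dict         # att: attribute values of the root node
    sources: frozenset  # src: source messages, assumed (link_label, span) pairs


def unify(a, b):
    """Flat stand-in for unification: merge two dicts, or None on a clash."""
    for name, value in a.items():
        if name in b and b[name] != value:
            return None
    return {**a, **b}


def links(sources):
    """Return the set of link labels via which the source messages arrived."""
    return {link for link, _span in sources}


def combine(x, y):
    """Combine two items if all three conditions hold, else return None."""
    (i1, j1), (i2, j2) = x.span, y.span
    if i2 != j1 + 1:                     # 1. surface strings are adjacent
        return None
    attrs = unify(x.attrs, y.attrs)
    if attrs is None:                    # 2. attribute values are unifiable
        return None
    if not links(x.sources).isdisjoint(links(y.sources)):
        return None                      # 3. messages came via different links
    return Item((i1, j2), attrs, x.sources | y.sources)


# Hypothetical items; the link labels "spec" and "head" are illustrative only.
det = Item((1, 1), {"num": "sg"}, frozenset({("spec", (1, 1))}))
noun = Item((2, 2), {"num": "sg", "cat": "n"}, frozenset({("head", (2, 2))}))
plural = Item((2, 2), {"num": "pl"}, frozenset({("head", (2, 2))}))

np = combine(det, noun)
print(np.span, np.attrs)
print(combine(det, plural))
```

In this toy run the first two items combine into one item spanning words 1 to 2, while the third is rejected because its attribute values do not unify with those of the first. A fuller version would also apply the node's local constraint and completion predicate before saving or forwarding the new item.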