<?xml version="1.0" standalone="yes"?>
<Paper uid="C92-1016">
  <Title>Using Active Constraints to Parse GPSGs</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
$SX \rightarrow C_1, \dots, C_n$
</SectionTitle>
    <Paragraph position="0"> can be interpreted as the following implication:</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
$C_1 \wedge \dots \wedge C_n \supset SX$
</SectionTitle>
    <Paragraph position="0"> the clausal form of which is: $\neg C_1 \vee \dots \vee \neg C_n \vee SX$. Because of the uniqueness of the positive literal, we can interpret a PS-rule as a Horn clause, with a direct translation into Prolog. Thus, a context-free grammar, represented by a set of PS-rules, corresponds to a set of clauses. To verify the grammaticality of a sentence is thus equivalent to proving the consistency of a set of clauses.</Paragraph>
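    <Paragraph> As a minimal sketch of this translation, the Python snippet below builds the clausal form of a PS-rule. The function names and the string encoding of literals (a leading `~` marks negation) are assumptions made for illustration; they are not part of the system described in the paper.

```python
def ps_rule_to_clause(head, body):
    """Return the clause for the rule head -> body as a list of literals.

    Each right-hand-side category gives one negative literal and the
    left-hand side gives the single positive literal.
    """
    return ["~" + category for category in body] + [head]


def is_horn(clause):
    """A clause is Horn when it has at most one positive literal."""
    positives = [lit for lit in clause if not lit.startswith("~")]
    return len(positives) in (0, 1)


clause = ps_rule_to_clause("S", ["NP", "VP"])
print(" v ".join(clause))   # ~NP v ~VP v S
print(is_horn(clause))      # True
```

 Because every PS-rule has exactly one left-hand side, the resulting clause always has exactly one positive literal, which is what makes the Horn reading possible.</Paragraph>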
    <Paragraph position="1"> There is, however, a restriction in the analogy between PS-rules and clauses: a rule defines an order on its right-hand-side elements, whereas a clause does not. This restriction has important consequences on the generality of the mechanisms. Indeed, the notion of order involves a multiplication of the rules describing a given phrase: we get as many rules as there are configurations. This is one of the limits of phrase structure grammars.</Paragraph>
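    <Paragraph> The multiplication of rules can be illustrated with a short Python sketch. It enumerates every ordered PS-rule that a single unordered set of constituents could give rise to. This is an upper bound under the assumption that all orders are allowed; a real grammar would license only some of them. The function name is hypothetical.

```python
from itertools import permutations


def ordered_expansions(head, constituents):
    """List every PS-rule obtained by ordering one set of constituents."""
    return [(head, list(order)) for order in permutations(sorted(constituents))]


rules = ordered_expansions("NP", {"Det", "N", "AP"})
print(len(rules))  # 6 ordered rules for one unordered set of three categories
```

 The count grows factorially with the number of constituents, whereas an unordered description needs a single statement.</Paragraph>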
    <Paragraph position="2"> ID/LP formalism and boolean constraints will allow us to solve this problem. We will obtain a nearly perfect adequacy between the theoretical model and its implementation. Within the classification proposed in \[Evans87\], it will be a strong direct interpretation of the model.</Paragraph>
    <Paragraph position="3"> ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992 81 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Constraints and linguistic theory
</SectionTitle>
    <Paragraph position="0"> The basic mechanism of constraint logic programming is the restriction of the search space, or the reduction of the domain-variables. This goal can be reached differently depending on the active or passive constraint type (cf \[VanHentenryck89\]). In the classical logic programming framework, the basic technique is that of generate-and-test. In this case, the program generates values for the variables before verifying some of their properties: the search space is reduced a posteriori. On the other hand, in the CLP paradigm, the use of constraints allows the reduction of this space a priori. Moreover, the set of constraints forms a system which incorporates new constraints during the process, while the use of simple predicates verifying a property only has a local scope.</Paragraph>
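    <Paragraph> The contrast between the two techniques can be sketched on a toy problem. The Python code below is not a CLP solver: the domains, the constraint and the function names are all assumptions chosen for illustration. It only counts how many candidate pairs are enumerated when the constraint is used passively (after generation) or actively (to prune the domains first).

```python
from itertools import product

DOMAIN_X = range(1, 6)
DOMAIN_Y = range(1, 6)


def constraint(x, y):
    return x + y == 9


def generate_and_test():
    """Passive use: build every candidate, then filter (a posteriori)."""
    tested, solutions = 0, []
    for x, y in product(DOMAIN_X, DOMAIN_Y):
        tested += 1
        if constraint(x, y):
            solutions.append((x, y))
    return solutions, tested


def reduce_then_enumerate():
    """Active use: prune each domain first (a priori), then enumerate."""
    dom_x = [x for x in DOMAIN_X if any(constraint(x, y) for y in DOMAIN_Y)]
    dom_y = [y for y in DOMAIN_Y if any(constraint(x, y) for x in dom_x)]
    tested, solutions = 0, []
    for x, y in product(dom_x, dom_y):
        tested += 1
        if constraint(x, y):
            solutions.append((x, y))
    return solutions, tested


print(generate_and_test())      # ([(4, 5), (5, 4)], 25)
print(reduce_then_enumerate())  # ([(4, 5), (5, 4)], 4)
```

 Both functions find the same solutions; the second enumerates far fewer candidates because the domains were reduced before any value was generated. The pruning step has its own cost, which this sketch does not count.</Paragraph>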
    <Paragraph position="1"> This active/passive distinction can be useful for parsing, especially according to the type of knowledge that is constrained. Active constraints can easily be defined for syntactic structures and their formation. On the other hand, expressing relations between these structures with this kind of constraint is not always possible.</Paragraph>
    <Paragraph position="2"> We will describe the principles governing the formation of the structures. A syntactic structure can be of two types: simple (a lexical category) or complex (a phrase).</Paragraph>
    <Paragraph position="4"> The formation of complex structures is governed by two types of knowledge: * internal: specific information within a structure * external: relations between structures. Internal knowledge concerns the structure composition, independently of its context. For a phrase, it is the set of its constituents. External knowledge describes interactions between structures. It concerns on the one hand the order and on the other hand the government (in the sense of phrase-structure grammars: selection, agreement ...).</Paragraph>
    <Paragraph position="5"> ID/LP formalism uses such a distinction : it separates information about immediate dominance (i.e. the set of possible constituents of a phrase) from that on linear precedence (i.e. the partial order relation between these constituents).</Paragraph>
    <Paragraph position="6"> It is possible to consider these two types of knowledge as constraints (cf \[Saint-Dizier91\]). But it is important to distinguish how each of them functions. We will illustrate this point by presenting principles for each type.</Paragraph>
    <Paragraph position="7"> * Internal knowledge: Each complex structure must contain at least one particular element called the head. This category gives the phrase its type and its presence is compulsory. The other constituents are usually optional. We must specify that local constraints could require the presence of a particular category, but it is a sub-categorization aspect: it concerns relations between the sub-structures of the complex structure and is not specific to the structure itself. We will see that this distinction between optional and compulsory constituents can be represented directly as an active constraint.</Paragraph>
    <Paragraph position="8"> * External knowledge: In the case of ID/LP formalism, the order constraints (i.e. linear precedence) cannot be easily used with an a priori reduction of the search space. Indeed, LP-rules define a partial order upon the set of categories. The LP-acceptability relation uses this order and can be regarded as a constraint upon the domain-variables. It is a symbolic user-defined constraint. The use of this kind of constraint is possible in Chip (cf \[Dincbas88\]), but not in Prolog III (cf \[Colmerauer90\]).</Paragraph>
    <Paragraph position="9"> However, using this order relation as an actual constraint allowing the reduction of domain-variables is difficult. In so far as it is a partial order, the LP notion cannot be used to predict the categories that can follow a constituent. It is used during the parse to verify the possibility for each new category to appear at a given place in the syntactic structure.</Paragraph>
    <Paragraph position="10"> Generally speaking, internal properties allow an easier use of active constraints than external ones.</Paragraph>
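    <Paragraph> The passive use of linear precedence during the parse can be sketched as follows. The LP statements, the category names and the function names are assumptions for illustration, and the LP relation is taken as given (no transitive closure is computed). Each new category is checked against the categories already placed in the phrase, rather than being predicted from them.

```python
# Hypothetical LP statements: (a, b) means that a must precede b.
LP_RULES = {("Det", "N"), ("Det", "AP"), ("AP", "N")}


def lp_acceptable(category, already_parsed, lp_rules=LP_RULES):
    """Check that appending `category` violates no LP statement.

    The new category comes last, so it is rejected only if some LP
    statement requires it to precede a category that is already there.
    """
    return all((category, earlier) not in lp_rules for earlier in already_parsed)


def sequence_acceptable(sequence, lp_rules=LP_RULES):
    """Verify a whole sequence incrementally, one category at a time."""
    return all(
        lp_acceptable(category, sequence[:i], lp_rules)
        for i, category in enumerate(sequence)
    )


print(sequence_acceptable(["Det", "AP", "N"]))  # True
print(sequence_acceptable(["N", "Det"]))        # False
```

 Since the order is only partial, a category such as a PP that is unrelated to the others by any LP statement is accepted at any position, which is why the relation verifies positions rather than predicting them.</Paragraph>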
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Constraints and ID/LP formalism
</SectionTitle>
    <Paragraph position="0"> As we have seen, ID-rules of ID/LP formalism only contain the set of possible constituents (without any notion of order). Therefore, an ID-rule is strictly equivalent to a clause.</Paragraph>
    <Paragraph position="1"> Example: $NP \rightarrow_{id} Det, N, AP \equiv NP \vee \neg Det \vee \neg N \vee \neg AP$. This equivalence is the basis of the conciseness and generality properties of GPSG. But it is difficult to represent. As we have seen, logic programming cannot directly represent the non-ordered aspect of a clause. However, it is possible to represent this kind of information as active constraints. These must allow the expression of the simple fact that a phrase is well-formed if it is at least composed of the constituents $C_1, \dots, C_n$. Other relations between the structures (like order or selection) will only be verified if this constraint is satisfied. Practically, each rule describing a phrase corresponds to a clause whose literals represent categories. An ID-rule is thus translated into a boolean formula where each category corresponds to a boolean. The semantics of this representation is the following: a literal is true if it corresponds to a well-formed structure. A structure is well-formed if it corresponds to a lexical category (simple structure) or to a well-formed phrase (complex structure).</Paragraph>
    <Paragraph position="2"> Thus, the boolean value of a complex structure is the interpretation of this formula, and so depends on the value of its constituents.</Paragraph>
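    <Paragraph> A minimal sketch of this boolean reading of an ID-rule is given below. The rule, the truth assignment and the function name are assumptions for illustration. The constituents are stored in a set, so no order is involved, and the value of the phrase is computed from the booleans of its constituents.

```python
def id_rule_value(constituents, truth):
    """Value of the phrase licensed by one ID-rule.

    `constituents` is an unordered set of categories and `truth` maps a
    category to its boolean; a missing category counts as false.
    """
    return all(truth.get(category, False) for category in constituents)


NP_RULE = frozenset({"Det", "N", "AP"})

print(id_rule_value(NP_RULE, {"N": True, "AP": True, "Det": True}))  # True
print(id_rule_value(NP_RULE, {"Det": True, "N": True}))              # False
```

 The order in which the booleans were assigned plays no role, which is the property that an ordered PS-rule cannot express directly.</Paragraph>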
    <Paragraph position="3">  It is interesting to note that the ID/LP formalism strongly reduces the problem of PS-rules multiplication inherent in phrase-structure grammars. However, as we have seen in the previous example, there is still a redundancy in the information. Indeed, a set of rules describing a phrase allows us to distinguish between two types of constituents according to their optional or compulsory aspect.</Paragraph>
    <Paragraph position="4"> Hence, for each phrase we can define a minimal set of compulsory constituents (generally limited to the head of the phrase), which we call the minimal set of a phrase.</Paragraph>
    <Paragraph position="5"> Example: In the previous example, the minimal set of the NP is {N}.</Paragraph>
    <Paragraph position="6"> We introduce an additional restriction preventing the repetition of an identical category within a phrase. This restriction is very strong and has to be relaxed for some categories (such as PP). But it remains a general principle: most of the categories should not be repeated.</Paragraph>
    <Paragraph position="7"> We then construct a principle defining the well-formedness of complex structures. This principle only concerns internal knowledge: A phrase is well-formed iff it respects the following properties: * it contains at least one head * no constituent is repeated * all its embedded phrases are well-formed. In the logical paradigm (equivalence between a rule and a clause), we say that a literal is true if it corresponds to a lexical category of the parsed sentence or if it corresponds to a well-formed phrase. This formation rule allows us to simplify the verification of the grammaticality of a sentence. We simply need to verify the presence of the minimal set of compulsory constituents to indicate the well-formedness of a phrase. The boolean value of the complete structure is then evaluated recursively. If all the intermediate structures are true, the complete structure is also true and corresponds to a grammatical sentence.</Paragraph>
    <Paragraph position="8"> We will call realization the actual presence of a category in the syntactic structure corresponding to a sentence. The verification process of the well-formedness of a phrase follows these steps: 1. verification of the realization of the minimal set 2. verification of the membership of the realized constituents within the minimal set 3. verification of the uniqueness of the constituents in a phrase 4. verification of the well-formedness of embedded phrases. In an active constraint, we replace the set of clauses describing all the possible constructions with a system of constraints S defining the set of possible constituents and the condition of realization for the minimal set. We can represent it as follows: Let $C$ be the set of possible constituents of a phrase XP, let X be the head of XP, let M be the minimal set such that $M = \{X\} \cup C'$ (where $C' \subset C$), and let $\Delta$ be the disjunction of the literals of M. The well-formedness constraint is: $\{\Delta \supset XP\}$.</Paragraph>
    <Paragraph position="10"> The well-formedness constraint for an NP is: $\{N \supset NP\}$. The well-formedness constraint for a PP is: $\{Prep \wedge NP \supset PP\}$. It is interesting to note that the implication corresponding to the set of rules describing the NP in the previous example forms a system of constraints that can be simplified to $\{N \supset NP\}$. This property is verified for all phrases: given a grammar G, $\forall XP$ such that $XP \in G$, let $\Delta$ be the disjunction of the literals of the minimal set of XP; then the formula corresponding to the rules describing XP is simplified to $\{\Delta \supset XP\}$.</Paragraph>
    <Paragraph position="11"> We thus have both a linguistic and a formal justification of the active constraint used to verify the well-formedness of a phrase.</Paragraph>
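    <Paragraph> The verification steps of this section can be sketched in Python as follows. The toy grammar, the tree encoding and the function names are assumptions for illustration. Two readings are also assumed and should be treated with caution: every member of the minimal set is required to be realized (as the PP constraint with Prep and NP suggests), and membership is checked against the set of possible constituents of the phrase.

```python
# Hypothetical toy grammar: phrase -> (possible constituents, minimal set).
GRAMMAR = {
    "NP": ({"Det", "N", "AP", "PP"}, {"N"}),
    "PP": ({"Prep", "NP"}, {"Prep", "NP"}),
    "AP": ({"Adv", "Adj"}, {"Adj"}),
}


def label(node):
    """A node is a lexical category (str) or a phrase (label, children)."""
    return node if isinstance(node, str) else node[0]


def well_formed(node):
    if isinstance(node, str):   # simple structure: a realized lexical category
        return True
    phrase, children = node
    possible, minimal = GRAMMAR[phrase]
    realized = [label(child) for child in children]
    return (
        minimal.issubset(realized)                          # 1. minimal set realized
        and set(realized).issubset(possible)                # 2. only possible constituents
        and len(realized) == len(set(realized))             # 3. no category repeated
        and all(well_formed(child) for child in children)   # 4. embedded phrases
    )


print(well_formed(("NP", ["Det", ("AP", ["Adj"]), "N"])))  # True
print(well_formed(("PP", ["Prep", ("NP", ["Det"])])))      # False
```

 The second call fails at the recursive step: the embedded NP lacks its head N, so the PP that contains it is not well-formed either.</Paragraph>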
  </Section>
  <Section position="8" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Implementation in Prolog III
</SectionTitle>
    <Paragraph position="0"> We will now describe the parsing strategy and its implementation.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 Bottom-up filtering
</SectionTitle>
      <Paragraph position="0"> Our parsing strategy relies on the concept of left boundary of a phrase. It is an improvement of the left-corner strategy (cf \[Rosenkrantz70\]) called bottom-up filtering (cf \[Blache90\]). It consists in using the information extracted from LP constraints to determine all the left-bounds of the phrases from the list of lexical categories corresponding to a sentence. This process, unlike the left-corner one, relies on a distributional analysis of the categories and the verification of some properties. We define the following functions which allow the initialization of the left boundaries.</Paragraph>
      <Paragraph position="1"> * First legal daughters (noted FLD(P)): this function defines for each phrase P the set of categories that can appear as left boundaries. It is defined as follows (the LP relation between sets is noted $\prec$):</Paragraph>
      <Paragraph position="3"> Let P be a phrase, $\forall \alpha$ such that $P \rightarrow \alpha$, then FLD, the set of first legal daughters, is defined as follows: $FLD(P) = \{c \in \alpha \mid c \prec \alpha - \{c\}\}$. * Immediate precedence (noted $IP_P(c)$): this function defines for each FLD c of a phrase P the set of categories that can precede c in P. It is defined as follows: Let P be a phrase, $\forall \alpha$ such that $P \rightarrow \alpha$, let x be a non-terminal, let $c \in FLD(P)$, then $IP_P(c)$, the set of immediate precedence of c for P, is defined as follows: $IP_P(c) = \{x \mid (x \prec c) \text{ or } (x \in \dots \text{ and neither } x \prec c \text{ nor } c \prec x \text{ exist})\}$. * Initialize: this function verifies whether a category c is the actual left boundary of a phrase P. It is defined as follows: Let I be a string, let C be the list of lexical categories of I, $\forall c \in C$, $c' \in N$ (set of non-terminal symbols) such that c' precedes c in C; c initializes S iff $c \in FLD(S)$ and $c' \notin IP_S(c)$. The syntactic structure of the sentence is built from a list of partially evaluated structures. The process consists in determining all the left bounds and, from this structure, in completing the partial structures by an analysis of the other constituents of the phrase. This is done by verifying whether the current category can or cannot belong to the current phrase. We have at our disposal the set of possible constituents for each phrase, the LP constraints and the other instantiation principles of the GPSG theory. After these verifications, if the current category cannot belong to the current phrase, then we have reached the right boundary of the current phrase.</Paragraph>
      <Paragraph position="4">  This strategy allows a reduction of the search space. Parsing becomes a simple membership test of a category within a set.</Paragraph>
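      <Paragraph> The FLD and Initialize functions can be sketched as follows, under stated assumptions. The toy ID-rules, the LP statements and the immediate-precedence table are hypothetical. The sketch also assumes one particular reading of the FLD definition: a category is a first legal daughter when no sister in the same rule is required to precede it. The immediate-precedence sets are supplied as data rather than computed.

```python
ID_RULES = {"NP": [{"Det", "N", "AP"}, {"N"}]}          # hypothetical ID-rules
LP_RULES = {("Det", "N"), ("Det", "AP"), ("AP", "N")}   # (a, b): a precedes b
IMMEDIATE_PRECEDENCE = {("NP", "N"): {"Det", "AP"}}     # hypothetical IP table


def first_legal_daughters(phrase):
    """Categories of `phrase` that no sister is required to precede."""
    fld = set()
    for alpha in ID_RULES[phrase]:
        for c in alpha:
            if not any((other, c) in LP_RULES for other in alpha - {c}):
                fld.add(c)
    return fld


def initializes(c, previous, phrase, ip=IMMEDIATE_PRECEDENCE):
    """c opens `phrase` iff c is an FLD and `previous` cannot precede c in it."""
    return (
        c in first_legal_daughters(phrase)
        and previous not in ip.get((phrase, c), set())
    )


print(sorted(first_legal_daughters("NP")))  # ['Det', 'N']
print(initializes("N", "V", "NP"))          # True
print(initializes("N", "Det", "NP"))        # False
```

 In the last call, N is a first legal daughter of NP, but the preceding Det may itself precede N inside an NP, so N is not taken as the left boundary.</Paragraph>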
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.2 Implementation
</SectionTitle>
      <Paragraph position="0"> The following implementation considers only the ID/LP formalism (instead of the entire GPSG theory). We will not speak here about the other GPSG principles, but their insertion in the ID/LP module is very simple.</Paragraph>
      <Paragraph position="1"> The parsing mechanism consists in assigning the value true to the booleans corresponding to the categories as and when they appear. If the structure is simple (i.e. a lexical category), the LP-acceptability of this category in the phrase is checked and the corresponding boolean is assigned the value true. In the case where the bottom-up filtering detects a left-bound, the corresponding boolean of the current category is assigned the value true and the embedded phrase is parsed before coming back to the construction of the current phrase. When we reach the right boundary, the well-formedness of the embedded structures is checked (i.e. all the corresponding booleans must be true). If this is the case, the corresponding boolean value is that of the disjunction $\Delta$ of the literals corresponding to the minimal set.</Paragraph>
      <Paragraph position="2"> The representation of the categories and their associated booleans will be done through two parallel lists which will be examined simultaneously during an assignment (or any other operation).</Paragraph>
      <Paragraph position="3"> A phrase is described by the set of its possible constituents, the set of its optional categories and a formula using its minimal set. The two sets are represented by lists and the formula is an implication of the form $\{\Delta \supset XP\}$. This information is collected into a system of constraints characterizing each phrase.</Paragraph>
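      <Paragraph> A possible shape for this description of a phrase is sketched below in Python rather than Prolog III. The class name, its fields and its methods are hypothetical. The formula is evaluated here by requiring every member of the minimal set to be true, which is an assumption based on the PP constraint given earlier.

```python
from dataclasses import dataclass, field


@dataclass
class PhraseConstraints:
    """Hypothetical container for the constraint system of one phrase."""
    name: str
    possible: list                                  # possible constituents
    optional: list                                  # optional categories
    booleans: list = field(default_factory=list)    # parallel to `possible`

    def __post_init__(self):
        self.booleans = [False] * len(self.possible)

    def minimal_set(self):
        return [c for c in self.possible if c not in self.optional]

    def instantiate(self, category):
        """Walk both lists together and set the boolean of `category`."""
        for i, cat in enumerate(self.possible):
            if cat == category:
                self.booleans[i] = True

    def value(self):
        """Boolean of the phrase, computed from its minimal set."""
        truth = dict(zip(self.possible, self.booleans))
        return all(truth[c] for c in self.minimal_set())


np = PhraseConstraints("NP", ["Det", "N", "AP"], ["Det", "AP"])
np.instantiate("Det")
print(np.value())  # False: the head N is not realized yet
np.instantiate("N")
print(np.value())  # True
```

 Keeping categories and booleans in two parallel lists mirrors the representation described above; a dictionary would be the more usual choice in Python.</Paragraph>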
      <Paragraph position="4"> Here is a simplified version of our parsing process. The following predicates allow the parsing of a phrase and its simple or complex constituents.</Paragraph>
      <Paragraph position="5"> It can be noted that the grammatical knowledge is pushed to a low level. It is represented by the set of constraints associated with each phrase.</Paragraph>
      <Paragraph position="6"> Moreover, at this level we do not use the notion of sub-categorization, but only rules concerning the general structure. We will also notice the conciseness of this representation with regard to classical phrase-structure formalisms.</Paragraph>
      <Paragraph position="7"> Description of the implementation. Let G be the following ID/LP grammar:  The following predicates correspond to the heart of the parser for the grammar G:  ... Tree(&lt;S, &lt;c&gt;.A1&gt;.A2, T) ; APhrase(&lt;c&gt;.l, l1, Cat, Bool, &lt;c&gt;.A) -&gt; LpAcceptable(c, Cat, Bool) Instanciate(c, Cat, Bool) APhrase(l, l1, Cat, Bool, A) ;  The APhrase rule takes as input the list of partial structures returned by bottom-up filtering. It distinguishes between two cases according to the type of the current structure: complex (rule #1) or simple (rule #2). In the first case, the following processes are called: * verification of the membership of the current structure within the set of the possible constituents of the current phrase (Constituent rule) * verification of the LP-acceptability (LpAcceptable rule) * parse of the embedded complex structure (AnEmbeddedPhrase rule) * parse of the rest of the phrase (APhrase rule) * construction and verification of the syntactic tree (Tree rule). In the case of simple structures, after checking the LP-acceptability, the corresponding boolean is assigned the value true (Instanciate rule) and the parse of the current phrase is pursued.</Paragraph>
      <Paragraph position="8"> If the APhrase rule fails, the right-bound of the phrase is reached and the parse is pursued at a higher level.</Paragraph>
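      <Paragraph> The control flow of the APhrase rule can be approximated by the Python sketch below. It is not a transcription of the Prolog III predicates: the toy grammar, the LP statements, the input encoding and the function names are all assumptions. A simple structure is a category name, and a complex one is a pair (phrase, first category) as bottom-up filtering might return it. The loop stops when the current item cannot belong to the current phrase, which corresponds to reaching the right boundary.

```python
# Hypothetical toy grammar: phrase -> (possible constituents, minimal set).
GRAMMAR = {
    "NP": ({"Det", "N", "PP"}, {"N"}),
    "PP": ({"Prep", "NP"}, {"Prep", "NP"}),
}
LP_RULES = {("Det", "N"), ("N", "PP"), ("Prep", "NP")}  # (a, b): a precedes b


def lp_acceptable(cat, realized):
    return all((cat, earlier) not in LP_RULES for earlier in realized)


def a_phrase(phrase, items):
    """Consume the constituents of `phrase`; return (tree, value, remaining)."""
    possible, minimal = GRAMMAR[phrase]
    realized, children, ok = [], [], True
    while items:
        item = items[0]
        cat = item[0] if isinstance(item, tuple) else item
        if cat not in possible or cat in realized or not lp_acceptable(cat, realized):
            break                                   # right boundary reached
        if isinstance(item, tuple):                 # complex: parse embedded phrase
            sub_tree, sub_ok, items = a_phrase(cat, [item[1]] + items[1:])
            children.append(sub_tree)
            ok = ok and sub_ok
        else:                                       # simple: lexical category
            children.append(cat)
            items = items[1:]
        realized.append(cat)
    return (phrase, children), ok and minimal.issubset(realized), items


filtered = ["Det", "N", ("PP", "Prep"), ("NP", "Det"), "N"]
tree, value, rest = a_phrase("NP", filtered)
print(tree)
print(value, rest)  # True []
```

 The returned boolean combines the values of the embedded phrases with the realization of the minimal set, following the description of the parsing mechanism given earlier in this section.</Paragraph>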
      <Paragraph position="9"> AnEmbeddedPhrase(&lt;S, c&gt;.l, l1, Cat, Bool, A) -&gt;  The AnEmbeddedPhrase rule allows the parse of a new complex structure. It begins by installing the system of constraints describing this structure (Constraints rule). The validity of the constituents is checked (CorrectConstituents and Valid rules) before returning the boolean value of the parse for this phrase (variable S').</Paragraph>
      <Paragraph position="11"> We can notice that in this representation, sub-categorization consists in verifying the boolean values corresponding to the categories concerned.</Paragraph>
    </Section>
  </Section>
</Paper>