File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/00/c00-1080_intro.xml
Size: 8,836 bytes
Last Modified: 2025-10-06 14:00:45
<?xml version="1.0" standalone="yes"?> <Paper uid="C00-1080"> <Title>Chart Parsing and Constraint Programming</Title> <Section position="3" start_page="0" end_page="552" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The parsing-as-deduction approach proposed in Pereira and Warren (1983) and extended in Shieber et al. (1995) and the parsing schemata defined in Sikkel (1997) are well-established parsing paradigms in computational linguistics. Their main strengths are their flexibility and the level of abstraction concerning control information inherent in parsing algorithms. Furthermore, they are easily extensible to more complex formalisms, e.g., augmented phrase structure rules or the ID/LP format.</Paragraph> <Paragraph position="1"> Constraint Programming (CP) has been used in computational linguistics in several areas, for example in (typed) feature-based systems (Smolka, 1995), or conditional constraints (Matiasek, 1994), or advanced compilation techniques (Götz and Meurers, 1997) or specialized constraint solvers (Manandhar, 1994). But none of these approaches uses constraint programming techniques to implement standard chart parsing algorithms directly in a constraint system.</Paragraph> <Paragraph position="2"> In this paper, I will bring these two paradigms together by showing how to implement algorithms from the parsing-as-deduction scheme by viewing the parsing process as constraint propagation.</Paragraph> <Paragraph position="3"> The core idea is that the items of a conventional chart parser are constraints on labeled links between the words and positions of an input string. Then the inference rules allow for the deduction of new constraints, again labeled and spanning parts of the input string, via constraint propagation. 
The resulting constraint store represents the chart, which can be accessed to determine whether the parse was successful or to reconstruct a parse tree.</Paragraph> <Paragraph position="4"> While this may seem a trivial observation, it is not just another way of implementing deductive parsing in yet another language. The approach allows for a rapid and very flexible but at the same time uniform method of implementation of all kinds of parsing algorithms (for constraint-based theories). The goal is not necessarily to build the fastest parser, but rather to build, for an arbitrary algorithm, a parser quickly and perspicuously. For example, the advantage of our approach compared to the one proposed in Shieber et al. (1995) is that we do not have to design a special deduction engine and we do not have to handle chart and agenda explicitly. Furthermore, the process can be used in any constraint-based formalism which allows for constraint propagation and therefore can be easily integrated into existing applications. The paper proceeds by reviewing the parsing-as-deduction approach and a particular way of implementing constraint systems, Constraint Handling Rules (CHR) as presented in Frühwirth (1998). Then it shows how to implement several parsing algorithms very naturally with constraint propagation rules before concluding with an outlook on how to extend the technique to more advanced applications.</Paragraph> <Section position="1" start_page="0" end_page="551" type="sub_section"> <SectionTitle> 1.1 Parsing as Deduction </SectionTitle> <Paragraph position="0"> Although I assume some familiarity with parsing-as-deduction, I will recall some basic definitions for convenience. The notations and the three basic algorithms are directly taken from Shieber et al. (1995).</Paragraph> <Paragraph position="1"> As usual, strings w result from concatenation of symbols from some alphabet set Σ, i.e., w ∈ Σ*. 
We refer to the decomposition of such a string into its alphabet symbols with indices. We fix this notation using w = w_1 ... w_n. Further notational conventions are: i, j ∈ ℕ, n for the length of the string to be parsed, A, B, C, ... for arbitrary formulas or nonterminals, a, b, c, ... for terminals, ε for the empty string and α, β, γ, ... for strings of terminals and nonterminals. Formulas used in parsing will also be called items or edges. A grammatical deduction system or, in Sikkel's terminology, a parsing schema, is defined as a set of deduction schemes and a set of axioms. These are given with the help of formula schemata which contain (syntactic) meta-variables which are instantiated with concrete terms on application of the rules. A deduction scheme R has the general form</Paragraph> <Paragraph position="3"/> <Paragraph position="5"> where the A_i and C are formula schemata. The A_i are called antecedents and C the consequence. Note that deduction schemes may refer to string positions, i.e., the indices within the input string, in their side conditions.</Paragraph> <Paragraph position="6"> Application of these schemata and derivations of formulas are then defined as in the Shieber et al. article. Intuitively, parsing uses the deductive rules, provided their antecedents and side conditions are met, to infer new items from the axioms and already generated items until no new ones can be derived. The parse was successful if a goal item was derived.</Paragraph> <Paragraph position="7"> Therefore, all the parsing systems used in this paper are defined by specifying a class of items, a set of axioms, a set of inference rules and a subset of the items, the goals. For better readability, I follow Shieber et al. in using the familiar dotted items for the presentation. The three classical example algorithms we will use to illustrate our technique are given in Tab. 1. 
I assume familiarity with these algorithms.</Paragraph> <Paragraph position="8"> Unless specified differently, we assume that we are given a context-free grammar G = (N, Σ, S, P) with nonterminals N, terminals Σ, start symbol S and set of productions P. For Earley's algorithm we also assume a new start symbol S' which is not in N. Each production is of the form A → α with A ∈ N, α ∈ (N ∪ Σ)*. For examples I will use the simple PP-attachment grammar G given in Fig. 1 with the obvious sets of nonterminals and terminals, the start symbol S and productions P. It is left to the reader to calculate example derivations for the three algorithms for a sentence such as John hit the dog with the stick.</Paragraph> </Section> <Section position="2" start_page="551" end_page="552" type="sub_section"> <SectionTitle> 1.2 Constraint Handling Rules </SectionTitle> <Paragraph position="0"> There are several constraint programming environments available. The most recent and maybe the most flexible is the Constraint Handling Rules (CHR) package included in SICStus Prolog (Frühwirth, 1998). These systems</Paragraph> <Paragraph position="2"> maintain a constraint base or store which is continually monitored for possible rule applications, i.e., whether there is enough information present to successfully use a rule to simplify constraints or to derive new constraints.</Paragraph> <Paragraph position="3"> Whereas usually one deals with a fixed constraint domain and a specialized solver, CHR is an extension of the Prolog language which allows for the specification of user-defined constraints and arbitrary solvers. The strength of the CHR approach lies in the fact that it allows for multiple (conjunctively interpreted) heads in rules, that it is flexible and that it is tightly and transparently integrated into the Prolog engine.</Paragraph> <Paragraph position="4"> In CHR constraints are just distinguished sets of (atomic) formulas. 
CHR allows the definition of rule sets for constraint solving with three types of rules: firstly, simplification rules (&lt;=&gt;) which replace a number of constraints in the store with new constraints; secondly, propagation rules (==&gt;) which add new constraints to the store in case a number of constraints is already present; and thirdly, &quot;simpagation&quot; rules (&lt;=&gt; in combination with a \ in the head of the rule) which replace only those constraints with new ones which are to the right of the backslash. Rules can have guards. A guard (separated from the rest of the body by a |) is a condition which has to be met before the rule can be applied.</Paragraph> <Paragraph position="5"> We cannot go into the details of the formal semantics of CHR here. The interested reader is referred to Frühwirth (1998). Since I will refer back to it, let us just note that logically, simplification rules are equivalences and propagation rules are implications if their guard is satisfied. Simpagation rules are special cases of simplification rules. Soundness and completeness results for CHR are available (Abdennadher et al., 1996; Abdennadher, 1998).</Paragraph> </Section> </Section> </Paper>
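<!--
The core idea of Sections 1.1 and 1.2, parsing as propagation over a constraint store, can be illustrated with a minimal sketch. It is written in Python rather than CHR, it uses a simplified CYK-style rule for binary productions instead of the three algorithms of Tab. 1, and the grammar, the lexicon and all names (BINARY_RULES, LEXICON, propagate) are assumptions made for illustration only; in particular, this is not the grammar of Fig. 1.

```python
from itertools import product

# Hypothetical toy grammar with binary productions only:
# the pair (B, C) maps to A for a production A -> B C.
BINARY_RULES = {
    ("NP", "VP"): "S",
    ("V", "NP"): "VP",
    ("VP", "PP"): "VP",
    ("NP", "PP"): "NP",
    ("Det", "N"): "NP",
    ("P", "NP"): "PP",
}

# Hypothetical lexicon: exactly one category per word.
LEXICON = {
    "John": "NP", "hit": "V", "the": "Det",
    "dog": "N", "with": "P", "stick": "N",
}


def propagate(words):
    """Return the constraint store (chart) as a set of items (i, label, j)."""
    # Axioms: one labeled link per input word.
    store = {(i, LEXICON[w], i + 1) for i, w in enumerate(words)}
    changed = True
    while changed:  # repeat until no new item can be derived
        changed = False
        # Iterate over a snapshot so the store may grow during the pass.
        for (i, b, k), (k2, c, j) in product(list(store), repeat=2):
            parent = BINARY_RULES.get((b, c))
            if k == k2 and parent is not None and (i, parent, j) not in store:
                store.add((i, parent, j))
                changed = True
    return store


words = "John hit the dog with the stick".split()
chart = propagate(words)
# The parse succeeds if the goal item spanning the whole input is in the store.
print((0, "S", len(words)) in chart)
```

The loop plays the role of a single two-headed propagation rule: whenever items (i, B, k) and (k, C, j) are in the store and A -> B C is a production, the item (i, A, j) is added. The store itself serves as the chart, so no separate agenda is kept; a real CHR implementation would leave the rule matching to the constraint system.
-->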