<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-0509">
  <Title>Decision Procedures for Dependency Parsing Using Graded Constraints</Title>
  <Section position="3" start_page="0" end_page="79" type="metho">
    <SectionTitle>
2 Eliminative Parsing
</SectionTitle>
    <Paragraph position="0"> The idea of eliminative parsing is not a novel one, and virtually every tagger can be considered a candidate elimination procedure which removes items from the maximum set of tags according to different decision criteria. Interestingly, dependency-based parsing can be viewed as a generalized tagging procedure. One of the first parsing systems built on this property is the Constraint Grammar approach (Karlsson et al., 1995). Underspecified dependency structures are represented as syntactic tags1 and disambiguated by a set of constraints that exclude inappropriate readings. Maruyama (1990) first tried to extend the idea to the treatment of complete dependency structures. To this end, the notion of a &quot;tag&quot; is generalized to pairs consisting of a label and the identifier of the dominating node, i. e., the tagset becomes sensitive to the individual tokens of the utterance under consideration, sacrificing its status of being fixed a priori. As in the case of atomic tags, constraints are specified which delete inappropriate dependency relations from the initial space of possibilities. The approach is not restricted to linear input strings but can also treat lattices of input tokens, which makes it possible to accommodate lexical ambiguity as well as recognition uncertainty in speech understanding applications (Harper et al., 1994).</Paragraph>
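To make the generalized tagging view concrete, the following sketch (ours, not from the paper; the label inventory and the violation test are illustrative placeholders) sets up the initial space of (label, head) candidates per token and lets crisp constraints eliminate entries from it:

LABELS = {"subj", "obj", "det"}   # hypothetical label inventory

def initial_candidates(tokens):
    """For token j, every pair (label, k) with k != j; k = 0 marks the root."""
    n = len(tokens)
    return {j: {(label, k)
                for label in LABELS
                for k in range(n + 1)    # 0 = root, 1..n = head positions
                if k != j}
            for j in range(1, n + 1)}

def eliminate(candidates, violates):
    """Apply a crisp constraint: drop every candidate it rules out."""
    for j in candidates:
        candidates[j] = {c for c in candidates[j] if not violates(j, c)}
    return candidates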
    <Paragraph position="1"> Obviously, it is again the relational nature of dependency models which provides for the applicability of candidate elimination procedures.</Paragraph>
    <Paragraph position="2"> Since the initial state of the analysis is given by an - admittedly large - set of possible dependency relations per token, the problem space remains finite for finite utterances. An analogous approach for constituency-based grammar models would encounter considerable difficulties, because the number and the kind of non-terminal nodes which need to be included in the tagset remains completely unclear prior to the parsing itself.</Paragraph>
    <Paragraph position="3"> Eliminative approaches to parsing come with a number of interesting properties which make them particularly attractive as computational models for language comprehension.</Paragraph>
    <Paragraph position="4"> 1. As long as constraint checking is restricted to strictly local configurations of dependency relations, the decision procedure inherits this locality property and thus exhibits a considerable potential for concurrent implementation (Helzerman and Harper, 1992). 1In this framework tags denote, for instance, the subject of the sentence, a determiner modifying a noun to the right, a preposition modifying a noun to the left, etc. However, only the category of the dominating node is specified, not its exact identity.</Paragraph>
    <Paragraph position="6"> 2. Since partial structural descriptions are available concurrently they can be compared in a competitive manner. Note however that such a comparison imposes additional synchronization and communication requirements on parallel realizations.</Paragraph>
    <Paragraph position="7"> 3. As the eliminative approach considers parsing a procedure of disambiguation, the quality of the results to be expected becomes directly related to the amount of effort one is prepared to spend. This is in clear contrast to constructive methods which, upon request, usually attempt to generate alternative interpretations, thus leading to a corresponding decrease of clarity about the structural properties of the input utterance (in terms of Karlsson et al.</Paragraph>
    <Paragraph position="8"> (1995)).</Paragraph>
    <Paragraph position="9"> 4. The progress of disambiguation can easily be assessed by constantly monitoring the size of value sets. Moreover, under certain conditions the amount of remaining effort for obtaining a completely disambiguated solution can be estimated. This appears to be an important characteristic for the development of anytime procedures, which are able to adapt their behavior with respect to external resource limitations (Menzel, 1994; Menzel, 1998).</Paragraph>
  </Section>
  <Section position="4" start_page="79" end_page="80" type="metho">
    <SectionTitle>
3 Graded Constraints
</SectionTitle>
    <Paragraph position="0"> Both the comparison of competitive structural hypotheses and the adaptation to resource limitations require generalizing the approach to allow constraints of different strength. While traditional constraints only make binary decisions about the well-formedness of a configuration, the strength of a constraint additionally reflects a human judgment of how critical a violation of that particular constraint is considered.</Paragraph>
    <Paragraph position="1"> Such a grading, expressed as a penalty factor, makes it possible to model a number of observations which are quite common in linguistic structures: * Many phenomena can more easily be described as preferences rather than strict regularities. Among them are structural conditions on attachment positions or linear ordering as well as selectional restrictions.</Paragraph>
    <Paragraph position="3"> * Preferences usually reflect different frequencies of use and in certain cases can be extracted from large collections of sample data.</Paragraph>
    <Paragraph position="4"> * Some linguistic cues are inherently uncertain (e. g., prosodic markers) and therefore resist a description by means of crisp rule sets.</Paragraph>
    <Paragraph position="5"> By introducing graded constraints the parsing problem becomes an optimization problem, aiming at a solution which violates as few and as weak constraints as possible. This, on the one hand, leads to a higher degree of structural disambiguation, since different solution candidates may now receive different scores due to preference constraints. Usually, a complete disambiguation is achieved provided that enough preferential knowledge is encoded by means of constraints. Remaining ambiguity which cannot be constrained further is one of the major difficulties for systems using crisp constraints (Harper et al., 1995). On the other hand, weighted constraints make it possible to handle contradictory evidence, which is typical for cases of ill-formed input. Additionally, the gradings are expected to provide a basis for the realization of time-adaptive behavior.</Paragraph>
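As a minimal illustration of the resulting optimization view (our sketch; the constraint names and penalty factors are invented for the example), the penalties of violated constraints multiply, so a candidate violating one weak preference outranks a candidate violating one strong constraint:

CONSTRAINTS = {
    "SubjNumber": 0.1,   # strong constraint: subject-verb agreement
    "SubjOrder":  0.9,   # weak preference: subject before the verb
}

def score(violated):
    """Multiply the penalties of all violated constraints (1.0 = perfect)."""
    s = 1.0
    for name in violated:
        s *= CONSTRAINTS[name]
    return s

# Violating only the weak ordering preference (0.9) is better than
# violating the strong agreement constraint (0.1).
assert score({"SubjOrder"}) > score({"SubjNumber"})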
    <Paragraph position="6"> One of the most important advantages which can be attributed to the use of graded constraints is their ability to provide the mapping between different levels in a multi-level representation, where many instances of preferential relationships can be found. This separation of structural representations facilitates a clear modularization of the constraint grammar although constraints are applied to a single computational space. In particular, the propagation of gradings between representational levels supports a mutual compensation of information deficits (e. g., a syntactic disambiguation can be achieved by means of semantic support) and even cross-level conflicts can be arbitrated (e. g., a syntactic preference might be inconsistent with a selectional restriction).</Paragraph>
    <Paragraph position="7"> Combining candidate elimination techniques, graded constraints, and multi-level disambiguation within a single computational paradigm aims first of all at an increased level of robustness of the resulting parsing procedure (Menzel and Schröder, 1998). Robustness is enhanced by three different contributions: 1. The use of graded constraints makes constraint violations acceptable. In a certain sense, the resulting behavior can be considered a kind of constraint retraction which is guided by the individual gradings of the violated constraints. Therefore, a &quot;blind&quot; weakening of the constraint system is avoided and hints for a controlled application are preserved.</Paragraph>
    <Paragraph position="8"> 2. The propagation of evidence among multiple representational levels exploits the redundancy of the grammar model about different aspects of language use in order to compensate for the loss of constraining information due to constraint retraction. Naturally, the use of additional representational levels also means an expansion of the search space, but this undesired effect can be dealt with: once a single point of relative certainty has been found on an arbitrary level, the system can use it as an anchor point from which constraining information is propagated to the other levels. For instance, if selectional restrictions provide enough evidence for a particular solution, an ambiguous case can be resolved. Even contradictory indications can be treated in that manner. In such a case conflict resolution is obtained according to the particular strength of evidence resulting from the observed constraint violations.</Paragraph>
    <Paragraph position="9"> 3. The seamless integration of partial parsing is achieved by allowing arbitrary categories (not just finite verbs) to serve as the top node of a dependency tree. Of course, these configurations need to be penalized appropriately in order to restrict their selection to those cases where no alternative interpretations remain. Note that under this approach partial parsing is not introduced by means of an additional mechanism but falls out as a general result of the underlying parsing procedure.</Paragraph>
    <Paragraph position="10"> Certainly, all the desired advantages mentioned above become noticeable only if a constraint modeling of grammatical relations can be provided which obeys the rather restrictive locality conditions, and if efficient implementations of the disambiguation procedure become available.</Paragraph>
  </Section>
  <Section position="5" start_page="80" end_page="80" type="metho">
    <SectionTitle>
4 Parsing As Constraint Satisfaction
</SectionTitle>
    <Paragraph position="0"> Parsing of natural language sentences can be considered a constraint satisfaction problem if one manages to specify exactly what the constraint variables should be, how constraints can be used to find appropriate value assignments, and how these value assignments represent specific structural solutions to the parsing problem. These are the questions we address in this section. The original definition of constraint dependency grammars by Maruyama (1990) is extended to graded constraint dependency grammars, which are represented by a tuple $(\Sigma, L, C, \phi)$. The lexicon $\Sigma$ is a set of word forms, each of which has some lexical information associated with it. The set of representational levels $L = \{(l_1, L_1), \ldots, (l_n, L_n)\}$ consists of pairs $(l_i, L_i)$ where $l_i$ is the name of the $i$-th representational level and $l^i_j \in L_i$ is the $j$-th appropriate label for level $l_i$. Think of $(Syn, \{subj, obj, det\})$ as a simple example of a representational level.</Paragraph>
    <Paragraph position="1"> The constraints from the set $C$ can be divided into disjoint subsets $C^i$ with $C = \bigcup_i C^i$, depending on the constraints' arity $i$, which denotes the number of constraint variables related by the constraint. Mainly for computational reasons, but also in order to keep the scope of constraints strictly local, at most binary constraints, i. e., constraints with arity not larger than two, are considered: $C = C^1 \cup C^2$. 2 The assessment function $\phi : C \rightarrow [0, 1]$ maps a constraint $c \in C$ to a weight $\phi(c)$ which indicates how serious a violation of that constraint is considered. Crisp constraints which may not be violated at all, i. e., which correspond to traditional constraints, have a penalty factor of zero ($\phi(c) = 0$), while others have higher grades (i. e., $0 < \phi(c) < 1$) and thus may be violated by a solution. 3 2The restriction to at most binary constraints does not decrease the theoretical expressiveness of the formalism, but it has some practical consequences for the grammar writer, who occasionally has to adopt rather artificial constructs for the description of some linguistic phenomena (Menzel and Schröder, 1998).</Paragraph>
    <Paragraph position="2"> 3Constraints $c$ with $\phi(c) = 1.0$ are totally ineffective, as will become clear in the next paragraphs.</Paragraph>
    <Paragraph position="3"> Given a natural language sentence $W = (w_1, \ldots, w_n)$ and a graded constraint dependency grammar, the parsing problem can be stated as follows: For each representational level $l_i$ and each word $w_j$ of the sentence a constraint variable $v^i_j$ is established. Let the set of all constraint variables be $V$. The domain $dom(v^i_j) = L_i \times \{0, 1, \ldots, j-1, j+1, \ldots, n\}$ of variable $v^i_j$, i. e., the set of possible values for that variable, consists of all pairs $(l, k)$ where $l$ is an appropriate label for level $l_i$ (i. e., $l \in L_i$) and $k$ is the index of the dominating word $w_k$ (i. e., word $w_j$ is subordinated to word $w_k$), or zero if the word $w_j$ is the root of the dependency structure on level $l_i$.</Paragraph>
    <Paragraph position="4"> A problem candidate $p$ of the parsing problem is a unique value assignment to each of the constraint variables. In other words, for each variable $v \in V$ a single value $p(v) = d_v \in dom(v)$ has to be chosen.</Paragraph>
    <Paragraph position="5"> The solution is the problem candidate that violates fewer and/or less important constraints than any other problem candidate. In order to make this intuitive notion more formal, the function $\phi$ is extended to assess not only constraints but also problem candidates $p$.</Paragraph>
    <Paragraph position="6"> $$\phi(p) = \prod_{a=1}^{2} \prod_{c \in C^a} \prod_{\bar{v} \in V^a} \phi(c, p(\bar{v}))$$ where $a$, $1 \le a \le 2$, is the arity and $\bar{v}$ is a tuple of variables. A single constraint $c$ can be violated once, more than once, or not at all by a problem candidate, since constraints judge local configurations, not complete problem candidates.</Paragraph>
    <Paragraph position="7"> $$\phi(c, \bar{d}) = \begin{cases} \phi(c) &amp; \text{if } \bar{d} \text{ violates } c \\ 1.0 &amp; \text{else} \end{cases}$$ where $\bar{d}$ is a (unary or binary) tuple of values. Note that satisfying a constraint does not change the grade of the problem candidate because of the multiplicative nature of the assessment function.</Paragraph>
    <Paragraph position="8"> The final solution $p_s$ is found by maximum selection: $p_s = \operatorname{argmax}_p \phi(p)$.</Paragraph>
    <Paragraph position="10"> Thus the system uniquely determines the dominating node for each of the input word forms. Additional conditions for well-formed structural representations, like projectivity or the absence of cyclic dependency relations, require extra treatment.</Paragraph>
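The following brute-force sketch (ours; all interfaces are assumptions) renders the definitions of this section directly in code: one variable per (level, word), domains of (label, head) pairs, a multiplicative assessment function, and maximum selection. The exhaustive enumeration is exponential and only meant to make the formalism concrete:

from itertools import product

def make_domains(levels, n):
    """dom(v_j^i) = L_i x ({0, ..., n} minus {j}); 0 marks the root."""
    return {(lvl, j): [(lab, k) for lab in labels
                       for k in range(n + 1) if k != j]
            for lvl, labels in levels.items()
            for j in range(1, n + 1)}

def assess(candidate, unary, binary):
    """phi(p): multiply the weights of all violated constraint instances."""
    s = 1.0
    for v, d in candidate.items():
        for weight, violates in unary:
            if violates(v, d):
                s *= weight
    for (v1, d1), (v2, d2) in product(candidate.items(), repeat=2):
        if v1 != v2:
            for weight, violates in binary:
                if violates(v1, d1, v2, d2):
                    s *= weight
    return s

def best_parse(levels, n, unary, binary):
    """Maximum selection: p_s = argmax_p phi(p)."""
    dom = make_domains(levels, n)
    variables = list(dom)
    best, best_score = None, -1.0
    for values in product(*(dom[v] for v in variables)):
        cand = dict(zip(variables, values))
        s = assess(cand, unary, binary)
        if s > best_score:
            best, best_score = cand, s
    return best, best_score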
    <Paragraph position="11"> In our current implementation the acyclicity property is ensured by a special built-in control structure, while projectivity has to be established by means of specifically designed constraints. This enables the grammar writer to carefully model the conditions under which non-projective dependency structures may occur. Note, however, that there are cases of non-projective structures that cannot be eliminated by using only local (i. e., at most binary) constraints. Another problem arises from the fact that constraints are universally quantified, so that existence conditions (like &quot;there must be a subject for each finite verb&quot;) cannot be expressed directly. This difficulty, however, is easily overcome by the introduction of &quot;reverse&quot; dependencies on additional auxiliary levels, which are used to model the valency requirements of a dominating node. Since each valency to be saturated requires an auxiliary level, the overall number of levels in a multi-level representation may easily grow to ten or more.</Paragraph>
    <Paragraph position="12"> Moreover, the formal description given so far is only valid for linear strings, not for word graphs. An extension to the treatment of word graphs requires a modification of the notion of a problem candidate. While in the case of linear input an assignment of exactly one value to each variable represents a possible structure, this does not hold for word graphs. Instead, only those variables that correspond to a word hypothesis on one particular path through the word graph must receive a unique value, while all other variables must be assigned no value at all. This additional path condition is usually not encoded as normal grammar constraints either, but must be guaranteed by the control mechanism.</Paragraph>
  </Section>
  <Section position="6" start_page="80" end_page="82" type="metho">
    <SectionTitle>
5 An Example
</SectionTitle>
    <Paragraph position="0"> To illustrate the formalization we now go through an example. To avoid unnecessary details we exclude the treatment of auxiliary levels from our discussion, thus restricting ourselves to the modeling of valency possibilities and abstracting from valency necessities. The problem is simplified further by selecting an extremely limited set of dependency labels. Consider again the example from Figure 1: (1) Die Knochen_pl sieht_sg die Katze_sg.</Paragraph>
    <Paragraph position="1"> The bones sees the cat.</Paragraph>
    <Paragraph position="2"> &amp;quot;The cat sees the bones.&amp;quot; Two representational levels, one for syntactic functions and one for semantic case-fillers, are introduced:</Paragraph>
    <Paragraph position="4"> $L = \{ (Syn, \{subj, obj, det\}), (Sem, \{agent, theme, def\}) \}$ Figure 2 contains some of the constraints necessary to parse the example sentence. Basically, a constraint consists of a logical formula which is parameterized by variables (in our example X and Y) which can be bound to an edge in the dependency tree. It is associated with a name (e. g., SubjNumber) and a class (e. g., Subj) for identification and modularization purposes, respectively. The constraint score is given just before the actual formula. Selector functions are provided which facilitate access to the label of an edge (e. g., X.label) and to lexical properties of the dominating node (e. g., X↑num) and the dominated one (e. g., X↓num). Being universally quantified, a typical constraint takes the form of an implication with the premise describing the conditions for its application. Accordingly, the constraint SubjNumber of Figure 2 reads as follows: For each subject (X.label=subj) it holds that the dominated and the dominating nodes agree with each other with regard to number (X↓num=X↑num).</Paragraph>
    <Paragraph position="5"> Figure 1 from the introduction graphically presents the desired solution structure, which is repeated as a constraint variable assignment in Figure 3. All (shown) constraints are satisfied by the variable assignment except SubjOrder, which is violated once, viz. by the assignment $v^{Syn}_5 = (subj, 3)$. [Figure 2: Some constraints for the disambiguation of the example sentence.]</Paragraph>
    <Paragraph position="6"> Therefore, the structure has a score equal to the constraint's score, namely 0.9. Furthermore, there is no structure which has a better assessment. The next example is similar to the last one, except that the finite verb now appears in plural form.</Paragraph>
    <Paragraph position="7"> (2) Die Knochen_pl sehen_pl die Katze_sg.</Paragraph>
    <Paragraph position="8"> The bones see the cat.</Paragraph>
    <Paragraph position="9"> A solution structure analogous to the one discussed above would have a score of 0.09, because not only the constraint SubjOrder but also the constraint SubjNumber would have been violated. [Figure 3: The constraint variable assignment corresponding to the dependency trees in Figure 1.]</Paragraph>
    <Paragraph position="11"> But the alternative structure where the subj/agent and the obj/theme edges are interchanged (meaning that the bones do the seeing) has a better score of 0.8 this time, because it only violates the constraint SemType. This result closely resembles the performance of human beings, who first of all note the semantic oddness of example (2) before trying to repair the syntactic deviations when reading this sentence in isolation. Thus, the approach successfully arbitrates between conflicting information from different levels, using the constraint scores to determine which of the problem candidates is chosen as the final solution.</Paragraph>
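The score arithmetic of the two examples can be checked directly; note that the value 0.1 for SubjNumber is our inference from 0.9 · 0.1 = 0.09, while 0.9 (SubjOrder) and 0.8 (SemType) are quoted in the text:

subj_order, subj_number, sem_type = 0.9, 0.1, 0.8

# Example (1): the desired reading only violates SubjOrder -> 0.9.
score_1 = subj_order
# Example (2), reading analogous to (1): violates SubjOrder and
# SubjNumber -> 0.09.
score_2a = subj_order * subj_number
# Example (2), edges interchanged (the bones do the seeing): violates
# only SemType -> 0.8, and therefore wins.
score_2b = sem_type

assert abs(score_2a - 0.09) < 1e-12 and score_2b > score_2a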
  </Section>
  <Section position="7" start_page="82" end_page="85" type="metho">
    <SectionTitle>
6 Constraint Satisfaction Procedures
</SectionTitle>
    <Paragraph position="0"> A lot of research has been carried out in the field of algorithms for constraint satisfaction problems (Meseguer, 1989; Kumar, 1992) and constraint optimization problems (Tsang, 1993).</Paragraph>
    <Paragraph position="1"> Although CSPs are NP-complete problems in general and, therefore, one cannot expect better than exponential complexity in the worst case, many methods have been developed that allow for a reasonable complexity in most practical cases. Some heuristic methods, for instance, try to arrive at a solution more efficiently at the expense of giving up the property of correctness, i. e., they find the globally best solution in most cases but are not guaranteed to do so in all cases.</Paragraph>
    <Paragraph position="2"> This makes it possible to influence the temporal characteristics of the parsing procedure, a possibility which seems especially important in interactive applications: If the system has to deliver a reasonable solution within a specific time interval, a dynamic scheduling of computational resources depending on the remaining ambiguity and the available time is necessary. While different kinds of search are more suitable with regard to the correctness property, local pruning strategies lend themselves to resource-adaptive procedures.</Paragraph>
    <Section position="1" start_page="82" end_page="83" type="sub_section">
      <SectionTitle>
6.1 Consistency-Based Methods
</SectionTitle>
      <Paragraph position="0"> As long as only crisp constraints are considered, procedures based on local consistency, particularly arc consistency, can be used (Maruyama, 1990; Harper et al., 1995). These methods try to delete values from the domains of constraint variables by considering only local information and have a polynomial worst-case complexity.</Paragraph>
      <Paragraph position="1">  Unfortunately, they possibly stop deleting values before a unique solution has been found. In such a case, even if arc consistency has been established one cannot be sure whether the problem has zero, one, or more than one solution because alternative value assignments may be locally consistent, but globally mutually incompatible. Consequently, in order to find actual solutions an additional search has to be carried out for which, however, the search space is considerably reduced already.</Paragraph>
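A sketch of this style of filtering (ours; the compatibility test is left abstract) makes the behavior explicit: a value is removed as soon as some other variable offers it no compatible value, and the procedure may terminate with several values left per domain:

def arc_consistency(domains, compatible):
    """domains: {var: set(values)}; compatible(v, d, w, e) is a crisp test."""
    changed = True
    while changed:
        changed = False
        for v in domains:
            for d in list(domains[v]):
                for w in domains:
                    if w == v:
                        continue
                    if not any(compatible(v, d, w, e) for e in domains[w]):
                        domains[v].discard(d)   # d lacks support from w
                        changed = True
                        break
    return domains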
    </Section>
    <Section position="2" start_page="83" end_page="83" type="sub_section">
      <SectionTitle>
6.2 Search
</SectionTitle>
      <Paragraph position="0"> The most straightforward method for constraint parsing is a simple search procedure where the constraint variables are successively bound to values and these value assignments are tested for consistency. In case of an inconsistency alternative values are tried until a solution is found or the set of possible values is exhausted. The basic search algorithm is Branch &amp; Bound, which exploits the fact that the score of every subset of variable assignments is already an upper bound of the final score. Additional constraint violations only make the score worse, because the scores of constraints do not exceed a value of one. Therefore, large parts of the search space can be abandoned as soon as the score becomes too low. To further improve the efficiency, an agenda is used to sort the search space nodes so that the most promising candidates are tried first. By not allowing the agenda to grow larger than a specified size, one can exclude search states with low scores from further consideration. Note that correctness cannot be guaranteed anymore in that case. Figure 4 presents the algorithm in pseudo-code notation.</Paragraph>
      <Paragraph position="1"> Unfortunately, the time requirements of the search algorithms are almost unpredictable since an intermediate state of computation does not give a reliable estimation of the effort that remains to be done.</Paragraph>
    </Section>
    <Section position="3" start_page="83" end_page="85" type="sub_section">
      <SectionTitle>
6.3 Pruning
</SectionTitle>
      <Paragraph position="0"> As explained in Section 6.1, consistency-based procedures use local information to delete values from the domains of variables. While these methods only do so if the local information suffices to guarantee that the value under consideration can safely be deleted, pruning goes one step further. Values are successively selected for deletion based on a heuristic (i. e., possibly incorrect) assessment until a single solution remains (cf. Figure 5). The selection function considers only local information (as do the consistency-based methods) for efficiency reasons. Taking into account global optimality criteria would not help at all, since then the selection would be as difficult as the whole problem, i. e., one would have to expect an exponential worst-case complexity.</Paragraph>
      <Paragraph position="2"> procedure ConstraintSearch
  put the empty assignment with score 1.0 on the agenda
  while agenda is not empty do                        ; process agenda
    get best item (B, V, s) from agenda               ; best first
    if V is empty then                                ; complete assignment?
      compare s with the best score b so far          ; better / equally good?
    else
      choose a free variable v from V                 ; try next free variable
      for each value d in dom(v) do                   ; try all values
        compute new score s' for B' = B + {v = d}
        if s' &gt;= b then add (B', V \ {v}, s') to agenda fi  ; already worse?
      done
    fi
    truncate agenda (if desired)
  done
Figure 4: Best-first branch &amp; bound algorithm with limited agenda length (beam search)</Paragraph>
      <Paragraph position="4"> procedure pruning(V)
  while there is a v in V with |dom(v)| &gt; 1 do
    select (v, d) to be deleted
    delete d from dom(v)
  done
Figure 5: The pruning procedure successively selects and deletes values from the domains of variables.</Paragraph>
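A runnable rendering of the algorithm of Figure 4 might look as follows (our reconstruction in Python; the state representation and the scoring interface are assumptions based on Section 4):

import heapq
from itertools import count

def constraint_search(domains, extend_score, beam=None):
    """domains: {var: [values]}. extend_score(assignment, var, value)
    returns the score of the extended assignment; with multiplicative
    grades it can only stay equal or shrink."""
    variables = list(domains)
    tie = count()                                # tiebreaker for the heap
    best, bound = None, -1.0
    agenda = [(-1.0, next(tie), {})]             # max-heap via negated scores
    while agenda:                                # process agenda
        neg_s, _, assignment = heapq.heappop(agenda)   # best first
        s = -neg_s
        if s < bound:                            # already worse than best?
            continue
        free = [v for v in variables if v not in assignment]
        if not free:                             # complete assignment?
            if s > bound:                        # better (or first) solution
                best, bound = assignment, s
            continue
        v = free[0]                              # try next free variable
        for d in domains[v]:                     # try all values
            s2 = extend_score(assignment, v, d)
            if s2 >= bound:                      # prune hopeless branches
                heapq.heappush(agenda, (-s2, next(tie), {**assignment, v: d}))
        if beam is not None and len(agenda) > beam:
            agenda = heapq.nsmallest(beam, agenda)   # keep the best items
            heapq.heapify(agenda)
    return best, bound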
      <Paragraph position="5"> Obviously, the selection heuristic plays the major role in pruning.</Paragraph>
      <Paragraph position="6"> Simple selection functions only consider the minimum support a value gets from another variable (Menzel, 1994). They combine the mutual compatibilities of the value under consideration and all possible values for another variable. Then the minimum support for each value is determined, and finally the value with the least support is selected for deletion. In other words, the value for which at least one variable's values have only low or no support is ruled out as a possible solution.</Paragraph>
      <Paragraph position="7"> Formally, the following formulas (using the notation of Section 4) determine the value $d$ of variable $v$ to be deleted next: $$(v, d) = \operatorname*{argmin}_{v \in V,\, d \in dom(v)} \; \min_{v' \in V,\, v' \neq v} \; \max_{d' \in dom(v')} score(d, d')$$</Paragraph>
      <Paragraph position="9"> where $score(d, d')$ is the accumulated assessment of the pair of subordinations $(d, d')$: $$score(d, d') = \prod_{c \in C} \phi(c, (d, d'))$$</Paragraph>
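In code, the selection heuristic reads roughly as follows (our sketch, built on the reconstructed formulas above; score is assumed to be the accumulated assessment of a pair of subordinations):

def select_for_deletion(domains, score):
    """Return (support, variable, value) for the least supported value."""
    worst = None
    for v, dom_v in domains.items():
        if len(dom_v) < 2:               # never empty a domain completely
            continue
        for d in dom_v:
            support = min(
                (max(score(d, d2) for d2 in dom_w)
                 for w, dom_w in domains.items() if w != v and dom_w),
                default=1.0,
            )
            if worst is None or support < worst[0]:
                worst = (support, v, d)
    return worst

def prune(domains, score):
    """Delete values until every domain is a singleton (cf. Figure 5)."""
    while any(len(dom) > 1 for dom in domains.values()):
        _, v, d = select_for_deletion(domains, score)
        domains[v].remove(d)
    return domains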
      <Paragraph position="11"> While this heuristic works quite well for linear strings, it fails if one switches to word graphs.</Paragraph>
      <Paragraph position="12"> Figure 6 gives an example of a very simple word graph which cannot be handled correctly by the simple heuristic.</Paragraph>
      <Paragraph position="13"> [Figure 6: A very simple word graph: between the start node and the stop node, the word hypotheses laughs and laugh are temporal alternatives following &quot;The children&quot;.] Alternative word hypotheses whose time spans are not disjunct do not support each other by definition. Therefore, the subordination of children under laugh in Figure 6 is equally disfavored by laughs, as is the subordination of children under laughs by laugh. Unfortunately, this lack of support is not based on negative evidence but on the simple fact that laugh and laughs are temporal alternatives and thus may not both be present in a solution simultaneously. Since the simple heuristic does not know anything about this distinction, it may arbitrarily select the wrong value for deletion.</Paragraph>
      <Paragraph position="14"> A naive extension of the above heuristic would be to base the assessment not on the minimal support from all variables but on the combined support from those variables that share at least one path through the word graph with the variable under consideration. But the path criterion is computationally expensive to compute and, therefore, needs to be approximated during pruning. Instead of considering all possible paths through the graph, we compute the maximum support at each time point $t$ and on each level $l$ and select the minimum of these values to be removed from the space of hypotheses: $$(v, d) = \operatorname*{argmin}_{v \in V,\, d \in dom(v)} \; \min_{t,\, l} \; \max_{v' :\, t \in time(v'),\, level(v') = l} \; \max_{d' \in dom(v')} score(d, d')$$</Paragraph>
      <Paragraph position="16"> where time(v) denotes the time interval of the word hypothesis (cf. Figure 6 or 7) that corresponds to the variable v and level(v) denotes the representational level of variable v.</Paragraph>
      <Paragraph position="17"> For temporally overlapping nodes the procedure selects a single one to act as a representative of all the nodes within that particular time slice. Therefore, information about the exact identity of the node which caused the lack of support is lost. But since the node which gives the maximum support is used as a time-slice representative, it seems likely that any other choice would be even worse.</Paragraph>
      <Paragraph position="18"> Although preliminary experiments produced promising results (around 3% errors), it can be expected that the quality of the results depends on the kind of grammar used and the utterances analyzed. Since the problem deserves further investigation, it is too early to give final results. The example in Figure 7 shows a simple case that demonstrates the shortcomings of the refined heuristic. Although these and children are not allowed to occur in a solution simultaneously, exactly these two words erroneously remain undeleted and finally make up the subject in the analysis. First, all values for the article a are deleted because of missing number agreement with the possible dominating nodes, and thereafter the values for the word houses are discarded since the semantic type does not match the selectional restrictions of the verb very well.</Paragraph>
      <Paragraph position="19"> The heuristic is not aware of the distinction between time points and word graph nodes and therefore counts the determiner these as supporting the noun children.</Paragraph>
      <Paragraph position="20"> [Figure 7: A word graph which may be analyzed &quot;incorrectly&quot; by the time-slice pruning heuristic.]</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="85" end_page="86" type="metho">
    <SectionTitle>
7 Efficiency Issues
</SectionTitle>
    <Paragraph position="0"> Although pruning strategies bear a great potential for efficient and time-adaptive parsing schemes, the absolute computational expenses for a &amp;quot;blind&amp;quot; application of constraints are still unacceptably high. Additional techniques have to be employed to decrease actual computation times. One of the starting points for such improvements is the extremely large number of constraint evaluations during parsing: A few million constraint checks are quite common for realistic grammars and sentences of even modest size.</Paragraph>
    <Paragraph position="1"> Two approaches seem to be suitable for the reduction of the number of constraint evaluations: * Reduced application of constraints: A detailed analysis of how constraints are applied and under what circumstances they fail shows that most constraint checks are &quot;useless&quot; since the tested constraint is satisfied for some trivial reason. For instance, because most constraints are very specific about which levels they constrain and about whether and how the dependency edges are connected, this information can be exploited to reduce the number of constraint checks. By applying constraints only to the relevant levels, the number of constraint evaluations has been cut down to (at most) 40%. Taking into account the topological structure of the edges under consideration improves the efficiency by another 30% to 50%.</Paragraph>
    <Paragraph position="2"> * Reduction of the number of constraint variables: A typical grammar contains a relatively large number of representational levels, and for most word forms there are several entries in the lexicon. Since the lexical ambiguity of a word form is usually relevant only to one or very few levels, constraint variables need not be established for all lexical entries and all levels. For instance, the German definite determiner die has eight different morpho-syntactic feature combinations if one only considers variations of gender, case, and number. All these forms behave quite similarly with respect to non-syntactic levels. Consequently, it makes no difference if one merges the constraint variables for the non-syntactic levels, except that fewer constraint checks must be carried out. By considering the relevance of particular types of lexical ambiguity for constraint variables of different levels, one achieves an efficient treatment of disjunctive feature sets in the lexicon (Foth, 1998); a schematic sketch of this merging idea is given below. This technique reduced the time requirements by 75% to 90%, depending on the details of the grammatical modeling. In particular, a clean modularization, both in the constraint set and the dictionary entries, results in considerable gains in efficiency. In order to support the grammar writer, a graphical grammar environment has been developed (cf. Figure 8). It includes an editor for dependency trees (cf. Figure 9) which makes it easy to detect undesired constraint violations.</Paragraph>
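The merging idea announced above can be sketched as follows (ours; the feature representation and the concrete readings of die are simplified assumptions):

from itertools import groupby

def merged_variables(readings, relevant_features):
    """Group the readings of one word form by the features a level
    actually distinguishes; each group needs only one variable."""
    key = lambda r: tuple(sorted((f, r[f]) for f in relevant_features if f in r))
    readings = sorted(readings, key=key)
    return [list(g) for _, g in groupby(readings, key=key)]

# The eight gender/case/number readings of German "die" (simplified)
# collapse to a single variable on a level that ignores morpho-syntax.
die = [{"cat": "det", "case": c, "gender": g, "num": n}
       for c in ("nom", "acc")
       for g in ("fem", "neut")
       for n in ("sg", "pl")]
assert len(merged_variables(die, ["cat"])) == 1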
  </Section>
</Paper>