<?xml version="1.0" standalone="yes"?>
<Paper uid="C90-3030">
  <Title>logy. A General Computational Model for Word-Form Recognition and Production. Department of General</Title>
  <Section position="3" start_page="168" end_page="169" type="metho">
    <SectionTitle>
3. Local disambiguation
</SectionTitle>
    <Paragraph position="0"> Morphological ambiguities may be due to intersection of forms of the same or of different lexical entries, or to intersection of recursive compound paths. The latter phenomenon arises if productive compound formation in e.g. Finnish, German, and Swedish is [...]; local disambiguation discards more than 100,000. The drop is especially dramatic for highly ambiguous words.</Paragraph>
  </Section>
  <Section position="4" start_page="169" end_page="169" type="metho">
    <SectionTitle>
4. Morphosyntactic mapping
</SectionTitle>
    <Paragraph position="0"> After local disambiguation, each word in the sentence undergoes morphosyntactic mapping, i.e. it is assigned at least one syntactic label, perhaps several if a unique label cannot be assigned. This mapping will be discussed in connection with the syntactic constraints in section 7.</Paragraph>
  </Section>
  <Section position="5" start_page="169" end_page="169" type="metho">
    <SectionTitle>
5. Context-dependent disambiguation constraints
</SectionTitle>
    <Paragraph position="0"> The CG formalism will first be illustrated by context-dependent disambiguation constraints. Sets of grammatical features are needed in the constraints for the purpose of generalization. Each set declaration consists of a set name followed by the elements of that set. The elements are (strings of) features and/or base-forms occurring in readings. Each constraint is a quadruple consisting of domain, operator, target, and context condition(s). An example: (@w =0 &amp;quot;PREP&amp;quot; (-1 DET)). The operators are here defined in the procedural mode as performing operations; conceptually they just express constraints.</Paragraph>
    <Paragraph position="1"> The context conditions are defined relative to the target reading in position 0. Position 1 is one word to the right of 0, position -3 is three words to the left of 0, etc. (Such straightforward positions we call absolute.) Each context condition is a triple consisting of polarity, position, and set.</Paragraph>
    <Paragraph position="2"> 'Polarity' is either NOT or nothing (i.e. positive), 'position' is a legal position number, and 'set' is a declared set name.</Paragraph>
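As a sketch, such condition triples can be evaluated against a toy representation in which a sentence is a list of cohorts, each cohort a list of readings, and each reading a set of feature strings. The representation, set declarations, and feature names below are illustrative assumptions, not the paper's implementation:

```python
# Toy CG context conditions: a sketch, not the CGP implementation.
# A sentence is a list of cohorts; a cohort is a list of readings;
# a reading is a set of feature strings (hypothetical features).

SETS = {
    "DET": [{"DET"}],
    "N": [{"N"}],
    "VFIN": [{"V", "PRES"}, {"V", "PAST"}],
}

def reading_matches(reading, set_name):
    # A reading satisfies a set if it contains every feature of some element.
    return any(elem <= reading for elem in SETS[set_name])

def condition_holds(sentence, target_pos, condition):
    # condition = (polarity, position, set); polarity is "NOT" or None.
    polarity, position, set_name = condition
    i = target_pos + position
    found = (0 <= i < len(sentence)) and any(
        reading_matches(r, set_name) for r in sentence[i])
    return not found if polarity == "NOT" else found

# "the" is unambiguous; the next word is N/V-ambiguous.
sentence = [[{"DET"}], [{"N"}, {"V", "PAST"}]]
print(condition_holds(sentence, 1, (None, -1, "DET")))   # True
print(condition_holds(sentence, 0, ("NOT", 1, "VFIN")))  # False: a VFIN reading exists
```

A position falling outside the sentence simply finds nothing, so a positive condition there fails and a NOT condition succeeds.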
    <Paragraph position="3"> An asterisk '*' (functionally and mnemotechnically reminiscent of the Kleene star) prefixed to position number n refers to some position rightwards of n (if n is positive), or some position leftwards of n (if n is negative), in both cases including n, up to the next sentence boundary (or clause boundary, if enforced in clause boundary mode, cf. below). The asterisk convention thus enables the description of unbounded dependencies.</Paragraph>
    <Paragraph position="4"> Examples: (1 N) requires there to be a reading with the feature &amp;quot;N&amp;quot; for the next word-form. (NOT *-1 VFIN) states: nowhere leftwards in this sentence is there a reading with any of the feature combinations defining finite verbs. The condition ensemble (1 PREMOD) (2 N) (3 VFIN) requires there to be a reading with either &amp;quot;A&amp;quot; or &amp;quot;DET&amp;quot; in position 1, with &amp;quot;N&amp;quot; in position 2, and with one of the VFIN readings in position 3. Here are two more context-dependent disambiguation constraints for English:</Paragraph>
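A condition such as (NOT *-1 VFIN) can be sketched as a scan from the given offset outwards to the sentence boundary. The toy representation (cohorts as lists of reading sets) and the feature names are illustrative assumptions:

```python
# Sketch of the Kleene-star position convention (*n): scan from offset n
# outwards to the sentence boundary. Illustrative toy code, not CGP.

SETS = {"VFIN": [{"V", "PRES"}, {"V", "PAST"}], "N": [{"N"}]}

def reading_matches(reading, set_name):
    return any(elem <= reading for elem in SETS[set_name])

def star_condition_holds(sentence, target_pos, condition):
    # condition = (polarity, n, set), with n interpreted as *n (unbounded).
    polarity, n, set_name = condition
    if n > 0:
        span = range(target_pos + n, len(sentence))      # rightwards, incl. n
    else:
        span = range(min(target_pos + n, len(sentence) - 1), -1, -1)
    found = any(reading_matches(r, set_name)
                for i in span if 0 <= i < len(sentence)
                for r in sentence[i])
    return not found if polarity == "NOT" else found

# (NOT *-1 VFIN): no finite-verb reading anywhere to the left.
sentence = [[{"N"}], [{"PRON"}], [{"V", "PRES"}]]
print(star_condition_holds(sentence, 2, ("NOT", -1, "VFIN")))  # True
print(star_condition_holds(sentence, 0, (None, 1, "VFIN")))    # True: VFIN at position 2
```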
    <Paragraph position="6"> The example constraint above, (@w =0 &amp;quot;PREP&amp;quot; (-1 DET)), states that if a word (@w) has a reading with the feature &amp;quot;PREP&amp;quot;, this very reading is discarded (=0) iff the preceding word (i.e. the word in position -1) has a reading with the feature &amp;quot;DET&amp;quot;.</Paragraph>
    <Paragraph position="7"> The domain points out some element to be disambiguated, e.g. (the readings of) a particular wordform. The designated domain @w is a variable over any word-form, used when the target reading is picked by feature(s) only.</Paragraph>
    <Paragraph position="8"> The target defines which reading the constraint is about. The target may refer to one particular reading, such as &amp;quot;V PRES -SG3&amp;quot;, or to all members of a declared set, such as VFIN.</Paragraph>
    <Paragraph position="9"> The operator defines which operation to perform on the reading(s). There are three disambiguation operators, here treated in order of decreasing strength. The operator '=!!' indicates that the target reading is the correct one iff all context conditions are satisfied; all other readings should be discarded. If the context conditions are not satisfied, the target reading itself is discarded. The operator '=!' indicates that the target reading is the correct one iff all context conditions are satisfied; all other readings are then discarded. The operator '=0' discards the target reading iff the context conditions are satisfied; it leaves all other readings intact. The first of the two constraints above discards all finite verb readings immediately after the base-form to (itself either a preposition or an infinitive marker). VFIN is a declared set. The constraint is applicable to all strings declared to belong to this set.</Paragraph>
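The three operators can be sketched as transformations of a single cohort, assuming the context conditions have already been evaluated to a boolean. This is a toy illustration (including the safeguard of never emptying a cohort), not the CGP implementation:

```python
# Sketch of the three disambiguation operators acting on one cohort.
# A cohort is a list of readings (sets of feature strings); `is_target`
# picks out target readings; `conditions_ok` is the pre-evaluated context.

def apply_operator(op, cohort, is_target, conditions_ok):
    if op == "=!!":
        if conditions_ok:
            kept = [r for r in cohort if is_target(r)]      # target is correct
        else:
            kept = [r for r in cohort if not is_target(r)]  # discard the target
    elif op == "=!":
        kept = [r for r in cohort if is_target(r)] if conditions_ok else cohort
    elif op == "=0":
        kept = [r for r in cohort if not is_target(r)] if conditions_ok else cohort
    else:
        raise ValueError(op)
    return kept if kept else cohort   # safety: never empty a cohort completely

cohort = [{"PREP"}, {"ADV"}]
is_prep = lambda r: "PREP" in r
print(apply_operator("=0", cohort, is_prep, True))   # [{'PREP'}] removed
print(apply_operator("=!", cohort, is_prep, True))   # only the PREP reading kept
print(apply_operator("=!!", cohort, is_prep, False)) # target itself discarded
```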
    <Paragraph position="10"> The second constraint states that the proper reading of the word that is the relative pronoun one (i.e. a reading containing the string &amp;quot;&lt;Rel&gt;&amp;quot;, itself an inherent feature emanating from the lexicon) when it occurs immediately after a nominal head and immediately before a finite verb.</Paragraph>
    <Paragraph position="11"> There is also a mechanism available for expressing relative position with reference to variable positions established via unbounded dependencies. Let condition (*1 VFIN) be satisfied at absolute position 5, i.e. at the fifth word to the right. Then (L-1 N) would require there to be a feature &amp;quot;N&amp;quot; in absolute position 4, and (L* N) would establish a second unbounded dependency somewhere left of position 5 (but right of position 0), i.e. looking for satisfaction at one of positions 4, 3, 2, 1.</Paragraph>
    <Paragraph position="12"> Often context conditions work on ambiguous cohorts, i.e. one reading satisfies the condition, but this reading perhaps is not the correct one in the first place. If so, should a risk be taken? The CG formalism makes this a matter of deliberate choice. All constraints treated so far allow the context conditions to be satisfied by ambiguous context cohorts. By appending the character C to the position number, one requires the respective condition to be satisfied only if the cohort being tested is itself unambiguous. This is called careful mode, e.g.: (@w =0 VFIN (-1C TO)).</Paragraph>
    <Paragraph position="13"> For many constraints it is necessary to require that they do not apply over clause boundaries. This clause boundary mode is effected by appending either of the atoms **CLB (ordinary mode) or **CLB-C (careful mode) after the last context condition. Clause boundary mode is typically used in conjunction with unbounded contexts.</Paragraph>
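Careful mode can be sketched by adding an unambiguity check to the condition test. The representation below (cohorts as lists of reading sets) and all feature names are illustrative assumptions, not the CGP implementation:

```python
# Careful mode sketch: a "C"-marked condition is satisfied only if the
# tested cohort is itself unambiguous (exactly one reading).

SETS = {"TO": [{"TO"}], "VFIN": [{"V", "PRES"}, {"V", "PAST"}]}

def reading_matches(reading, set_name):
    return any(elem <= reading for elem in SETS[set_name])

def condition_holds(sentence, target_pos, condition):
    # condition = (polarity, position, set, careful)
    polarity, position, set_name, careful = condition
    i = target_pos + position
    if not (0 <= i < len(sentence)):
        found = False
    elif careful and len(sentence[i]) > 1:
        found = False                 # an ambiguous cohort never satisfies -C
    else:
        found = any(reading_matches(r, set_name) for r in sentence[i])
    return not found if polarity == "NOT" else found

# (-1C TO): the previous word must be an unambiguous "to".
unambig = [[{"TO"}], [{"V", "PRES"}, {"N"}]]
ambig   = [[{"TO"}, {"ADV"}], [{"V", "PRES"}, {"N"}]]
print(condition_holds(unambig, 1, (None, -1, "TO", True)))  # True
print(condition_holds(ambig, 1, (None, -1, "TO", True)))    # False: cohort ambiguous
```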
    <Paragraph position="14"> A template mechanism is available for expressing partial generalizations. E.g., a template &amp;quot;&amp;NP&amp;quot; could be declared to contain the alternatives ((N)), ((A) (N)), ((DET) (N)), ((DET) (A) (N)), etc. Then the template &amp;NP could be used in the context part of any constraint. At run-time all alternative realizations of &amp;NP would be properly considered.</Paragraph>
    <Paragraph position="15"> Every constraint embodies a true statement. Occasionally the constraints might seem quite down-to-earth and even 'trivial', given mainstream conceptions of what constitutes a 'linguistically significant generalization'. But the very essence of CG is that low-level constraints (i) are easily expressible, and (ii) prove to be effective in parsing.</Paragraph>
  </Section>
  <Section position="6" start_page="169" end_page="169" type="metho">
    <SectionTitle>
6. Constraints for intrasentential clause boundaries
</SectionTitle>
    <Paragraph position="0"> Clause boundary constraints establish locations of clause boundaries. They are important especially for the formulation of proper syntactic constraints. E.g., the syntactic constraint &amp;quot;there is only one finite predicate in a simplex non-coordinated clause&amp;quot; presupposes that clause boundary locations are known.</Paragraph>
    <Paragraph position="1"> Clause boundaries occur i.a. as the inherent feature &amp;quot;&lt;**CLB&gt;&amp;quot; in the input stream. E.g. subjunctions are lexically marked by this feature. But many boundaries must be spotted by specific constraints.</Paragraph>
    <Paragraph position="2"> Clause boundary constraints have the special operator &amp;quot;=**CLB&amp;quot; stating that there is a clause boundary before the word specified by the target.</Paragraph>
    <Paragraph position="3"> E.g., given that conjunctions are lexically marked by the inherent feature &amp;quot;&lt;Conj&gt;&amp;quot;, the constraint: (@w =**CLB &amp;quot;&lt;Conj&gt;&amp;quot; (1 NOMHEAD) (2 VFIN)) states that there is a clause boundary before conjunction instances that precede a NOMHEAD followed by a finite verb (e.g., before the conjunction in a sentence such as John eats and Bill drinks).</Paragraph>
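The boundary constraint above can be sketched as a scan over the sentence. The toy representation and all feature names below are illustrative assumptions, not the CGP implementation:

```python
# Sketch in the spirit of (@w =**CLB "<Conj>" (1 NOMHEAD) (2 VFIN)):
# mark a boundary before a conjunction followed by a nominal head
# plus a finite verb.

SETS = {"NOMHEAD": [{"N"}, {"PRON"}], "VFIN": [{"V", "PRES"}, {"V", "PAST"}]}

def cohort_has(cohort, set_name):
    return any(any(e <= r for e in SETS[set_name]) for r in cohort)

def mark_clause_boundaries(sentence):
    # Return the indices before which a clause boundary is posited.
    boundaries = []
    for i, cohort in enumerate(sentence):
        if (any("<Conj>" in r for r in cohort)
                and i + 2 < len(sentence)
                and cohort_has(sentence[i + 1], "NOMHEAD")
                and cohort_has(sentence[i + 2], "VFIN")):
            boundaries.append(i)
    return boundaries

# "John eats and Bill drinks": boundary before "and" (index 2).
sentence = [[{"N"}], [{"V", "PRES"}], [{"<Conj>"}],
            [{"N"}], [{"V", "PRES"}]]
print(mark_clause_boundaries(sentence))  # [2]
```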
    <Paragraph position="4"> 7. Syntactic constraints CG syntax is based on dependency and should assign flat, functional, surface labels, optimally one to each word-form. The labels are roughly the classical repertoire of heads and modifiers: CG syntax maps morphological categories and word order information onto syntactic labels. The designated syntactic subsets of verb chain elements, head labels, and modifier labels should be established. For English, these include e.g. the verb chain members: @+FAUXV (finite auxiliary V), @-FAUXV (non-finite auxiliary V), @+FMAINV (finite main V), @-FMAINV (non-finite main V), ...</Paragraph>
    <Paragraph position="5"> * nominal heads: @SUBJ, @OBJ, @I-OBJ, @PCOMPL-S (subj. pred. compl.), @PCOMPL-O (obj. pred. compl.), @ADVL (adverbial), ... * nominal modifiers: AN&gt; (adjective as premodifier to N), DN&gt; (determiner as premodifier to N), &lt;NOM (postmodifier to nominal), A&gt; (premodifier to A), &lt;P (postmodifier to P), ...</Paragraph>
    <Paragraph position="6"> A verb chain such as has been reading gets the labels @+FAUXV @-FAUXV @-FMAINV. In the sentence She bought the car, she is @SUBJ and car @OBJ.</Paragraph>
    <Paragraph position="7"> Certain verb chain and head labels may occur maximally once in a simplex clause. This restriction we call the Uniqueness Principle. At least @+FAUXV, @+FMAINV, @SUBJ, @OBJ, @I-OBJ, @PCOMPL-S, and @PCOMPL-O obey this restriction. Many constraints may be based on consequences of the Uniqueness Principle. E.g., if a morphologically and syntactically unambiguous @SUBJ has been identified in a clause, all other instances of @SUBJ occurring in syntactically ambiguous readings of that clause may be discarded.</Paragraph>
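The pruning consequence of the Uniqueness Principle can be sketched as follows, under the same toy representation of cohorts as lists of reading sets (an illustration only, not the CGP implementation):

```python
# Sketch of the Uniqueness Principle: once a unique-label function such as
# @SUBJ is settled unambiguously on one word of a clause, discard that
# label from all still-ambiguous readings elsewhere in the clause.

UNIQUE = {"@SUBJ", "@OBJ", "@I-OBJ", "@+FAUXV", "@+FMAINV"}

def enforce_uniqueness(clause):
    # clause: list of cohorts; each reading is a set containing one
    # syntactic label starting with "@" (illustrative layout).
    for label in UNIQUE:
        settled = any(len(c) == 1 and label in c[0] for c in clause)
        if settled:
            for c in clause:
                if len(c) > 1:
                    kept = [r for r in c if label not in r]
                    if kept:            # never empty a cohort completely
                        c[:] = kept
    return clause

clause = [
    [{"PRON", "@SUBJ"}],                  # unambiguous subject
    [{"V", "PAST", "@+FMAINV"}],
    [{"N", "@SUBJ"}, {"N", "@OBJ"}],      # ambiguous: @SUBJ gets discarded
]
enforce_uniqueness(clause)
print(clause[2] == [{"N", "@OBJ"}])  # True
```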
    <Paragraph position="8"> Modifier and complement labels point in the direction (right &amp;quot;&gt;&amp;quot;, left &amp;quot;&lt;&amp;quot;) of the respective head which is identified by its part-of-speech label. E.g., the label @&lt;P is assigned to a prepositional complement such as park in in the park. Our analysis of modifier and complement labels is more delicate than in traditional grammar, cf. the premodifiers AN&gt;, DN&gt;, NN&gt;, GN&gt; (genitival).</Paragraph>
    <Paragraph position="9"> In Constraint Grammar, syntactic labels are assigned in three steps. The basic strategy is: Do as much as possible as early as possible.</Paragraph>
    <Paragraph position="10"> The first step is to provide as many syntactic labels as possible in the lexicon (including morphology).</Paragraph>
    <Paragraph position="11"> For entries having a reduced set of labels (compared to what that morphological class normally has), those labels will be listed in the lexicon. Thus, output from lexicon and morphology will indicate that he is @SUBJ, that him is either @OBJ, @I-OBJ, or @&lt;P (NB: a considerably reduced subset of all nominal head functions), that went is @+FMAINV, etc.</Paragraph>
    <Paragraph position="12"> The second step is morphosyntactic mapping. For all readings that remain after local disambiguation and do not yet have any syntactic function label, simple mapping statements tell, for each relevant morphological feature, or combination of features, what its range of syntactic labels is. This may be compared to traditional grammar book statements such as &amp;quot;the syntactic functions of nouns are subject, object, indirect object ...&amp;quot;.</Paragraph>
    <Paragraph position="13"> CG contains one enrichment of this scheme. A mapping statement may be constrained by the context condition mechanism specified in section 5.</Paragraph>
    <Paragraph position="14"> Thus, a mapping statement is a triple &lt;morphological feature(s), context condition(s), syntactic function(s)&gt;. The first element is a feature string occurring in a morphological reading, the second is either NIL (no conditions) or a list of sublists each of which is a legal context condition. Finally the requisite grammatical function label(s) are listed. Here are some mapping statements without context conditions, providing a maximal set of labels:</Paragraph>
  </Section>
  <Section position="7" start_page="169" end_page="169" type="metho">
    <SectionTitle>
@PCOMPL-S @PCOMPL-O @APP @NN&gt; @&lt;P)
</SectionTitle>
    <Paragraph position="0"> A pronoun in the genitive case is either a prenominal genitival modifier, a subject predicate complement, or an object predicate complement. An adjective is a prenominal adjectival modifier, predicate complement, subject, object, or indirect object (the last three functions refer to occurrences of adjectives as 'nominalized heads'), etc.</Paragraph>
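The genitive-pronoun mapping just described can be sketched as a lookup over mapping triples. The statement list, label spellings, and matching logic below are illustrative assumptions (context handling is omitted, i.e. all contexts are NIL), not the paper's actual statements:

```python
# Sketch of morphosyntactic mapping: each statement is a triple
# <morphological features, context condition(s), syntactic labels>;
# with a NIL context, a matching reading simply receives the labels.

MAPPINGS = [
    # (features, context, labels); statements apply in linear order.
    ({"PRON", "GEN"}, None, ["@GN>", "@PCOMPL-S", "@PCOMPL-O"]),
    ({"A"}, None, ["@AN>", "@PCOMPL-S", "@PCOMPL-O",
                   "@SUBJ", "@OBJ", "@I-OBJ"]),
]

def map_reading(reading):
    # First matching statement wins; unmatched readings get no labels here.
    for features, context, labels in MAPPINGS:
        if features <= reading and context is None:
            return labels
    return []

print(map_reading({"PRON", "GEN", "SG"}))  # ['@GN>', '@PCOMPL-S', '@PCOMPL-O']
```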
    <Paragraph position="1"> Often morphosyntactic mappings may be considerably constrained by imposing context conditions. These state that a noun in the nominative case premodifies (@NN&gt;) an immediately following noun (in compounds, cf. computer screen), that a noun in the nominative case after a preposition is @&lt;P, and that an infinitive preceded by a noun + to postmodifies that noun.</Paragraph>
    <Paragraph position="2"> In this way, the task of syntactic analysis is simplified as much as possible, as early as possible. Superfluous alternatives are not even introduced into the parsing of a particular clause if it is clear at the outset, i.e. either in the lexicon or at the stage of morphosyntactic mapping, that certain labels are incompatible with the clausal context at hand.</Paragraph>
    <Paragraph position="3"> There may be several mapping statements for the same morphological feature(s), e.g. &amp;quot;N NOM&amp;quot;. Mapping statements with more narrowly specified contexts have precedence over more general statements. In the present implementation of CGP, the mapping statements apply in plain linear order.</Paragraph>
    <Paragraph position="4"> The last mapping statement for a particular feature  provides the worst case, i.e. the maximal assortment of function labels for that feature.</Paragraph>
    <Paragraph position="5"> Every word-form will have at least one syntactic label after morphosyntactic mapping, and all possible syntactic ambiguities have also now been introduced.</Paragraph>
    <Paragraph position="6"> In step three, syntactic constraints reduce syntactic ambiguities where such exist due either to lexical information (cf. the infinitive move above), or to morphosyntactic mapping. Syntactic constraints discard the remaining superfluous syntactic labels. Syntactic constraints differ from context-dependent disambiguation constraints only by having one of the syntactic operators '=s!' or '=s0' (where s indicates that the constraint is a syntactic one). Their semantics is identical to that of the disambiguation constraints: (@w =s0 &amp;quot;@+FMAINV&amp;quot; (*-1 VFIN)) (@w =s0 &amp;quot;@+FMAINV&amp;quot; (*1 VFIN)) (@w =s! &amp;quot;@SUBJ&amp;quot; (0 NOMHEAD) (NOT *1 NOMHEAD) (*1 VFIN) (NOT *-1 NOMHEAD)) The first two constraints discard @+FMAINV as a syntactic alternative if there is a unique finite main verb either to the left or to the right in the same clause. The third constraint prescribes that @SUBJ is the correct label for a noun or pronoun (NOMHEAD in target position, i.e. position 0), with a finite verb somewhere to the right in the same clause and no similar noun or pronoun either left or right (-- woman -- laughed --).</Paragraph>
    <Paragraph position="7"> Maximal profit is extracted from the Uniqueness Principle. At each syntactic step (before mapping, after mapping, and after the application of a syntactic constraint that affects the labels obeying the Uniqueness Principle), each clause is checked for violations of this principle. In this way many ambiguous primary labels may be safely discarded.</Paragraph>
    <Paragraph position="8"> Here is an example sentence, fully analyzed and unambiguous in all respects but the one syntactic ambiguity remaining for the word in:  There is no non-semantic way of resolving the attachment ambiguity of the adverbial in the park.</Paragraph>
    <Paragraph position="9"> This ambiguity is therefore properly unresolved.</Paragraph>
    <Paragraph position="10"> In CGP, all ambiguities 'are there' after morphosyntactic mapping and require no additional processing load. Notice in passing that CGP makes an interesting prediction which might be relevant from the viewpoint of mental language processing. Disambiguation, i.e. finding a unique interpretation by applying constraints, requires 'more effort' than leaving all or many ambiguities unresolved (in which case constraints were not applied). Parsers based on autonomous grammars tend to work in the opposite way (the more ambiguities, the more rules to apply and trees to construct).</Paragraph>
    <Paragraph position="11"> In CGP, there is precisely one output for each sentence regardless of how many unresolved ambiguities there might be pending in it. This output is an annotated linear, flat string of word-forms, base-forms, inherent features, morphological features, and syntactic function labels, all of the same formal type. The dependency structure of the sentence is expressed by the pointers and parts of speech of the syntactic labels. There is no proliferation of parse trees, often encountered in other types of parsers, even if morphological and/or syntactic ambiguities are left unresolved.</Paragraph>
    <Paragraph position="12"> 8. Implementation I have written an interpreter in strict Common Lisp for parsing with constraint grammars. This is what we call the Constraint Grammar Parser (CGP). CGP currently runs on Unix workstations under Lucid Common Lisp and Allegro Common Lisp. A PC version with the same functionality runs under mu-Lisp on ordinary XT/AT machines.</Paragraph>
    <Paragraph position="13"> CGP takes two inputs, a constraint file with set declarations, mapping statements, context-dependent disambiguation constraints, syntactic constraints, etc., and a text file with morphologically analyzed word-forms (cf. section 2).</Paragraph>
    <Paragraph position="14"> The optimal implementation of constraint grammar parsing would be in terms of finite-state machines (cf. Kimmo Koskenniemi, COLING-90 Proceedings, Vol. 2).</Paragraph>
  </Section>
</Paper>