<?xml version="1.0" standalone="yes"?>
<Paper uid="P97-1029">
  <Title>Morphological Disambiguation by Voting Constraints</Title>
  <Section position="3" start_page="0" end_page="223" type="metho">
    <SectionTitle>
2 Morphological Disambiguation
</SectionTitle>
    <Paragraph position="0"> In all languages, words are usually ambiguous in their parts-of-speech or other morphological features, and may represent lexical items of different syntactic categories, or morphological structures depending on the syntactic and semantic context. In languages like English, there is a very small number of possible word forms that can be generated from a given root word, and a small number of part-of-speech tags associated with a given lexical form. On the other hand, in languages like Turkish or Finnish with very productive agglutinative morphology, it is possible to produce thousands of forms (or even millions (Hankamer, 1989)) from a given root word, and the kinds of ambiguities one observes are quite different from those observed in languages like English. In Turkish, there are ambiguities of the sort typically found in languages like English (e.g., book/noun vs. book/verb type). However, the agglutinative nature of the language usually helps resolve such ambiguities due to the restrictions on the morphotactics of subsequent morphemes. On the other hand, this very nature introduces another kind of ambiguity, where a lexical form can be morphologically interpreted in many ways not usually predictable in advance. Furthermore, Turkish allows very productive derivational processes, and the information about the derivational structure of a word form is usually crucial for disambiguation (Oflazer and Tür, 1996).</Paragraph>
    <Paragraph position="1"> ¹ Voutilainen, private communication.</Paragraph>
    <Paragraph position="2"> Most kinds of morphological ambiguities that we have observed in Turkish typically fall into one of the following classes:²
1. The form is uninflected and assumes the default inflectional features, e.g., 1. taS (made of stone)
3. The root of one of the parses is a prefix string of the root of the other parse, and the parse with the shorter root word has a suffix which surfaces as the rest of the longer root word, e.g., 1. koyu+[u]n (your dark (thing))
² … clarity, and English glosses have been given. We have also provided the morpheme structure, where [...]s indicate elision. Glosses are given as linear feature-value sequences corresponding to the morphemes (which are not shown). The feature names are as follows: CAT: major category; TYPE: minor category; ROOT: main root form; AGR: number and person agreement; POSS: possessive agreement; CASE: surface case; CONV: conversion to the category following, with a certain suffix indicated by the argument after that; TAM1: tense, aspect, mood marker 1; SENSE: verbal polarity. Upper-case letters in the morphological output indicate one of the non-ASCII special Turkish characters: e.g., G denotes ğ, U denotes ü, etc.</Paragraph>
    <Paragraph position="3">  form while another is a form derived by a productive derivation, as in 1 and 2 below.</Paragraph>
    <Paragraph position="4">  6. The same suffix appears in different positions in the morphotactic paradigm, conveying different information, as in 2 and 3 below.</Paragraph>
    <Paragraph position="5"> 1. uygulama (application)
… morphological disambiguation by choosing, for a given ambiguous token, the correct parse in a given context. It is certainly possible that a given token may have multiple correct parses, usually with the same inflectional features, or with inflectional features not ruled out by the syntactic context, but one will be the "correct" parse, usually on semantic grounds. We consider a token fully disambiguated if it has only one morphological parse remaining after automatic disambiguation. We consider a token correctly disambiguated if one of the parses remaining for that token is the correct intended parse. We evaluate the resulting disambiguated text by a number of metrics defined as follows (Voutilainen, 1995a): …
In the ideal case, where each token is uniquely and correctly disambiguated with the correct parse, both recall and precision will be 1.0. On the other hand, for a text where each token is annotated with all possible parses,³ the recall will be 1.0, but the precision will be low. The goal is to have both recall and precision as high as possible.</Paragraph>
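The metric formulas did not survive extraction. The sketch below assumes token-level definitions that agree with the two boundary cases just described: recall is the fraction of tokens whose remaining parses include the correct one, and precision is the number of such tokens divided by the total number of remaining parses. The function name `evaluate` and the representation of parses as strings in a set are hypothetical.

```python
from typing import Sequence, Set, Tuple


def evaluate(remaining: Sequence[Set[str]], gold: Sequence[str]) -> Tuple[float, float]:
    """Return (recall, precision) for one disambiguated text.

    remaining[k] holds the parses left for token k after disambiguation and
    gold[k] is the correct intended parse of token k. Assumed definitions:
      recall    = correctly disambiguated tokens / tokens
      precision = correctly disambiguated tokens / remaining parses
    """
    if len(remaining) != len(gold) or not gold:
        raise ValueError("need one gold parse per token and at least one token")
    correct = sum(1 for parses, g in zip(remaining, gold) if g in parses)
    total_parses = sum(len(parses) for parses in remaining)
    precision = correct / total_parses if total_parses else 0.0
    return correct / len(gold), precision


if __name__ == "__main__":
    gold = ["noun", "verb", "adj"]
    # Ideal case: every token keeps exactly its correct parse.
    print(evaluate([{"noun"}, {"verb"}, {"adj"}], gold))  # (1.0, 1.0)
    # No disambiguation at all: 7 parses remain for 3 tokens.
    print(evaluate([{"noun", "verb"}, {"verb", "noun", "adj"}, {"adj", "noun"}], gold))
```

In the second call recall is 1.0 and precision is 3/7. Under these assumed definitions, a text in which every token keeps all of its parses (and which has no unknown words) has a precision of 1 divided by the average number of parses per token, i.e., the initial precision of an undisambiguated text.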
  </Section>
  <Section position="4" start_page="223" end_page="224" type="metho">
    <SectionTitle>
3 Constraint-based Morphological Disambiguation
</SectionTitle>
    <Paragraph position="0"> This section outlines our approach to constraint-based morphological disambiguation, where constraints vote on matching parses of sequential tokens.</Paragraph>
    <Section position="1" start_page="223" end_page="223" type="sub_section">
      <SectionTitle>
3.1 Constraints on morphological parses
</SectionTitle>
      <Paragraph position="0"> We describe constraints on the morphological parses of tokens using rules with two components: $R = (C_1, C_2, \ldots, C_n; V)$,</Paragraph>
      <Paragraph position="2"> where the $C_i$ are (possibly hierarchical) feature constraints on a sequence of morphological parses, and $V$ is an integer denoting the vote of the rule.</Paragraph>
      <Paragraph position="3"> To illustrate the flavor of our rules, we give the following examples: 1. The following rule with two constraints matches parses with case feature ablative preceding a parse matching a postposition subcategorizing for an ablative nominal form.</Paragraph>
      <Paragraph position="4"> [[case:abl], [cat:postp, subcat:abl]]
2. The rule [[agr:'2SG', case:gen], [cat:noun, poss:'2SG']] matches a nominal form with a possessive marker 2SG following a pronoun with 2SG agreement and genitive case, enforcing the simplest form of noun phrase constraints.</Paragraph>
      <Paragraph position="5"> 3. In general, constraints can refer to the derivational structure of the lexical form and hence be hierarchical. For instance, the following rule employs a hierarchical constraint: [[cat:adj, stem:[tam1:narr]], [cat:noun, stem:no]], which matches the derived participle reading of a verb with narrative past tense if it is followed by an underived noun parse.</Paragraph>
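A minimal sketch of hierarchical constraint matching may make the notion of subsumption concrete. It assumes, hypothetically, that a parse is a Python dict of feature-value pairs whose `stem` entry holds either the nested parse of the form it was derived from or the atom "no" for underived forms; `subsumes` and `match_rule` are illustrative names, not the authors' code.

```python
from typing import Any, Dict, List, Optional, Sequence

Parse = Dict[str, Any]       # feature -> atomic value, or nested Parse under "stem"
Constraint = Dict[str, Any]  # same shape; a nested dict constrains the stem


def subsumes(constraint: Constraint, parse: Parse) -> bool:
    """True if every feature:value of the constraint occurs in the parse;
    nested constraints are checked recursively against nested stems."""
    for feature, value in constraint.items():
        if feature not in parse:
            return False
        actual = parse[feature]
        if isinstance(value, dict):
            if not (isinstance(actual, dict) and subsumes(value, actual)):
                return False
        elif actual != value:
            return False
    return True


def match_rule(constraints: Sequence[Constraint],
               sentence: Sequence[Sequence[Parse]],
               i: int) -> Optional[List[List[int]]]:
    """If the rule matches tokens i .. i+n-1, return, per token, the indices of
    the parses its constraint subsumes; otherwise return None."""
    if i + len(constraints) > len(sentence):
        return None
    matches = []
    for k, c in enumerate(constraints):
        hits = [j for j, p in enumerate(sentence[i + k]) if subsumes(c, p)]
        if not hits:
            return None
        matches.append(hits)
    return matches


if __name__ == "__main__":
    # Token 1: participle derived from a narrative-past verb, or the verb itself.
    sentence = [
        [{"cat": "adj", "stem": {"cat": "verb", "tam1": "narr"}},
         {"cat": "verb", "tam1": "narr", "stem": "no"}],
        [{"cat": "noun", "stem": "no", "case": "nom"}],
    ]
    rule = [{"cat": "adj", "stem": {"tam1": "narr"}}, {"cat": "noun", "stem": "no"}]
    print(match_rule(rule, sentence, 0))  # [[0], [0]]
```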
    </Section>
    <Section position="2" start_page="223" end_page="224" type="sub_section">
      <SectionTitle>
3.2 Determining the vote of a rule
</SectionTitle>
      <Paragraph position="0"> There are a number of ways votes can be assigned to rules. For the purposes of this work, the vote of a rule is determined by its static properties, but it is certainly conceivable that votes can be assigned or learned by using statistics from disambiguated corpora.⁴ For static vote assignment, intuitively, we would like to give high votes to rules that are more specific, i.e., to rules that have:</Paragraph>
      <Paragraph position="1">
* a higher number of constraints,
* a higher number of features in the constraints,
* constraints that make reference to nested stems (from which the current form is derived),
* constraints that make reference to very specific features or values.</Paragraph>
      <Paragraph position="2">
³ Assuming no unknown words.
⁴ We have left this for future work.</Paragraph>
      <Paragraph position="3"> Let $R = (C_1, C_2, \ldots, C_n; V)$ be a constraint rule. The vote $V$ is determined as $V = \sum_{i=1}^{n} V(C_i)$,</Paragraph>
      <Paragraph position="5"> where $V(C_i)$ is the contribution of constraint $C_i$ to the vote of the rule $R$. A (generic) constraint has the following form: $C = [(f_1 : v_1)\,(f_2 : v_2) \ldots (f_m : v_m)]$, where $f_i$ is the name of a morphological feature, and $v_i$ is one of the possible values for that feature. The contribution of $f_i : v_i$ to the vote of a constraint depends on a number of factors: 1. The value $v_i$ may be a distinguished value that has a more important function in disambiguation.⁵ In this case, the weight of the feature constraint is $w(v_i)$ (&gt; 1).</Paragraph>
      <Paragraph position="6"> 2. The feature itself may be a distinguished feature which has a more important function in disambiguation. In this case, the weight of the feature is $w(f_i)$ (&gt; 1).</Paragraph>
      <Paragraph position="7"> 3. If the feature $f_i$ refers to the stem of a derived form and the value part of the feature constraint is a full-fledged constraint $C'$ on the stem structure, the weight of the feature constraint is found by recursively computing the vote of $C'$ and scaling the resulting value by a factor (2 in our current system) to improve its specificity. 4. Otherwise, the weight of the feature constraint is 1.</Paragraph>
      <Paragraph position="8"> For example, suppose we have the following constraint:
[cat:noun, case:gen, stem:[cat:adj, stem:[cat:v], suffix=mis]]
Assuming the value gen is a distinguished value with weight 4 (cf. factor 1 above), the vote of this constraint is computed as follows:
1. cat:noun contributes 1,
2. case:gen contributes 4,
3. stem:[cat:adj, stem:[cat:v], suffix=mis] contributes 8, computed as follows:
   (a) cat:adj contributes 1,
   (b) suffix=mis contributes 1,
   (c) stem:[cat:v] contributes 2 = 2 * 1, the 1 being from cat:v,
   (d) the sum 4 is scaled by 2 to give 8.
4. Votes from steps 1, 2 and 3(d) are added up to give 13 as the constraint vote.</Paragraph>
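The vote computation can be sketched as a short recursive function that reproduces the worked example. The weight tables are hypothetical and hold only what the example needs (gen with weight 4, a stem scaling factor of 2). The text does not say how factors 1 and 2 combine when both apply; this sketch simply lets a distinguished value take precedence, which is an assumption.

```python
from typing import Any, Dict, Mapping, Sequence

Constraint = Dict[str, Any]

# Hypothetical weight tables; only "gen" is distinguished, as in the example.
VALUE_WEIGHT: Mapping[str, int] = {"gen": 4}   # factor 1: distinguished values
FEATURE_WEIGHT: Mapping[str, int] = {}         # factor 2: distinguished features
STEM_SCALE = 2                                 # factor 3: scaling for nested stems


def constraint_vote(constraint: Constraint) -> int:
    """Sum the contributions of the feature:value pairs of one constraint."""
    vote = 0
    for feature, value in constraint.items():
        if isinstance(value, dict):            # factor 3: recurse into the stem
            vote += STEM_SCALE * constraint_vote(value)
        elif value in VALUE_WEIGHT:            # factor 1 (assumed to take precedence)
            vote += VALUE_WEIGHT[value]
        elif feature in FEATURE_WEIGHT:        # factor 2
            vote += FEATURE_WEIGHT[feature]
        else:                                  # factor 4: ordinary feature
            vote += 1
    return vote


def rule_vote(constraints: Sequence[Constraint]) -> int:
    """V = sum of V(C_i) over the constraints of the rule."""
    return sum(constraint_vote(c) for c in constraints)


if __name__ == "__main__":
    example = {"cat": "noun", "case": "gen",
               "stem": {"cat": "adj", "stem": {"cat": "v"}, "suffix": "mis"}}
    print(constraint_vote(example))  # 13
```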
      <Paragraph position="9"> We also employ a set of rules which express preferences among the parses of a single lexical form, independent of the context in which the form occurs. The weights for these rules are currently determined manually. These rules give negative votes to parses which are not preferred, or high votes to certain parses which are always preferred. Our experience is that such preference rules depend on the kind of text one is disambiguating. For instance, if one is disambiguating a manual of some sort, imperative readings of verbs are certainly possible, whereas in normal plain text with no discourse, such readings are discouraged.</Paragraph>
    </Section>
    <Section position="3" start_page="224" end_page="224" type="sub_section">
      <SectionTitle>
3.3 Voting and selecting parses
</SectionTitle>
      <Paragraph position="0"> A rule $R = (C_1, C_2, \ldots, C_n; V)$ will match a sequence of tokens $w_i, w_{i+1}, \ldots, w_{i+n-1}$ within a sentence $w_1$ through $w_s$ if some morphological parse of every token $w_j$, $i \le j \le i+n-1$, is subsumed by the corresponding constraint $C_{j-i+1}$. When all constraints match, the votes of all the matching parses are incremented by $V$. If a given constraint matches more than one parse of a token, then the votes of all such matching parses are incremented.</Paragraph>
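Rule application and vote accumulation can be sketched as follows, with parses represented, hypothetically, as dicts of feature-value pairs (a nested dict under a feature constrains the corresponding nested stem). A rule is a pair of its constraint list and its vote V; under this sketch, the context-independent preference rules described earlier fit the same shape as one-constraint rules, possibly with a negative V. The names `cast_votes` and `subsumes` are illustrative.

```python
from typing import Any, Dict, List, Sequence, Tuple

Parse = Dict[str, Any]
Rule = Tuple[Sequence[Dict[str, Any]], int]   # (C_1 ... C_n, V)


def subsumes(constraint: Dict[str, Any], parse: Parse) -> bool:
    """Every feature:value of the constraint occurs in the parse (recursively for stems)."""
    for feature, value in constraint.items():
        actual = parse.get(feature)
        if isinstance(value, dict):
            if not (isinstance(actual, dict) and subsumes(value, actual)):
                return False
        elif feature not in parse or actual != value:
            return False
    return True


def cast_votes(sentence: Sequence[Sequence[Parse]], rules: Sequence[Rule]) -> List[List[int]]:
    """Return votes[k][j], the total vote received by parse j of token k."""
    votes = [[0] * len(parses) for parses in sentence]
    for constraints, v in rules:
        n = len(constraints)
        for i in range(len(sentence) - n + 1):          # every window w_i .. w_{i+n-1}
            hits = [[j for j, p in enumerate(sentence[i + k]) if subsumes(c, p)]
                    for k, c in enumerate(constraints)]
            if all(hits):                               # each C_k subsumes some parse
                for k, parse_ids in enumerate(hits):
                    for j in parse_ids:                 # all matching parses get V
                        votes[i + k][j] += v
    return votes


if __name__ == "__main__":
    sentence = [
        [{"cat": "noun", "case": "abl"}, {"cat": "noun", "case": "nom"}],
        [{"cat": "postp", "subcat": "abl"}, {"cat": "noun", "case": "nom"}],
    ]
    rules = [([{"case": "abl"}, {"cat": "postp", "subcat": "abl"}], 3),
             ([{"cat": "noun", "case": "nom"}], -1)]   # a one-token preference rule
    print(cast_votes(sentence, rules))  # [[3, -1], [3, -1]]
```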
      <Paragraph position="1"> After all rules have been applied to all token positions in a sentence and votes are tallied, morphological parses are selected in the following manner. Let $v_l$ and $v_h$ be the votes of the lowest and highest scoring parses for a given token. All parses with votes equal to or higher than $v_l + m \cdot (v_h - v_l)$ are selected, with $m$ ($0 \le m \le 1$) being a parameter.</Paragraph>
      <Paragraph position="2"> m = 1 selects the highest scoring parse(s).</Paragraph>
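The selection step is a per-token threshold between the lowest and highest votes. A minimal sketch (the function name `select_parses` is hypothetical):

```python
from typing import List, Sequence


def select_parses(votes: Sequence[int], m: float = 1.0) -> List[int]:
    """Indices of parses whose vote is at least v_l + m * (v_h - v_l)."""
    if not (m >= 0.0 and 1.0 >= m):
        raise ValueError("m must lie in [0, 1]")
    v_l, v_h = min(votes), max(votes)
    threshold = v_l + m * (v_h - v_l)
    return [j for j, v in enumerate(votes) if v >= threshold]


if __name__ == "__main__":
    votes = [3, 10, 8, 10]
    print(select_parses(votes, 1.0))   # [1, 3]  (ties at the top are all kept)
    print(select_parses(votes, 0.5))   # [1, 2, 3]
    print(select_parses(votes, 0.0))   # [0, 1, 2, 3]
```

Since the threshold never increases as m decreases, lowering m can only keep more parses: m = 0 keeps every parse, while m = 1 keeps only the top-scoring one(s).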
    </Section>
  </Section>
  <Section position="5" start_page="224" end_page="225" type="metho">
    <SectionTitle>
4 Results from Disambiguating
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="224" end_page="224" type="sub_section">
      <SectionTitle>
Turkish Text
</SectionTitle>
      <Paragraph position="0"> We have applied our approach to disambiguating Turkish text. Raw text is processed by a preprocessor which segments the text into sentences using various heuristics about punctuation, and then tokenizes it and runs it through a wide-coverage, high-performance morphological analyzer developed using two-level morphology tools by Xerox (Karttunen, 1993). The preprocessor module also performs a number of additional functions, such as grouping of lexicalized and non-lexicalized collocations, compound verbs, etc. (Oflazer and Kuruöz, 1994; Oflazer and Tür, 1996). The preprocessor also uses a second morphological processor for dealing with unknown words, which recovers any derivational and inflectional information from a word even if the root word is not known. This unknown word processor has a (nominal) root lexicon which recognizes $S^+$, where $S$ is the Turkish surface alphabet (in the two-level morphology sense), but then tries to interpret an arbitrary postfix string of the unknown word as a sequence of Turkish suffixes, subject to all morphographemic constraints (Oflazer and Tür, 1996).</Paragraph>
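The idea behind the unknown word processor (any non-empty string may serve as the root, and the rest of the word must read as a sequence of suffixes) can be illustrated with a toy sketch. The suffix inventory below is hypothetical and tiny, and, unlike the real two-level processor, the sketch enforces no morphographemic constraints such as vowel harmony, so it would also accept ill-formed combinations.

```python
from typing import Dict, List, Tuple

# Toy suffix inventory (hypothetical and far from complete): surface form -> gloss.
SUFFIXES: Dict[str, str] = {
    "lar": "+PLU", "ler": "+PLU",
    "da": "+LOC", "de": "+LOC",
    "dan": "+ABL", "den": "+ABL",
}


def segment_suffixes(postfix: str) -> List[List[str]]:
    """All ways of reading `postfix` as a sequence of known suffixes."""
    if not postfix:
        return [[]]
    readings: List[List[str]] = []
    for surface, gloss in SUFFIXES.items():
        if postfix.startswith(surface):
            for rest in segment_suffixes(postfix[len(surface):]):
                readings.append([gloss] + rest)
    return readings


def analyze_unknown(word: str) -> List[Tuple[str, List[str]]]:
    """Any non-empty prefix may serve as the root (the S+ part of the lexicon);
    the remaining postfix must be a sequence of suffixes."""
    analyses: List[Tuple[str, List[str]]] = []
    for cut in range(len(word), 0, -1):        # longest root first
        root, postfix = word[:cut], word[cut:]
        for glosses in segment_suffixes(postfix):
            analyses.append((root, glosses))
    return analyses


if __name__ == "__main__":
    print(analyze_unknown("kitaplardan"))
    # [('kitaplardan', []), ('kitaplar', ['+ABL']), ('kitap', ['+PLU', '+ABL'])]
```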
      <Paragraph position="1"> We have applied our approach to four texts, labeled ARK, HIST, MAN and EMB, with statistics given in Table 1. The tokens considered are those that are generated after morphological analysis, unknown word processing and any lexical coalescing is done. The words that are counted as unknown are those that could not even be processed by the unknown word processor, as they violate Turkish morphographemic constraints. Whenever an unknown word has more than one parse, it is counted under the appropriate group.⁶ The fourth and fifth columns in this table give the average number of parses per token and the initial precision, assuming an initial recall of 100%.</Paragraph>
      <Paragraph position="2"> We have disambiguated these texts using a rule base of about 500 hand-crafted rules. Most of the rule crafting was done using general linguistic constraints and constraints that we derived from the first text, ARK. In this sense, this text is our "training data", while the other three texts were not considered in rule crafting.</Paragraph>
      <Paragraph position="3"> Our results are summarized in Table 2. The last four columns in this table present results for different values of the parameter m mentioned above, m = 1 denoting the case when only the highest scoring parse(s) is (are) selected. The columns for m &lt; 1 are presented in order to emphasize the drastic loss of precision in those cases. Even at m = 0.95 there is a considerable loss of precision, and going up to m = 1 causes a dramatic increase in precision without a significant loss in recall. It can be seen that we can attain very good recall and quite acceptable precision with just voting constraint rules. Our experience is that we can, in principle, improve recall and precision for m = 1 by adding highly specialized rules derived from a larger text base. A post-mortem analysis has shown that the cases that have been missed are mostly due to morphosyntactic dependencies that span a context much wider than the 5 tokens that we currently employ.</Paragraph>
    </Section>
    <Section position="2" start_page="224" end_page="225" type="sub_section">
      <SectionTitle>
4.1 Using root and contextual statistics
</SectionTitle>
      <Paragraph position="0"> We have employed two additional sources of information: root word usage statistics and contextual statistics. We have root frequency statistics compiled from previously disambiguated text. After the application of constraints as described above, for tokens which are still ambiguous, with the ambiguity resulting from different root words, we discard parses if the frequencies of the root words for those parses are considerably lower than the frequency of the root of the highest scoring parse. The results after applying this step on top of voting, with m = 1, are shown in the fourth column of Table 3 (labeled V+R).
⁶ The reason for the (comparatively) high number of unknown words in MAN is that tokens found in such texts, like F10, denoting a function key on the computer, cannot be parsed as a Turkish root word!</Paragraph>
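A minimal sketch of the root-frequency step follows. The text only says that parses whose root frequencies are "considerably lower" are discarded, so the cut-off used here (a root must reach at least a fraction `ratio` of the highest scoring parse's root frequency) and its default of 0.1 are hypothetical, as are the toy frequencies in the usage example.

```python
from typing import Any, Dict, List, Mapping, Sequence


def filter_by_root_frequency(parses: Sequence[Dict[str, Any]], votes: Sequence[int],
                             root_freq: Mapping[str, int], ratio: float = 0.1) -> List[int]:
    """Indices of the parses of one token kept after the root-frequency check."""
    if len({p["root"] for p in parses}) == 1:
        return list(range(len(parses)))   # ambiguity does not involve different roots
    best = max(range(len(parses)), key=lambda j: votes[j])   # highest scoring parse
    best_freq = root_freq.get(parses[best]["root"], 0)
    return [j for j, p in enumerate(parses)
            if root_freq.get(p["root"], 0) >= ratio * best_freq]


if __name__ == "__main__":
    # koyun (sheep) vs. koyu+[u]n (your dark (thing)); frequencies are made up.
    parses = [{"root": "koyun"}, {"root": "koyu"}]
    print(filter_by_root_frequency(parses, [6, 5], {"koyun": 120, "koyu": 4}))  # [0]
```

With ratio at most 1, the highest scoring parse always survives its own check, so the step can only remove competitors.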
      <Paragraph position="1"> On top of this, we use the following heuristic based on context statistics to eliminate any further ambiguities. For every remaining ambiguous token with unambiguous immediate left and right contexts (i.e., the tokens immediately to the left and right are unambiguous), we perform the following, ignoring the root/stem feature of the parses: 1. For every ambiguous parse in such an unambiguous context, we count how many times this parse occurs unambiguously in exactly the same unambiguous context in the rest of the text.</Paragraph>
      <Paragraph position="2"> 2. We then choose the parse whose count is substantially higher than the others.</Paragraph>
      <Paragraph position="3"> The results after applying this step on top of the previous two steps are shown in the last column of Table 3 (labeled V+R+C). The last three columns of this table show the impact of each of the steps. By ignoring root/stem features during this process, we are essentially considering just the top-level inflectional information of the parses. This is very similar to Brill's use of contexts to induce transformation rules for his tagger (Brill, 1992; Brill, 1995), but instead of generating transformation rules from a training text, we gather statistics and apply them to parses in the text being disambiguated.</Paragraph>
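The context heuristic can be sketched as below, with parses again represented hypothetically as dicts; root and stem entries are dropped before comparison, so only top-level inflectional information is used. The text only requires the winning count to be "substantially higher" than the others, so the `factor` test used here (nonzero, and at least twice the runner-up) is an assumption. Decisions are collected first and applied afterwards so the outcome does not depend on visiting order; whether the original system did this is not stated.

```python
from collections import Counter
from typing import Any, Dict, FrozenSet, List, Tuple

Parse = Dict[str, Any]
Sentence = List[List[Parse]]          # each token: list of its remaining parses
Key = FrozenSet[Tuple[str, Any]]


def key(p: Parse) -> Key:
    """Parse signature with root/stem ignored (top-level inflectional features only)."""
    return frozenset((f, v) for f, v in p.items() if f not in ("root", "stem"))


def unambiguous_context_counts(text: List[Sentence]) -> Counter:
    """Count (left, middle, right) signatures where all three tokens are unambiguous."""
    counts: Counter = Counter()
    for sent in text:
        for k in range(1, len(sent) - 1):
            if len(sent[k - 1]) == len(sent[k]) == len(sent[k + 1]) == 1:
                counts[(key(sent[k - 1][0]), key(sent[k][0]), key(sent[k + 1][0]))] += 1
    return counts


def resolve_by_context(text: List[Sentence], factor: float = 2.0) -> None:
    """Reduce ambiguous tokens with unambiguous neighbours to one parse, in place."""
    counts = unambiguous_context_counts(text)
    decisions = []
    for s, sent in enumerate(text):
        for k in range(1, len(sent) - 1):
            if len(sent[k]) > 1 and len(sent[k - 1]) == 1 and len(sent[k + 1]) == 1:
                left, right = key(sent[k - 1][0]), key(sent[k + 1][0])
                scored = sorted(((counts[(left, key(p), right)], j)
                                 for j, p in enumerate(sent[k])), reverse=True)
                (best, j_best), (runner_up, _) = scored[0], scored[1]
                if best > 0 and best >= factor * runner_up:
                    decisions.append((s, k, j_best))
    for s, k, j in decisions:
        text[s][k] = [text[s][k][j]]
```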
    </Section>
  </Section>
  <Section position="6" start_page="225" end_page="226" type="metho">
    <SectionTitle>
5 Efficient Implementation
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="225" end_page="226" type="sub_section">
      <SectionTitle>
Techniques and Extensions
</SectionTitle>
      <Paragraph position="0"> The current implementation of the voting approach is meant to be a proof-of-concept implementation and is rather inefficient. However, the use of regular relations and finite state transducers (Kaplan and Kay, 1994) provides a very efficient implementation method. For this, we view the parses of the tokens making up a sentence as forming an acyclic finite state recognizer, with the states marking word boundaries and the ambiguous interpretations of the tokens as the transitions between states, the rightmost node denoting the final state, as depicted in Figure 1 for a sentence with 5 tokens. In Figure 1, the transition labels are triples of the form $(w_i, p_j, 0)$ for the $j$th parse of token $i$, with the 0 indicating the initial vote of the parse. The rules imposing constraints can also be represented as transducers which increment the votes of the matching transition labels by an appropriate amount.⁷</Paragraph>
      <Paragraph position="2"> Such transducers ignore, and pass through unchanged, parses that they are not sensitive to.</Paragraph>
      <Paragraph position="3"> When a finite state recognizer corresponding to the input sentence (which may actually be considered an identity transducer) is composed with a constraint transducer, one gets a slightly modified version of the sentence transducer, with possibly additional transitions and states, where the votes of some of the labels have been appropriately incremented. When the sentence transducer is composed with all the constraint transducers in sequence, all possible votes are cast and the final sentence transducer reflects all the votes. The highest-voted parse of each token can then be selected. The key point here is that, due to the nature of the composition operator, the constraint transducers can first be composed off-line into a single constraint transducer, which is then composed once with every sentence transducer (see Figure 2).</Paragraph>
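The off-line composition argument can be illustrated in a deliberately restricted setting. The sketch builds the sentence recognizer of Figure 1 as a list of arcs (states 0 to n mark word boundaries, and every arc starts with vote 0) and models only single-token constraints, each as a function that adds its vote to matching arcs and passes the others through unchanged. For such length-preserving, one-arc-at-a-time maps, composing the constraint transducers off-line reduces to function composition. Constraints spanning several tokens require genuine transducer composition over regular relations (Kaplan and Kay, 1994), which this sketch does not implement. All names and parse strings are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Arc:
    src: int        # word-boundary state before token i
    dst: int        # word-boundary state after token i
    token: str      # w_i
    parse: str      # p_j
    vote: int = 0   # every parse starts with vote 0, as in the (w_i, p_j, 0) labels


Transducer = Callable[[Arc], Arc]


def sentence_recognizer(tokens: Sequence[str], parses: Sequence[Sequence[str]]) -> List[Arc]:
    """States 0..n mark word boundaries (0 = start, n = final); one arc per parse."""
    return [Arc(i, i + 1, w, p)
            for i, (w, ps) in enumerate(zip(tokens, parses)) for p in ps]


def unary_constraint(pred: Callable[[str], bool], v: int) -> Transducer:
    """Add v to the vote of arcs whose parse satisfies pred; pass others through."""
    return lambda arc: replace(arc, vote=arc.vote + v) if pred(arc.parse) else arc


def compose(*transducers: Transducer) -> Transducer:
    """Off-line composition: one transducer equivalent to applying all in order."""
    def composed(arc: Arc) -> Arc:
        for t in transducers:
            arc = t(arc)
        return arc
    return composed


def run(t: Transducer, arcs: Sequence[Arc]) -> List[Arc]:
    """Apply a (composed) transducer to every arc of a sentence recognizer."""
    return [t(a) for a in arcs]


if __name__ == "__main__":
    arcs = sentence_recognizer(["w1", "w2"], [["Noun+Gen", "Verb"], ["Noun+P2SG"]])
    rules = compose(unary_constraint(lambda p: p.startswith("Noun"), 1),
                    unary_constraint(lambda p: "Gen" in p, 4))
    print([a.vote for a in run(rules, arcs)])   # [5, 0, 1]
```

The composed transducer is built once and then applied to each sentence, which is the efficiency argument made above, restricted here to the single-token case.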
      <Paragraph position="4"> The idea of voting can be further extended to a path voting framework, where rules vote on paths containing sequences of matching parses, and the path from the start state to the final state that receives the highest vote is then selected. This can again be implemented using finite state transducers as described above (except that the path vote is apportioned equally to the relevant parse votes), but instead of selecting the highest scoring parses, one selects the path from the start state to one of the final states where the sum of the parse votes is maximum. We have recently completed a prototype implementation of this approach (in C) for English (Brown Corpus) and have obtained quite similar results (Tür, Oflazer, and Özkan, 1997).</Paragraph>
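Selecting the maximum-vote path is a longest-path problem on an acyclic graph, which a single left-to-right dynamic programming pass solves. The sketch assumes the votes have already been apportioned to the transitions and that states are numbered so that every transition goes from a lower to a higher state, as in the left-to-right sentence recognizer; the function name and edge format are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int, str, float]   # (source state, target state, label, vote)


def best_path(num_states: int, edges: Sequence[Edge], start: int,
              finals: Sequence[int]) -> Tuple[float, List[str]]:
    """Highest-vote path from `start` to any final state of an acyclic lattice.

    Assumes every edge goes from a lower-numbered to a higher-numbered state,
    so processing edges by source state visits states in topological order.
    """
    neg_inf = float("-inf")
    score = [neg_inf] * num_states
    back: Dict[int, Tuple[int, str]] = {}
    score[start] = 0.0
    for src, dst, label, vote in sorted(edges, key=lambda e: e[0]):
        if src >= dst:
            raise ValueError("states must be numbered left to right")
        if score[src] != neg_inf and score[src] + vote > score[dst]:
            score[dst] = score[src] + vote
            back[dst] = (src, label)
    end = max(finals, key=lambda s: score[s])
    if score[end] == neg_inf:
        raise ValueError("no path from the start state to a final state")
    labels: List[str] = []
    state = end
    while state != start:
        state, label = back[state]
        labels.append(label)
    return score[end], labels[::-1]


if __name__ == "__main__":
    edges = [(0, 1, "w1/a", 2), (0, 1, "w1/b", 3), (1, 2, "w2/c", 4), (1, 2, "w2/d", 1)]
    print(best_path(3, edges, 0, [2]))   # (7.0, ['w1/b', 'w2/c'])
```

On a plain sentence recognizer with votes attached only to individual parses, this reduces to picking the best parse per token; the dynamic programming would matter once composition has introduced additional states and transitions.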
    </Section>
  </Section>
</Paper>