<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1083">
  <Title>Using Decision Trees to Construct a Practical Parser</Title>
  <Section position="3" start_page="505" end_page="505" type="metho">
    <SectionTitle>
2 Dependency Analysis in Japanese
Language
</SectionTitle>
    <Paragraph position="0"> This section overviews dependency analysis in the Japanese language. The parser generally performs the following three steps.</Paragraph>
    <Paragraph position="1">  1. Segment a sentence into a sequence ofbunsetsu. 2. Prepare modification matrix each value of which represents how one bunsetsu is likely to modify the other.</Paragraph>
    <Paragraph position="2"> 3. Find optimal modifications in a sentence by a  dynamic programming technique.</Paragraph>
    <Paragraph position="3"> Because there are no explicit delimiters between words in Japanese, input sentences are first word segmented, part-of-speech tagged, and then chunked into a sequence of bunsetsus. The first step yields, for the following example, the sequence of bunsetsu displayed below. The parenthesis in the Japanese expressions represent the internal structures of the bunsetsu (word segmentations).</Paragraph>
    <Paragraph position="5"> The second step of parsing is to construct a modification matrix whose values represent the likelihood that one bunsetsu modifies another in a sentence.</Paragraph>
    <Paragraph position="6"> In the Japanese language, we usually make two assumptions: null  1. Every bunsetsu except the last one modifies only one posterior bunsetsu.</Paragraph>
    <Paragraph position="7"> 2. No modification crosses to other modifications  in a sentence.</Paragraph>
    <Paragraph position="8"> Table 1 illustrates a modification matrix for the example sentence. In the matrix, columns and rows represent anterior and posterior bunsetsus, respectively. For example, the first bunsetsu &amp;quot;kinou- no&amp;quot; modifics the second 'yuugala-ni'with score 0.T0 and the third 'kinjo-no' with score 0.07. The aim of this paper is to generate a modification matrix by using decision trees.</Paragraph>
    <Paragraph position="9">  kfnou-no ~tul#ata.ni 0.70 yvugata-ni **njo-no 0.07 0.10 kfnjo.no kodorna-#a 0,10 0.10 0.70 kadomo*~a ~ain-~o 0,10 0.10 0.20 0.05 nomu.ta 0.03 0.70 0.10 0.95 i, aln. mlo  The final step of parsing optimizes the entire dependency structure by using the values in the modification matrix.</Paragraph>
    <Paragraph position="10"> Before going into our model, we introduce the notations that will be used in the model. Let S be the input sentence. S comprises a bunsetsu set B of length m ({&lt; bl,f~ &gt;,-.-,&lt; bm,f,, &gt;}) in which bi and fi represent the ith bunsetsu and its features, respectively. We define D to be a modification set; D = {rood(l),..., mod(m - 1)} in which rood(i) indicates the number of busetsu modified by the ith bunsetsu. Because of the first assumption, the length of D is always m- 1. Using these notations, the result of the third step for the example can be given as D = {2, 6, 4, 6, 6} as displayed in Figure 1.</Paragraph>
  </Section>
  <Section position="4" start_page="505" end_page="507" type="metho">
    <SectionTitle>
3 Decision Trees for Dependency Analysis
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="505" end_page="506" type="sub_section">
      <SectionTitle>
3.1 Stochastic Model and Decision Trees
</SectionTitle>
      <Paragraph position="0"> The stochastic dependency parser assigns the most plausible modification set Dbe,t to a sentence S in  terms of the training data distribution. Dbest = argmax D P( D\[S) = arg,nax D P( D\[B) By assuming the independence of modifications, P(D\[B) can be transformed as follows. P(yeslbi, bj, fl ,&amp;quot;', fro) means the probability that a pair of bunsetsu bi and bj have a modification relation. Note that each modification is constrained by all features{f, ,--., fro} in a sentence despite of the assumption of independence.We use decision trees to dynamically select appropriate features for each combination of bunsetsus from {f,,---, fm }.</Paragraph>
      <Paragraph position="1"> mi-~P(yes\[bi, &amp;quot;&amp;quot; ,fro) P(DIB) = 1-I - bj, f,,. Let us first consider the single tree case. The training data for the decision tree comprise any unordered combination of two bunsetsu in a sentence.</Paragraph>
      <Paragraph position="2"> Features used for learning are the linguistic information associated with the two bunsetsu. The next section will explain these features in detail. The class set for learning has binary values yes and no which delineate whether the data (the two bunstsu) has a modification relation or not. In this setting, the decision tree algorithm automatically and consecutively selects the significant, features for discriminating modify/non-modify relations.</Paragraph>
      <Paragraph position="3"> We slightly changed C4.5 (Quinlan, 1993) programs to be able to extract class frequencies at every node in the decision tree because our task is regression rather than classification. By using the class distribution, we compute the probability PDT(yeslbi, bj, f ~,..., fro) which is the Laplace estimate of empirical likelihood that bi modifies bj in the constructed decision tree DT. Note that it. is necessary to norrealize PDT(yes\[bi, bj, f,,..., fro) to approximate P(yes\[bi,bj,fx,&amp;quot;',fm). By considering all candidates posterior to bi, P(yeslbi,b.i,fl,'&amp;quot;,fm) is computed using a heulistic rule (1). It is of course reasonable to normalize class frequencies instead of the probability PoT(yeslbi, bj,, f,,..., fro). Equation (1) tends to emphasize long distance dependencies more than is true for frequency-based normal-</Paragraph>
      <Paragraph position="5"> Let us extend the above to use a set of decision trees. As briefly mentioned in Section 1, a number of infrequent and exceptional expressions appear in any natural language phenomena; they deteriorate the overall performance of application systems. It is also difficult for automated learning systems to detect and handle these expressions because exceptional expressions are placed ill the same class as frequent ones. To tackle this difficulty, we generate a set of decision trees by adaboost (Freund and Schapire, 1996) algorithm illustrated in Table 2. The algorithm first sets the weights to 1 for all exanapies (2 in Table 2) and repeats the following two procedures T times (3 in Table 2).</Paragraph>
      <Paragraph position="6">  1. A decision tree is constructed by using the current weight vector ((a) in Table 2) 2. Example data are then parsed by using the tree and the weights of correctly handled examples are reduced ((b),(c) in Table 2) 1.</Paragraph>
      <Paragraph position="7"> '2..</Paragraph>
      <Paragraph position="8"> 3.</Paragraph>
      <Paragraph position="9">  Input: sequence of N examples &lt; eL, u,~ &gt; .... , &lt; eN, .wN &gt; in which el and wi represent an example and its weight, respectively.</Paragraph>
      <Paragraph position="10"> Initialize the weight vector wi =1 for i = 1,..., N Do for t = l,2,...,T  (a) Call C4.5 providing it with the weight vector w,s and Construct a modification probability set ht (b) Let Error be a set of examples that are not. identified by lit Compute the pseudo error rate of ht: e' = E iCE .... wi/ ~ ,=INw, if et &gt; 5' then abort loop l--e t (c) For examples correctly predicted by ht, update the weights vector to be wi = wiflt 4. Output a final probability set: hl=Zt=,T(log~)ht/Zt=,T(Iog~)  Algorithm The final probability set h I is then computed by mixing T trees according to their performance (4 in Table 2). Using h: instead of PoT(yeslbi , bj, fl,'&amp;quot;, f,,~), in equation (1) generates a boosting version of the dependency parser.</Paragraph>
    </Section>
    <Section position="2" start_page="506" end_page="507" type="sub_section">
      <SectionTitle>
3.2 Linguistic Feature Types Used for
Learning
</SectionTitle>
      <Paragraph position="0"> This section explains the concrete feature setting we used for learning. The feature set mainly focuses on  $'), &lt;6~', ~tE, t~'~t ~', l~'tt~&amp;quot;6, .:~, -'~', 5, a~., L, LC/~', E'.', &amp;quot;tr.,'t~L, &amp;quot;1-6, &amp;quot;t', &amp;quot;~, &amp;quot;~, &amp;quot;~ st ' ~-. \].'~, %*~t.t,- &amp;quot; , &amp;quot;~, \]_'0'), t.C/l~ * , ~**C/9&amp;quot;C, \]'.gt~,gl~,9\]'*~,9&amp;quot;C, 99, ~, ~C/~,, &amp; ~, __%, ~, ~a~, @t,, @t,L, @t,Ll2, @~6, ~'~&amp;quot;, tC/6, @6Ul:, to0, ~k~', ~k'C, ::, ~, 0~, d)h, tl, I~./J':), ~, I|E, It:, tt::~., t-C, ~b, ~ L&lt;I/, l.t~. ~, ~-, ~I.~R~I~'~, ~.~1~., ~,.~l~;l~\]f'tit, lg'~, $1&amp;quot;tf~,t~l, .V,C/IL ~\[\]glllql~\]. e~i~\], n o n, k~.,.X, ~J.C/~ non, &amp;quot;, ~, ~. \[, \[. \[, ~, l, &amp;quot;,',~,,,I,.I,\],J A(0), B(;~4), C(&gt;5) 7 0, 1 8 0, 1</Paragraph>
      <Paragraph position="2"> the two bunsetsu constituting each data.. Tile class set consists of binary values which delineate whether a sample (the two bunsetsu) have a modification relation or not. We use 13 features for the task, 10 directly from the 2 bunsetsu under consideration and  Each bunsetsu (anterior and posterior) has the 5 features: No.1 to No.5 in Table 3. Features No.6 to No.8 are related to bunsetsu pairs. Both No.1 and No.2 concern the head word of the bunsetsu.</Paragraph>
      <Paragraph position="3"> No.1 takes values of frequent words or thesaurus categories (NLRI, 1964). No.2, on the other hand, takes values of part-of-speech tags. No.3 deals with bullsetsu types which consist of functional word chunks or tile part-of-speech tags that dominate tile bullsetsu's syntactic characteristics. No.4 and No.5 are binary features and correspond to punctuation and parentheses, respectively. No.6 represents how many bunsetsus exist, between the two bunsetsus. Possible values are A(0), B(0--4) and C(&gt;5). No.7 deals with the post-positional particle 'wa' which greatly influences the long distance dependency of subject-verb modifications. Finally, No.8 addresses tile punctuation between the two bunsetsu. Tile detailed values of each feature type are summarized ill Table 4.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>