<?xml version="1.0" standalone="yes"?>
<Paper uid="P96-1055">
  <Title>The Selection of the Most Probable Dependency Structure in Japanese Using Mutual Information</Title>
  <Section position="4" start_page="372" end_page="373" type="metho">
    <SectionTitle>
3 Selecting the Most Probable Structure
</SectionTitle>
    <Paragraph position="0"> RDG identifies all possible dependency structures, which consist of modifier-modificant relations between elements in a sentence. The arcs in the following example show modifier-modificant relations which can be combined into six different dependency structures.</Paragraph>
    <Paragraph position="2"> [Figure: example sentence with dependency arcs. The Japanese text is not recoverable; the surviving English glosses include: national, investigation, based, cause.] Our objective is to develop a method that automatically selects the correct dependency structure, or at least the structures that have the highest probability of being correct. We evaluate the various possible structures according to the mutual information between modifiers and particle-modificants.</Paragraph>
    <Paragraph position="3"> In some cases there is no particle and the modificant directly precedes the modifier (see the example in section 3.2). To calculate the mutual information for each relation, we obtain from the COD the concept identifiers (numerical codes) of the modifiers that appear with the particle-modificant pattern, together with the number of their occurrences in the corpus. If the pattern is not present, we back off and search for the modificant alone. For each of those concept identifiers we obtain from the CD all generalizers (concept identifiers that express a similar meaning in a more general way) and build a taxonomic hierarchy with them. Using the occurrence counts obtained, we calculate the mutual information for the concepts in the taxonomic hierarchy.</Paragraph>
    <Paragraph position="4"> We also build a taxonomic hierarchy for the modifier that appears with the particle-modificant in the sentence. Then, comparing these two taxonomic hierarchies (one for the modifiers in the COD, one for the modifiers in the sentence), we look for the concept identifier common to both hierarchies that has the highest mutual information. This value is taken as the mutual information for the relation itself. For each dependency structure we calculate a score by multiplying the mutual information of all its ambiguous relations (the unambiguous ones do not contribute to the evaluation). The dependency structure with the highest probability of being correct is the one with the highest score. Since all structures have the same number of relations, this product reflects the likelihood of the structure.</Paragraph>
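    <Paragraph position="5"> The scoring rule described above can be restated compactly. The following is only a notational sketch: the symbols D (a dependency structure), A(D) (its ambiguous relations), H_r and H_m (the two taxonomic hierarchies compared for a relation r), c (a concept identifier) and p (the particle-modificant pattern) are introduced here for illustration, and the pointwise form of I(c; p) is an assumed standard definition rather than one stated in this section.

```latex
\[
  I(c; p) = \log_2 \frac{P(c, p)}{P(c)\,P(p)}
  \qquad \text{(assumed pointwise definition)}
\]
\[
  I(r) = \max_{c \,\in\, H_r \cap H_m} I(c; p),
  \qquad
  \mathrm{score}(D) = \prod_{r \,\in\, A(D)} I(r),
  \qquad
  \hat{D} = \arg\max_{D} \mathrm{score}(D)
\]
```
</Paragraph>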
    <Section position="1" start_page="372" end_page="373" type="sub_section">
      <SectionTitle>
3.1 The Algorithm
</SectionTitle>
      <Paragraph position="0"> The process described above can be written in algorithmic form as follows:
 1. Select the ambiguous relations (those with more than one modificant) for each structure.
 2. Search the COD for the particle-modificant subpattern in the corresponding positions. If there is no entry, search for the modificant only.
 3. Obtain from the COD the concept identifiers for the modificant (there may be multiple meanings) and, for the modifiers which occur with the particle-modificant pattern, their concept identifiers together with the number of their occurrences in the corpus.</Paragraph>
      <Paragraph position="1"> 4. For each modificant concept identifier, build a taxonomic hierarchy with its modifiers, using the CD to find the generalizer for each concept identifier.
 5. Calculate the mutual information for all the concept identifiers in the taxonomic hierarchies.</Paragraph>
      <Paragraph position="2"> 6. For the modifiers in the sentence, extract their concept identifiers from the COD and build the taxonomic hierarchies, using the CD to find the generalizers for each concept identifier.</Paragraph>
      <Paragraph position="3"> 7. For each relation (modifier-particle-modificant pattern), search for the concept identifier that generalizes the modifier word and has maximum mutual information. This value is the mutual information for the relation.</Paragraph>
      <Paragraph position="4"> 8. For each dependency structure, multiply the mutual information of its ambiguous dependency relations to obtain the score for that structure.</Paragraph>
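      <Paragraph position="5"> Steps 4 to 8 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the GENERALIZER table stands in for the CD, the count dictionaries stand in for COD lookups, all function names and toy numbers are hypothetical, the pointwise log2 form of mutual information is assumed, and returning 0.0 when no common concept exists is a choice made here for simplicity. Every concept counted with the pattern must also appear in the corpus counts.

```python
import math
from collections import Counter

# Hypothetical stand-in for the CD: each concept maps to its generalizer.
GENERALIZER = {
    "worker": "person",
    "mother": "person",
    "person": "human",
    "stress": "force",
    "force": "abstract",
    "human": None,
    "abstract": None,
}


def ancestors(concept):
    """Return the concept followed by its chain of generalizers."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = GENERALIZER.get(concept)
    return chain


def hierarchy_mi(pattern_counts, corpus_counts, total):
    """Steps 4-5: propagate counts up the taxonomy, then compute the
    (assumed) pointwise mutual information of each concept with the pattern."""
    with_pattern, overall = Counter(), Counter()
    for concept, n in pattern_counts.items():
        for c in ancestors(concept):
            with_pattern[c] += n
    for concept, n in corpus_counts.items():
        for c in ancestors(concept):
            overall[c] += n
    pattern_total = sum(pattern_counts.values())
    return {
        c: math.log2(total * n / (overall[c] * pattern_total))
        for c, n in with_pattern.items()
    }


def relation_mi(modifier, mi_table):
    """Steps 6-7: highest value among the generalizers of the sentence
    modifier that also occur in the pattern's hierarchy (0.0 if none)."""
    scores = [mi_table[c] for c in ancestors(modifier) if c in mi_table]
    return max(scores, default=0.0)


def best_structure(structures):
    """Step 8: multiply the relation values of each structure and return
    the name of the structure with the highest product."""
    return max(structures, key=lambda name: math.prod(structures[name]))


# Hypothetical toy counts, for illustration only.
pattern_counts = {"person": 32, "mother": 6, "worker": 2}
corpus_counts = {"person": 100, "mother": 40, "worker": 10, "stress": 50}
table = hierarchy_mi(pattern_counts, corpus_counts, total=1000)
print(relation_mi("worker", table))
```

In this sketch a structure is represented simply as a list of the values of its ambiguous relations, so best_structure can be called with a dictionary that maps a structure name to such a list.</Paragraph>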
    </Section>
    <Section position="2" start_page="373" end_page="373" type="sub_section">
      <SectionTitle>
3.2 Examples
</SectionTitle>
      <Paragraph position="0"> The following figure shows the output from RDG for a given sentence. The arrows in the figure indicate the dependency relations.</Paragraph>
      <Paragraph position="1"> [Figure: RDG output for the example sentence. The Japanese text is not recoverable; the surviving English glosses are: work, people, stress, structure, (one illegible gloss), progress, grow worse.] The ambiguous relations and their modificants are written in Japanese in the original and are illegible in this copy. The occurrences for the modificants in these relations are extracted from the COD, yielding a list of modifier concept identifiers with the number of their occurrences. Note that in the patterns glossed as "working person" and "working stress", the modificant precedes the modifier. The following figure shows some modifiers for the modificant glossed as "work", with their number of occurrences. [Figure: modifiers of "work" with occurrence counts. Glosses in extraction order: person woman mother drive each person factory wife fact worker. Counts in extraction order: 32 18 6 6 3 3 3 2 2 2. The pairing of glosses with counts is not fully recoverable.] Next, the taxonomic hierarchy for each particle-modificant is built and the mutual information is calculated for each concept identifier. An extract of the hierarchy for "work" is shown in the following figure.</Paragraph>
      <Paragraph position="2"> [Figure: extract of the taxonomic hierarchy for "work". Surviving node glosses include: life, abstract product, human or similar, live body relative to action, human, person, force. Surviving values: 0.0, 3.61, 3.40.] Next, the generalizers for the modifiers in the sentence (including those glossed as "person" and "stress") are searched in the hierarchies for their modificants to obtain the mutual information for the relations. For "working person", the match was the concept "person" itself, with a mutual information of 3.61. For "working stress", the match occurred at the concept "force", giving a mutual information of 0.69.</Paragraph>
      <Paragraph position="3"> Finally, the mutual information values of the ambiguous dependency relations in each structure are multiplied. For the example sentence, the mutual information values for the ambiguous relations are as follows: [Figure: values largely illegible in this copy.] From these values the algorithm selects the parse with the highest score, which is drawn in thick lines. The next figure shows the result for the first example sentence. [Figure: result for the first example sentence. Surviving values: 1.60, 3.40. Surviving glosses: sudden, relation, deep, heart disease, pressure, more than 10%, was.]</Paragraph>
    </Section>
  </Section>
</Paper>