<?xml version="1.0" standalone="yes"?>
<Paper uid="P96-1055">
  <Title>The Selection of the Most Probable Dependency Structure in Japanese Using Mutual Information</Title>
  <Section position="3" start_page="0" end_page="372" type="intro">
    <SectionTitle>
2 Background
</SectionTitle>
    <Paragraph position="0"> As pointed out earlier, the dependency relations among elements of Japanese sentences are fairly complicated due to the relatively free word order. RDG is designed to determine dependency relations among words and phrases in sentences. To do so, it classifies phrases according to grammatical categories and syntactic attributes. However, it fails to reject semantically unacceptable dependency structures. The inevitable consequence is that RDG often produces multiple parses even for a simple sentence.</Paragraph>
    <Paragraph position="1"> Kurohashi and Nagao (1993) try to determine the dependency relations of a sentence using sample sentences. When the sentence is structurally ambiguous, they determine its structure by comparing it to structurally similar patterns taken from a manually generated set of examples and calculating similarity values.</Paragraph>
    <Paragraph position="2"> Our method, by contrast, uses a statistical approach to select the most probable structure, or parse, of a given sentence. It takes as input the dependency structures generated by RDG for a sentence, finds all modifier-particle-modificant relations, calculates their mutual information, and chooses the structure for which the product of the mutual information of its relations is highest.</Paragraph>
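The selection step described above can be sketched as follows. All words, counts, and dictionary contents below are invented for illustration; the paper draws the actual statistics from the Co-occurrence Dictionary.

```python
import math

# Toy co-occurrence counts standing in for the Co-occurrence Dictionary (COD);
# every word and number here is invented for illustration.
triple_count = {("hon", "wo", "yomu"): 40, ("hon", "wo", "kau"): 10}
modifier_count = {"hon": 100}
pm_count = {("wo", "yomu"): 60, ("wo", "kau"): 30}
total = 1000  # total number of relations in the toy corpus

def mutual_information(modifier, particle, modificant):
    """Pointwise MI between a modifier and a particle-modificant sub-pattern."""
    p_joint = triple_count.get((modifier, particle, modificant), 0) / total
    if p_joint == 0:
        return float("-inf")
    p_mod = modifier_count[modifier] / total
    p_pm = pm_count[(particle, modificant)] / total
    return math.log2(p_joint / (p_mod * p_pm))

def best_parse(parses):
    """Each parse is a list of (modifier, particle, modificant) relations;
    pick the parse whose product of relation MI values is highest."""
    return max(parses, key=lambda p: math.prod(mutual_information(*r) for r in p))

# Two competing single-relation parses of the same (toy) sentence.
parse_a = [("hon", "wo", "yomu")]
parse_b = [("hon", "wo", "kau")]
```

With these counts, `("hon", "wo", "yomu")` co-occurs more strongly than expected by chance, so `best_parse([parse_a, parse_b])` prefers `parse_a`.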
    <Paragraph position="3"> In order to calculate the mutual information for any modifier-particle-modificant pattern, we use the Conceptual Dictionary (CD) to build a taxonomic hierarchy of the modifiers which occur with the particle-modificant sub-pattern in the Co-occurrence Dictionary (COD). The Conceptual Dictionary is a set of graphs consisting of 400,000 concepts and a number of taxonomic as well as functional relations between them. The Co-occurrence Dictionary consists of a list of 1,100,000 dependency relations (modifier, particle and modificant) taken from a corpus. Each entry includes syntactic information, concept identifiers (a numerical code) and the number of occurrences in the corpus.</Paragraph>
    <Paragraph position="4"> The mutual information for any pattern is the maximum mutual information between the sub-pattern and the concepts in the taxonomic hierarchy which generalize the modifier in the pattern.</Paragraph>
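The maximization over generalizing concepts might look like the sketch below. The taxonomy and all counts are invented for illustration; the paper builds the hierarchy from the Conceptual Dictionary and takes the counts from the Co-occurrence Dictionary.

```python
import math

# Toy taxonomy: child concept -> parent concept (None at the root).
taxonomy = {"hon": "publication", "publication": "artifact", "artifact": None}

# Toy counts of (concept, particle, modificant) triples, where counts at a
# concept aggregate over all modifiers it generalizes.
concept_pm_count = {("hon", "wo", "yomu"): 40,
                    ("publication", "wo", "yomu"): 70,
                    ("artifact", "wo", "yomu"): 80}
concept_count = {"hon": 100, "publication": 300, "artifact": 900}
pm_count = {("wo", "yomu"): 120}
total = 1000

def mi(concept, particle, modificant):
    """Pointwise MI between a concept and a particle-modificant sub-pattern."""
    p_joint = concept_pm_count.get((concept, particle, modificant), 0) / total
    if p_joint == 0:
        return float("-inf")
    return math.log2(p_joint / ((concept_count[concept] / total)
                                * (pm_count[(particle, modificant)] / total)))

def pattern_mi(modifier, particle, modificant):
    """Maximize MI over the modifier and every concept generalizing it."""
    best, concept = float("-inf"), modifier
    while concept is not None:
        best = max(best, mi(concept, particle, modificant))
        concept = taxonomy[concept]
    return best
```

Walking up the hierarchy trades specificity (sharper MI) against data sparsity (more reliable counts); taking the maximum lets the most informative level of generalization win.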
    <Paragraph position="5"> Resnik and Hearst (1993) use a similar approach to calculate preferences for prepositional phrase attachment. While they use data on word groups, our method directly uses word co-occurrence data to estimate the preferences, using the CD to identify the most adequate grouping for each relation.</Paragraph>
    <Paragraph position="6"> While Kurohashi and Nagao compare the sentence with a single sample of patterns, we use all occurrences of the pattern in the COD to calculate the mutual information. Our approach automatically extracts the occurrences from the dictionary as well as builds the taxonomic hierarchy. Unlike Kurohashi and Nagao (1993), who use only verb and adjective patterns, we cover all dependency relations.</Paragraph>
  </Section>
</Paper>