<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2109">
  <Title>Backward Beam Search Algorithm for Dependency Analysis of Japanese</Title>
  <Section position="3" start_page="0" end_page="754" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Dependency analysis is regarded as one of the standard methods of Japanese syntactic analysis. The Japanese dependency structure is usually represented by the relationship between phrasal units called 'bunsetsu'. A bunsetsu usually contains one or more content words, like a noun, verb or adjective, and zero or more function words, like a postposition (case marker) or verb/noun suffix. The relation between two bunsetsu has a direction from a dependent to its head. Figure 1 shows examples of bunsetsu and dependencies. Each bunsetsu is separated by &quot;|&quot;. The first segment &quot;KARE-HA&quot; consists of two words, KARE (He) and HA (subject case marker). The numbers in the &quot;head&quot; line show the head ID of the corresponding bunsetsus.</Paragraph>
    <Paragraph position="1"> Note that the last segment does not have a head, and it is the head bunsetsu of the sentence. The task of the Japanese dependency analysis is to find the head ID for each bunsetsu.</Paragraph>
    <Paragraph position="2"> The analysis proposed in this paper has two conceptual steps. In the first step, dependency likelihoods are calculated for all possible pairs of bunsetsus. In the second step, an optimal dependency set for the entire sentence is retrieved.</Paragraph>
    <Paragraph position="3"> In this paper, we will mainly discuss the second step, a method for finding an optimal dependency set. In practice, the method proposed in this paper should be combinable with any system that calculates dependency likelihoods. It is said that Japanese dependencies have the following characteristics (see footnote 1): (1) Dependencies are directed from left to right. (2) Dependencies don't cross. (3) Each segment except the rightmost one has only one head. (4) In many cases, the left context is not necessary to determine a dependency. The analysis method proposed in this paper assumes these characteristics and is designed to utilize them. Based on these assumptions, we can analyze a sentence backwards (from right to left) in an efficient manner. There are two merits to this approach. Assume that we are analyzing the M-th segment of a sentence of length N and analysis has already been done for the (M + 1)-th to N-th segments (M &lt; N).</Paragraph>
    <Paragraph position="4"> Footnote 1: Of course, there are several exceptions (S.Shirai, 1998), but the frequencies of such exceptions are negligible compared to the current precision of the system. We believe those exceptions have to be treated when the problems we are facing at the moment are solved. Assumption (4) has not been discussed very much, but our investigation with humans showed that it is true in more than 90% of the cases.</Paragraph>
    <Paragraph position="5"> The first merit is that the head of the dependency of the M-th segment is one of the segments between M + 1 and N (because of assumption 1), which are already analyzed. Because of this, we don't have to keep a huge number of possible analyses, i.e. we can avoid something like active edges in a chart parser, or making parallel stacks in GLR parsing, as we can make a decision at this time. Also, we can use the beam search mechanism, by keeping only a certain number of analysis candidates at each segment. The width of the beam search can be easily tuned, and the memory size of the process is proportional to the product of the input sentence length and the beam search width.</Paragraph>
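    <Paragraph> The right-to-left beam search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function name backward_beam_search, the log_likelihood callback, the 0-based segment indices, and the toy likelihood table are all hypothetical. The sketch uses only assumption 1 (the head lies to the right) and omits the non-crossing restriction for brevity.

```python
import heapq
import math
from typing import Callable, List, Optional, Tuple

# A hypothesis is (score, heads). heads[i] is the head of segment m + i, where
# m is the leftmost segment analyzed so far; the last segment has head None.
Hypothesis = Tuple[float, Tuple[Optional[int], ...]]


def backward_beam_search(
    n: int,
    log_likelihood: Callable[[int, int], float],
    beam_width: int,
) -> List[Hypothesis]:
    """Analyze segments 0..n-1 from right to left, keeping beam_width hypotheses."""
    # Start with the rightmost segment, which has no head.
    beam: List[Hypothesis] = [(0.0, (None,))]
    for m in range(n - 2, -1, -1):
        expanded: List[Hypothesis] = []
        for score, heads in beam:
            # Assumption 1: the head of segment m is one of m+1 .. n-1.
            for h in range(m + 1, n):
                expanded.append((score + log_likelihood(m, h), (h,) + heads))
        # Keep only the best beam_width candidates at this segment.
        beam = heapq.nlargest(beam_width, expanded, key=lambda hyp: hyp[0])
    return beam


# Toy likelihoods (made up for illustration) for a 3-segment sentence.
toy = {(0, 1): 0.2, (0, 2): 0.8, (1, 2): 1.0}
best = backward_beam_search(3, lambda d, h: math.log(toy[(d, h)]), beam_width=2)
print(best[0][1])  # heads of segments 0, 1, 2 in the best hypothesis
```

Only the current beam is stored, and each hypothesis holds at most one head per segment, so the storage in this sketch grows with the product of the sentence length and the beam width.</Paragraph>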
    <Paragraph position="6"> The other merit is that the possible heads of the dependency can be narrowed down because of the assumption of non-crossing dependencies (assumption 2). For example, if the K-th segment depends on the L-th segment (M &lt; K &lt; L), then the M-th segment can't depend on any segments between K and L.</Paragraph>
    <Paragraph position="7"> According to our experiment, this reduced the number of heads to consider to less than 50%.</Paragraph>
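    <Paragraph> The narrowing of possible heads by the non-crossing assumption can be illustrated with a short sketch. The function name candidate_heads, the 0-based indexing, and the example head list are assumptions made for illustration only; the paper itself numbers segments from 1.

```python
from typing import List, Optional


def candidate_heads(m: int, heads: List[Optional[int]]) -> List[int]:
    """Return the segments that segment m may depend on without crossing.

    heads[k] must already be decided for every k to the right of m;
    heads[-1] is None because the last segment has no head.
    Entries at index m and to its left are ignored.
    """
    n = len(heads)
    candidates = []
    for j in range(m + 1, n):
        # A dependency from m to j would cross a dependency from k to heads[k]
        # if k lies strictly between m and j and heads[k] lies beyond j.
        crosses = any(heads[k] > j for k in range(m + 1, j))
        if not crosses:
            candidates.append(j)
    return candidates


# Hypothetical 5-segment sentence: segments 1 and 2 depend on 3, and 3 on 4.
example_heads = [None, 3, 3, 4, None]
print(candidate_heads(0, example_heads))  # prints [1, 3, 4]
```

In this example segment 2 is excluded because segment 1 already depends on segment 3, so a dependency from segment 0 to segment 2 would cross it.</Paragraph>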
    <Paragraph position="8"> The technique of backward analysis of Japanese sentences has been used in rule-based methods, for example (Fujita, 1988). However, there are several difficulties with rule-based methods. First, the rules are created by humans, so it is difficult to have wide coverage and keep the rules consistent. Also, it is difficult to incorporate a scoring scheme in rule-based methods. Many such methods used heuristics to make deterministic decisions (and backtracking if a search fails) rather than using a scoring scheme. However, the combination of the backward analysis and the statistical method has very strong advantages, one of which is the beam search.</Paragraph>
  </Section>
</Paper>