<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2931">
  <Title>Dependency Parsing Based on Dynamic Local Optimization</Title>
  <Section position="4" start_page="0" end_page="211" type="metho">
    <SectionTitle>
2 Dependency Probabilities
</SectionTitle>
    <Paragraph position="0"> An example of Chinese dependency tree is showed in Figure1. The tree can be represented as a directed graph with nodes representing word tokens and arcs  representing dependency relations. The assumption that the arcs are independent on each other often is made so that parsing can be handled easily. On the other side the independence assumption will result in the loss of information because dependencies are interrelated on each other actually. Therefore, two kinds of probabilities are used in our parser. One is arc probabilities which are the possibility that two nodes form an arc, and the other is structure probabilities which are used to describe some specific syntactic structures.</Paragraph>
    <Section position="1" start_page="211" end_page="211" type="sub_section">
      <SectionTitle>
2.1 Arc Probabilities
</SectionTitle>
      <Paragraph position="0"> A dependency arc A i can be expressed as a 4-tuple A</Paragraph>
      <Paragraph position="2"> are nodes that constitute the directed arc. D is the direction of the arc, which can be left or right. R is relation type labeled on the arc. Under the independence assumption that an arc depends on its two nodes we can calculate arc probability given two nodes. In our paper the arc probabilities are calculated as follows:</Paragraph>
      <Paragraph position="4"> Where CTag is coarse-grained part of speech tag and FTag is fine-grained tag. As to Word we choose its lemma if it exists. Dist is the distance between</Paragraph>
      <Paragraph position="6"> . It is divided into four parts: Dist = 1 if j-i = 1 Dist = 2 if j-i = 2 Dist = 3 if 3j-i6 Dist = 4 if j-i &gt; 6  All the probabilities are obtained by maximum likelihood estimation from the training data. Then interpolation smoothing is made to get the final arc probabilities.</Paragraph>
    </Section>
    <Section position="2" start_page="211" end_page="211" type="sub_section">
      <SectionTitle>
2.2 Structure Probabilities
</SectionTitle>
      <Paragraph position="0"> Structure information plays the critical role in syntactic analysis. Nevertheless the flexibility of syntactic structures and data sparseness pose obstacles to us. Especially some structures are related to specific language and cannot be employed in multi-lingual parsing. We have to find those language-independent features.</Paragraph>
      <Paragraph position="1"> In valency theory &amp;quot;valence&amp;quot; represents the number of arguments that a verb is able to govern. In this paper we extend the range of verbs and arguments to all the words. We call the new &amp;quot;valence&amp;quot; Governing Degree (GD), which means the ability of one node governing other nodes. In Figure1, the GD of node &amp;quot;Z!&amp;quot; is 2 and the GDs of two other nodes are 0. The governing degree of nodes in dependency tree often shows directionality. For example, Chinese token &amp;quot;&amp;quot; always governs one left node. Furthermore, we subdivide the GD into Left Governing Degree (LGD) and Right Governing Degree (RGD), which are the ability of words governing their left children or right children. In Figure 1 the LGD and RGD of verb &amp;quot;Z!&amp;quot; are both 1.</Paragraph>
      <Paragraph position="2"> In the paper we use the probabilities of GD over the fine-grained tags. The probabilities of P(LDG|FTag) and P(RGD|FTag) are calculated from training data. Then we only reserve the FTags with large probability because their GDs are stable and helpful to syntactic analysis. Other FTags with small probabilities are unstable in GDs and cannot provide efficient information for syntactic analysis.</Paragraph>
      <Paragraph position="3"> If their probabilities are less than 0.65 they will be ignored in our dependency parsing.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="211" end_page="213" type="metho">
    <SectionTitle>
3 Dynamic local optimization
</SectionTitle>
    <Paragraph position="0"> Many previous methods are based on history-based models. Despite many obvious advantages, these methods can be awkward to encode some constrains within their framework (Collins, 2000). Classifiers are good at encoding more features in the deterministic parsing (Yamada and Matsumoto, 2003; Nivre et al., 2004). However, such algorithm often make more probable dependencies be prevented by preceding errors. An example is showed in Figure 2.</Paragraph>
    <Paragraph position="1"> Arc a is a frequent dependency and b is an arc with more probability. Arc b will be prevented by a if the reduce is carried out in order.</Paragraph>
    <Section position="1" start_page="212" end_page="212" type="sub_section">
      <SectionTitle>
3.1 Our algorithm
</SectionTitle>
      <Paragraph position="0"> Our deterministic parsing is based on dynamic local optimization. The algorithm calculates the arc probabilities of two continuous nodes, and then reduces the most probable arc. The construction of dependency tree includes four actions: Check, Reduce, Delete, and Insert. Before a node is reduced, the Check procedure is made to validate its correctness.</Paragraph>
      <Paragraph position="1"> Only if the arc passes the Check procedure it can be reduced. Otherwise the Reduce will be delayed.</Paragraph>
      <Paragraph position="2"> Delete and Insert are then carried out to adjust the changed arcs. The complete algorithm is depicted as follows:</Paragraph>
      <Paragraph position="4"> The algorithm has following advantages: * Projectivity can be guaranteed. The node is only reduced with its neighboring node. If a node is reduced as a leaf it will be removed from the sentence and doesn't take part in next  highest probability if it passes the Check. No any limitation on order thus the spread of errors can be mitigated effectively.</Paragraph>
      <Paragraph position="5"> * Check is an open process. Various constrains can be encoded in this process. Structural constrains, partial parsed information or language-dependent knowledge can be added.</Paragraph>
      <Paragraph position="6"> Adjustment is illustrated in Figure 3, where &amp;quot; &amp;quot;&amp;quot; is reduced and arc R' is deleted. Then the algorithm computes the arc probability of R&amp;quot; and inserts it to the Stack.</Paragraph>
    </Section>
    <Section position="2" start_page="212" end_page="213" type="sub_section">
      <SectionTitle>
3.2 Checking
</SectionTitle>
      <Paragraph position="0"> The information in parsing falls into two kinds: static and dynamic. The arc probabilities in 2.1 describe the static information which is not changed in parsing. They are obtained from the training data in advance. The structure probabilities in 2.2 describe the dynamic information which varies in the process of parsing. The use of dynamic information often depends on what current dependency tree is.</Paragraph>
      <Paragraph position="1"> Besides the governing degree, Check procedure also uses another dynamic information-Sequential Dependency. Whether current arc can be reduced is relating to previous arc. In Figure 3 the reduce of the arc R depends on the arc R'.IfR' has been delayed or its probability is little less than that of R, arc R will be delayed.</Paragraph>
      <Paragraph position="2"> If the arc doesn't pass the Check it will be delayed. The delayed time ranges from 1 to Length which is the length of sentence. If the arc is delayed Length times it will be blocked. The Reduce will be delayed in the following cases:</Paragraph>
      <Paragraph position="4"> * P(R') &gt;lP(R), the current arc R will be delayed Length*(P(R')/P(R)) times. R' is the preceding arc and l = 0.60.</Paragraph>
      <Paragraph position="5"> * If arc R' is blocking, the arc R will be delayed. hatwidest GD is empirical value and GD is current value.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>