<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2925">
  <Title>Projective Dependency Parsing with Perceptron</Title>
  <Section position="4" start_page="0" end_page="181" type="intro">
    <SectionTitle>
2 Parsing and Learning Algorithms
</SectionTitle>
    <Paragraph position="0"> This section describes the three main components of the dependency parsing: the parsing model, the parsing algorithm, and the learning algorithm.</Paragraph>
    <Section position="1" start_page="0" end_page="181" type="sub_section">
      <SectionTitle>
2.1 Model
</SectionTitle>
      <Paragraph position="0"> Let 1,...,L be the dependency labels, defined beforehand. Let x be a sentence of n words, x1 ...xn.</Paragraph>
      <Paragraph position="1"> Finally, let Y(x) be the space of well-formed dependency trees for x. A dependency tree y [?] Y(x) is a set of n dependencies of the form [h,m,l], where h is the index of the head word (0 [?] h [?] n, where 0 means root), m is the index of the modifier word (1 [?] m [?] n), and l is the dependency label (1 [?] l [?] L). Each word of x participates as a modifier in exactly one dependency of y.</Paragraph>
      <Paragraph position="2"> Our dependency parser, dp, returns the maximum scored dependency tree for a sentence x:</Paragraph>
      <Paragraph position="4"> In the formula, w is the weight vector of the parser, that is, the set of parameters used to score dependencies during the parsing process. It is formed by a concatenation of L weight vectors, one for each dependency label, w = (w1,...,wl,...,wL). We assume a feature extraction function, ph, that represents an unlabeled dependency [h,m] in a vector of D features. Each of the wl has D parameters or dimensions, one for each feature. Thus, the global  weight vector w maintains L x D parameters. The scoring function is defined as follows: sco([h,m,l],x,y,w) = ph(h,m,x,y)* wl Note that the scoring of a dependency makes use of y, the tree that contains the dependency. As described next, at scoring time y just contains the dependencies found between h and m.</Paragraph>
    </Section>
    <Section position="2" start_page="181" end_page="181" type="sub_section">
      <SectionTitle>
2.2 Parsing Algorithm
</SectionTitle>
      <Paragraph position="0"> We use the cubic-time algorithm for dependency parsing proposed by Eisner (1996; 2000). This parsing algorithm assumes that trees are projective, that is, dependencies never cross in a tree. While this assumption clearly does not hold in the CoNLL-X data (only Chinese trees are actually 100% projective), we chose this algorithm for simplicity. As it will be shown, the percentage of non-projective dependencies is not very high, and clearly the error rates we obtain are caused by other major factors.</Paragraph>
      <Paragraph position="1"> The parser is a bottom-up dynamic programming algorithm that visits sentence spans of increasing length. In a given span, from word s to word e, it completes two partial dependency trees that cover all words within the span: one rooted at s and the other rooted at e. This is done in two steps. First, the optimal dependency structure internal to the span is chosen, by combining partial solutions from internal spans. This structure is completed with a dependency covering the whole span, in two ways: from s to e, and from e to s. In each case, the scoring function is used to select the dependency label that maximizes the score.</Paragraph>
      <Paragraph position="2"> We take advantage of this two-step processing to introduce features for the scoring function that represent some of the internal dependencies of the span (see Section 3 for details). It has to be noted that the parsing algorithm we use does not score dependencies on top of every possible internal structure. Thus, by conditioning on features extracted from y we are making the search approximative.</Paragraph>
    </Section>
    <Section position="3" start_page="181" end_page="181" type="sub_section">
      <SectionTitle>
2.3 Perceptron Learning
</SectionTitle>
      <Paragraph position="0"> As learning algorithm, we use Perceptron tailored for structured scenarios, proposed by Collins (2002).</Paragraph>
      <Paragraph position="1"> In recent years, Perceptron has been used in a number of Natural Language Learning works, such as in</Paragraph>
      <Paragraph position="3"> parameter that indicates the number of epochs that the algorithm cycles the training set.</Paragraph>
      <Paragraph position="4"> partial parsing (Carreras et al., 2005) or even dependency parsing (McDonald et al., 2005).</Paragraph>
      <Paragraph position="5"> Perceptron is an online learning algorithm that learns by correcting mistakes made by the parser when visiting training sentences. The algorithm is extremely simple, and its cost in time and memory is independent from the size of the training corpora.</Paragraph>
      <Paragraph position="6"> In terms of efficiency, though, the parsing algorithm must be run at every training sentence.</Paragraph>
      <Paragraph position="7"> Our system uses the regular Perceptron working in primal form. Figure 1 sketches the code. Given the number of languages and dependency types in the CoNLL-X exercise, we found prohibitive to work with a dual version of Perceptron, that would allow the use of a kernel function to expand features.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>