<?xml version="1.0" standalone="yes"?> <Paper uid="W06-2934"> <Title>Multi-lingual Dependency Parsing with Incremental Integer Linear Programming</Title> <Section position="4" start_page="0" end_page="227" type="intro"> <SectionTitle> 2 Model </SectionTitle> <Paragraph position="0"> Our model is based on the linear model presented in McDonald et al. (2005a),</Paragraph> <Paragraph position="1"> s(x, y) = Σ_{(i,j) ∈ y} w · f(i,j) (1) </Paragraph> <Paragraph position="2"> where x is a sentence, y a parse and s a score function over sentence-parse pairs. f(i,j) is a multidimensional feature vector representation of the edge from token i to token j and w the corresponding weight vector. Decoding in this model amounts to finding the y for a given x that maximises s(x,y)</Paragraph> <Paragraph position="3"> y* = argmax_y s(x, y) = argmax_y Σ_{(i,j) ∈ y} w · f(i,j) </Paragraph> <Paragraph position="4"> and y contains no cycles, attaches exactly one head to each non-root token and no head to the root node.</Paragraph> <Section position="1" start_page="226" end_page="227" type="sub_section"> <SectionTitle> 2.1 Decoding </SectionTitle> <Paragraph position="0"> Instead of using the MST algorithm (McDonald et al., 2005b) to maximise Equation 1, we present an equivalent ILP formulation of the problem. An advantage of a general-purpose inference technique is that further linguistically motivated constraints can be added. For instance, we can add constraints that enforce that a verb cannot have more than one subject argument or that coordination arguments should have compatible types. Roth and Yih (2005) are similarly motivated and use ILP to deal with additional hard constraints in a Conditional Random Field model for Semantic Role Labelling.</Paragraph> <Paragraph position="1"> There are several explicit formulations of the MST problem as integer programs in the literature (Williams, 2002). They are based on the concept of eliminating subtours (cycles), cuts (disconnections) or requiring intervertex flows (paths). However, in practice these cause long solving times.
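As an aside, the arc-factored score defined above can be sketched in a few lines of Python. The sparse feature names and weight values below are purely hypothetical illustrations, not the features or weights used in the paper:

```python
def edge_score(w, features):
    """w . f(i, j): dot product of the weight vector and the sparse
    feature vector of a single head -> child edge."""
    return sum(w.get(name, 0.0) * value for name, value in features.items())


def parse_score(w, edge_features, parse):
    """s(x, y): the parse score decomposes into a sum over the
    (head, child) edges of the parse."""
    return sum(edge_score(w, edge_features[edge]) for edge in parse)


# Hypothetical sparse features for the edges of a tiny sentence.
w = {"head=VB,child=NN": 1.5, "dist=1": 0.5}
edge_features = {
    (0, 1): {"head=ROOT,child=VB": 1.0},
    (1, 2): {"head=VB,child=NN": 1.0, "dist=1": 1.0},
}
parse = [(0, 1), (1, 2)]
```

Because the score decomposes edge by edge, any decoder (the MST algorithm or the ILP below) only ever needs the precomputed per-edge scores.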
While the first two types yield an exponential number of constraints, the latter scales cubically but produces fractional solutions in its relaxed version, causing long runtimes of the branch-and-bound algorithm.</Paragraph> <Paragraph position="2"> In practice, solving models of this form did not converge after hours, even for small sentences.</Paragraph> <Paragraph position="3"> To get around this problem we followed an incremental approach akin to Warme (1998). Instead of adding constraints that forbid all possible cycles in advance (which would result in an exponential number of constraints), we first solve the problem without any cycle constraints. Only if the result contains cycles do we add constraints that forbid these cycles and run the solver again. This process is repeated until no more violated constraints are found. Figure 1 shows this algorithm.</Paragraph> <Paragraph position="4"> Grötschel et al. (1981) showed that such an approach converges after a polynomial number of iterations with respect to the number of variables.</Paragraph> <Paragraph position="5"> Figure 1:
1. Solve IP P_i
2. Find violated constraints C in the solution of P_i
3. If C = ∅ we are done
4. P_{i+1} = P_i ∪ C
5. i = i + 1
6. Goto (1)
In practice, this technique showed fast convergence (fewer than 10 iterations) in most cases, yielding solving times of less than 0.5 seconds. However, for some sentences in certain languages, such as Chinese or Swedish, an optimal solution could not be found after 500 iterations.</Paragraph> <Paragraph position="6"> In the following section we present the objective function, variables and linear constraints that make up the Integer Linear Program.</Paragraph> <Paragraph position="7"> In the implementation of McDonald et al. (2005b)</Paragraph> <Paragraph position="8"> dependency labels are handled by finding the best-scoring label for a given token pair, so that s(i,j) = max_label s(i,j,label) goes into Equation 1.
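To make the incremental procedure of Figure 1 concrete, here is a minimal Python sketch. The exhaustive `solve` function is a toy stand-in for a real ILP solver (feasible only for very short sentences), and the names `find_cycles`, `solve` and `incremental_parse` are our own illustration, not the paper's implementation:

```python
from itertools import product


def find_cycles(heads):
    """Return the cycles in a head assignment (heads[j] = head of token j)
    as lists of (head, child) edges."""
    seen, cycles = set(), []
    for start in heads:
        path, node = [], start
        while node in heads and node not in path and node not in seen:
            path.append(node)
            node = heads[node]
        seen.update(path)
        if node in path:  # walked back into the current path: a cycle
            cycle = path[path.index(node):]
            cycles.append([(heads[j], j) for j in cycle])
    return cycles


def solve(scores, constraints, n):
    """Toy stand-in for the ILP solver: exhaustively pick the best head
    assignment (one head per non-root token 1..n-1) whose active edges
    contain no banned edge set."""
    best, best_heads = float("-inf"), None
    for choice in product(range(n), repeat=n - 1):
        heads = {j: h for j, h in zip(range(1, n), choice) if h != j}
        if len(heads) < n - 1:  # some token chose itself as head; invalid
            continue
        edges = {(h, j) for j, h in heads.items()}
        if any(banned <= edges for banned in constraints):
            continue  # violates an accumulated cycle constraint
        score = sum(scores[h][j] for h, j in edges)
        if score > best:
            best, best_heads = score, heads
    return best_heads


def incremental_parse(scores, n):
    """Figure 1: solve, find violated cycle constraints, add them, repeat."""
    constraints = []
    while True:
        heads = solve(scores, constraints, n)
        cycles = find_cycles(heads)
        if not cycles:  # C is empty: we are done
            return heads
        for cycle in cycles:  # P_{i+1} = P_i with C added
            constraints.append(set(cycle))
```

On a 3-token example whose two highest-scoring edges point at each other, the first iteration returns a cyclic assignment, one constraint forbidding that cycle is added, and the second iteration yields the best tree.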
This is only exact as long as no further constraints are added. Since our aim is to add constraints, our variables need to explicitly model label decisions. Therefore, we introduce binary variables null l_{i,j,label} ∀i ∈ 0..n, j ∈ 1..n, label ∈ best_b(i,j) where n is the number of tokens and the index 0 represents the root token. best_b(i,j) is the set of b labels with maximal s(i,j,label). l_{i,j,label} equals 1 if there is a dependency with the label label between token i (head) and j (child), 0 otherwise.</Paragraph> <Paragraph position="9"> Furthermore, we introduce binary auxiliary variables null d_{i,j} ∀i ∈ 0..n, j ∈ 1..n representing the existence of a dependency between tokens i and j. We connect these to the l_{i,j,label} variables by a constraint</Paragraph> <Paragraph position="10"> ∀i ∈ 0..n, j ∈ 1..n: Σ_label l_{i,j,label} = d_{i,j} </Paragraph> <Paragraph position="11"> Given the above variables our objective function can be represented as</Paragraph> <Paragraph position="13"> with a suitable k.</Paragraph> <Paragraph position="14"> Every non-root token has exactly one head. This yields</Paragraph> <Paragraph position="15"> ∀j ∈ 1..n: Σ_{i ∈ 0..n} d_{i,j} = 1 </Paragraph> <Paragraph position="16"> No such constraint is needed for the artificial root node.</Paragraph> <Paragraph position="17"> Typed Arity Constraints We might encounter solutions of the basic model that contain, for instance, verbs with two subjects. To forbid these we simply augment our model with constraints such as</Paragraph> <Paragraph position="18"> Σ_{j ∈ 1..n} l_{i,j,subject} ≤ 1 </Paragraph> <Paragraph position="19"> for all verbs i in a sentence.</Paragraph> <Paragraph position="20"> Coordination Constraints Coordination conjuncts have to be of compatible types. For example, nouns cannot coordinate with verbs. We implemented this constraint by checking the parses for occurrences of incompatible arguments. If we find two arguments j, k for a conjunction i, with d_{i,j} and d_{i,k} active, where j is a noun and k is a verb, then we add d_{i,j} + d_{i,k} ≤
1 to forbid configurations in which both dependencies are active.</Paragraph> <Paragraph position="21"> Projective Parsing In the incremental ILP framework, projective parsing can easily be implemented by checking for crossing dependencies after each iteration and forbidding them in the next. If we see two dependencies that cross, d_{i,j} and d_{k,l}, we add the constraint d_{i,j} + d_{k,l} ≤ 1 to prevent this in the next iteration. This can also be used to prevent specific types of crossings. For instance, in Dutch we could allow crossing dependencies only as long as neither dependency is a &quot;Determiner&quot; relation.</Paragraph> </Section> <Section position="2" start_page="227" end_page="227" type="sub_section"> <SectionTitle> 2.2 Training </SectionTitle> <Paragraph position="0"> We used single-best MIRA (Crammer and Singer, 2003). For all experiments we used 10 training iterations and non-projective decoding. Note that we used the original spanning tree algorithm for decoding during training, as it was faster.</Paragraph> </Section> </Section> </Paper>