<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-1015">
  <Title>Word Alignment via Quadratic Assignment</Title>
  <Section position="4" start_page="115" end_page="116" type="metho">
    <SectionTitle>
3 Parameter estimation
</SectionTitle>
    <Paragraph position="0"> To estimate the parameters of our model, we follow the large-margin formulation of Taskar (2004).</Paragraph>
    <Paragraph position="1"> Our input is a set of training instances {(xi,yi)}mi=1, where each instance consists of a sentence pair xi and a target alignment yi. We would like to find parameters w that predict correct alignments on the training data: yi = arg max -yi[?]Yi wlatticetopf(xi, -yi) for each i, where Yi is the space of matchings for the sentence pair xi.</Paragraph>
    <Paragraph position="2"> In standard classification problems, we typically measure the error of prediction, lscript(yi, -yi), using the simple 0-1 loss. In structured problems, where we are jointly predicting multiple variables, the loss is often more complex. While the F-measure is a natural loss function for this task, we instead chose a sensible surrogate that fits better in our framework: weighted Hamming distance, which counts the number of variables in which a candidate solution -y differs from the target output y, with different penalty for false positives (c+) and false negatives (c[?]):</Paragraph>
    <Paragraph position="4"> We use an SVM-like hinge upper bound on the loss lscript(yi, -yi), given by max-yi[?]Yi[wlatticetopfi(-yi) + lscripti(-yi) [?] wlatticetopfi(yi)], where lscripti(-yi) = lscript(yi, -yi), and fi(-yi) = f(xi, -yi). Minimizing this upper bound encourages the true alignment yi to be optimal with respect to w for each instance i:</Paragraph>
    <Paragraph position="6"> where g is a regularization parameter.</Paragraph>
    <Paragraph position="7"> In this form, the estimation problem is a mixture of continuous optimization over w and combinatorial optimization over yi. In order to transform it into a more standard optimization problem, we need a way to efficiently handle the loss-augmented inference, max-yi[?]Yi[wlatticetopfi(-yi) + lscripti(-yi)]. This optimization problem has precisely the same form as the prediction problem whose parameters we are trying to learn -- max-yi[?]Yi wlatticetopfi(-yi) -- but with an additional term corresponding to the loss function. Our assumption that the loss function decomposes over the edges is crucial to solving this problem. We omit the details here, but note that we can incorporate the loss function into the LPs for various models we described above and &amp;quot;plug&amp;quot; them into the large-margin formulation by converting the estimation problem into a quadratic problem (QP) (Taskar, 2004). This QP can be solved using any off-the-shelf solvers, such as MOSEK or CPLEX.2 An important difference that comes into play for the estimation of the quadratic assignment models in Equation (3) is that inference involves solving an integer linear program, not just an LP. In fact the LP is a relaxation of the integer LP and provides an upper bound on the value of the highest scoring assignment. Using the LP relaxation for the large-margin QP formulation is an approximation, but as our experiments indicate, this approximation is very effective. At testing time, we use the integer LP to predict alignments. We have also experimented with using just the LP relaxation at testing time and then independently rounding each fractional edge value, which actually incurs no loss in alignment accuracy, as we discuss below.</Paragraph>
    <Paragraph position="8"> 2When training on 200 sentences, the QP we obtain contains roughly 700K variables and 300K constraints and is solved in roughly 10 minutes on a 2.8 GHz Pentium 4 machine. Aligning the whole training set with the flow formulation takes a few seconds, whereas using the integer programming (for the QAP formulation) takes 1-2 minutes.</Paragraph>
  </Section>
class="xml-element"></Paper>