<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0302">
  <Title>ProAlign: Shared Task System Description</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> ProAlign combines several different approaches in order to produce high-quality word alignments. Like competitive linking, ProAlign uses a constrained search to find high-scoring alignments. Like EM-based methods, it uses a probability model to rank possible alignments. The goal of this paper is to give a bird's-eye view of the ProAlign system to encourage discussion and comparison.</Paragraph>
    <Paragraph position="1"> 1 Alignment Algorithm at a Glance. We submitted the ProAlign alignment system to the WPT'03 shared task. It achieved an alignment error rate (AER) of 5.71% on the English-French task and 29.36% on the Romanian-English task. These results are for the no-null data; our output was not formatted to work with explicit nulls. ProAlign works by iteratively improving an alignment. The algorithm creates an initial alignment using search, constraints, and summed φ² correlation-based scores (Gale and Church, 1991). This is similar to the competitive linking process (Melamed, 2000). It then learns a probability model from the current alignment and conducts a constrained search again, this time scoring alignments according to the probability model. The process continues until results on a validation set begin to indicate over-fitting.</Paragraph>
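    <Paragraph> As a rough illustration of the correlation score used for the initial alignment, the sketch below computes the φ² statistic of Gale and Church (1991) from a 2x2 contingency table. The function names, the toy corpus, and the choice to count presence per sentence pair are assumptions made for this example; how ProAlign sums these scores over the links of an alignment is not specified in this section.

```python
def contingency(corpus, e, f):
    """Build the 2x2 table for words e and f over a list of sentence pairs.

    Assumption for this sketch: each sentence pair counts once,
    based on whether e and f are present in it.
    """
    a = b = c = d = 0
    for english, french in corpus:
        has_e = e in english
        has_f = f in french
        if has_e and has_f:
            a += 1          # both words present
        elif has_e:
            b += 1          # only e present
        elif has_f:
            c += 1          # only f present
        else:
            d += 1          # neither present
    return a, b, c, d


def phi_squared(a, b, c, d):
    """phi^2 = (ad - bc)^2 / ((a+b)(a+c)(b+d)(c+d)); 0.0 if undefined."""
    denominator = (a + b) * (a + c) * (b + d) * (c + d)
    if denominator == 0:
        return 0.0
    return (a * d - b * c) ** 2 / denominator


# Toy corpus (hypothetical data, not from the shared task).
toy_corpus = [
    (["the", "house"], ["la", "maison"]),
    (["the", "car"], ["la", "voiture"]),
    (["a", "house"], ["une", "maison"]),
    (["a", "car"], ["une", "voiture"]),
]
house_maison = phi_squared(*contingency(toy_corpus, "house", "maison"))  # 1.0
house_la = phi_squared(*contingency(toy_corpus, "house", "la"))          # 0.0
```

    Words that always appear together score 1.0, while words whose appearances are independent score 0.0, which is what makes the statistic usable as a first ranking of candidate links before any probability model exists.</Paragraph>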
    <Paragraph position="2"> For the purposes of our algorithm, we view an alignment as a set of links between the words in a sentence pair. Before describing the algorithm, we define the following notation. Let E be an English sentence e_1, e_2, ..., e_m and let F be a French sentence f_1, f_2, ..., f_n. We define a link l(e_i, f_j) to exist if e_i and f_j are a translation (or part of a translation) of one another. We define the null link l(e_i, f_0) to exist if e_i does not correspond to a translation for any French word in F. The null link l(e_0, f_j) is defined similarly. An alignment A for two sentences E and F is a set of links such that every word in E and F participates in at least one link, and a word linked to e_0 or f_0 participates in no other links. If e occurs in E x times and f occurs in F y times, we say that e and f co-occur x × y times in this sentence pair.</Paragraph>
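    <Paragraph> The definitions above can be made concrete with a small sketch. It represents a link as a pair of 1-based word positions (i, j), with position 0 standing for the null word, and checks the two conditions on an alignment; it also counts co-occurrences as the product of the two word frequencies. The representation and the function names are assumptions chosen for this illustration, not ProAlign's actual data structures.

```python
from collections import Counter

NULL = 0  # position 0 stands for the null word (e_0 or f_0)


def cooccurrence_counts(english, french):
    """Map each (e, f) word pair to x * y for one sentence pair."""
    e_counts = Counter(english)
    f_counts = Counter(french)
    return {
        (e, f): x * y
        for e, x in e_counts.items()
        for f, y in f_counts.items()
    }


def is_alignment(links, m, n):
    """Check that a set of (i, j) links is an alignment of m and n words.

    Every word must take part in at least one link, and a word linked
    to null must take part in no other link.
    """
    e_partners = {i: set() for i in range(1, m + 1)}
    f_partners = {j: set() for j in range(1, n + 1)}
    for i, j in links:
        if i == NULL and j == NULL:
            return False                 # null-to-null is not a link
        if i != NULL:
            if i not in e_partners:
                return False             # position outside the sentence
            e_partners[i].add(j)
        if j != NULL:
            if j not in f_partners:
                return False
            f_partners[j].add(i)
    for partners in list(e_partners.values()) + list(f_partners.values()):
        if not partners:
            return False                 # word participates in no link
        if NULL in partners and len(partners) > 1:
            return False                 # null-linked word has another link
    return True
```

    For example, for a two-word sentence pair the link set {(1, 1), (2, 2)} is an alignment, while {(1, 1)} is not, because the second word on each side is left unlinked.</Paragraph>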
    <Paragraph position="3"> ProAlign conducts a best-first search (with constant beam and agenda size) over a constrained space of possible alignments. A state in this space is a partial alignment, and a transition is the addition of a single link to the current state. Adding a link is a valid transition if the resulting state violates no constraint. Our start state is the empty alignment, where all words in E and F are implicitly linked to null. A terminal state is a state in which no more links can be added without violating a constraint. Our goal is to find the terminal state with the highest probability. To complete this algorithm, one requires a set of constraints and a method for determining which alignment is most likely. These are presented in the next two sections. The algorithm takes as input a set of English-French sentence pairs, along with dependency trees for the English sentences. The presence of the English dependency tree allows us to incorporate linguistic features into our model and linguistic intuitions into our constraints.</Paragraph>
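    <Paragraph> The search described above might be sketched as follows. This is a minimal illustration under stated assumptions, not ProAlign's implementation: the one-to-one constraint stands in for the real constraints, a state's score is assumed to be the sum of its link scores, and the names best_first_align, one_to_one, beam, agenda_size, and max_expansions are all hypothetical.

```python
import heapq
import itertools


def one_to_one(state, link):
    """Hypothetical constraint: each word takes part in at most one link."""
    i, j = link
    return all(i != a and j != b for a, b in state)


def best_first_align(m, n, link_score, is_valid=one_to_one,
                     beam=5, agenda_size=100, max_expansions=10000):
    """Return the best terminal state found and its score.

    States are frozensets of (i, j) links; a transition adds one link.
    A state's score is the sum of link_score over its links (assumed).
    """
    all_links = [(i, j) for i in range(1, m + 1) for j in range(1, n + 1)]
    tie = itertools.count()              # tie-breaker: sets are never compared
    start = frozenset()                  # empty alignment: all words at null
    agenda = [(0.0, next(tie), start)]   # min-heap on the negated score
    seen = {start}
    best_state, best_score = None, float("-inf")
    for _ in range(max_expansions):
        if not agenda:
            break
        neg_score, _, state = heapq.heappop(agenda)
        moves = [l for l in all_links
                 if l not in state and is_valid(state, l)]
        if not moves:                    # terminal: no link can be added
            if -neg_score > best_score:
                best_state, best_score = state, -neg_score
            continue
        moves.sort(key=link_score, reverse=True)
        for link in moves[:beam]:        # expand only the best few links
            successor = state | {link}
            if successor in seen:
                continue
            seen.add(successor)
            heapq.heappush(
                agenda, (neg_score - link_score(link), next(tie), successor))
        if len(agenda) > agenda_size:    # keep the agenda at a fixed size
            agenda = heapq.nsmallest(agenda_size, agenda)
            heapq.heapify(agenda)
    return best_state, best_score


# Hypothetical link scores for a two-word sentence pair.
toy_scores = {(1, 1): 0.9, (2, 2): 0.8, (1, 2): 0.1, (2, 1): 0.2}
toy_state, toy_total = best_first_align(2, 2, toy_scores.__getitem__)
```

    On the toy scores the search returns the state containing the links (1, 1) and (2, 2). Swapping in a different is_valid function or link_score callable changes the constraints and the ranking without touching the search loop, which mirrors the separation between search, constraints, and model drawn in the text.</Paragraph>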
  </Section>
</Paper>