<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0825">
  <Title>A Generalized Alignment-Free Phrase Extraction</Title>
  <Section position="3" start_page="0" end_page="141" type="intro">
    <SectionTitle>
2 Blocks
</SectionTitle>
    <Paragraph position="0"> We consider each phrase pair as a block within a given parallel sentence pair, as shown in Figure 1.</Paragraph>
    <Paragraph position="1"> The y-axis is the source sentence, indexed word by word from bottom to top; the x-axis is the target sentence, indexed word by word from left to right.</Paragraph>
    <Paragraph position="2"> The block is defined by the source phrase and its projection. The source phrase is bounded by its start and end positions in the source sentence. The projection of the source phrase is defined by its left and right boundaries in the target sentence. Usually, these boundaries can be inferred from the word alignment as the leftmost and rightmost target positions aligned to the words in the source phrase. In this paper, we provide another view of the block, which is defined by the centers of the source and target phrases and the width of the target phrase.</Paragraph>
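As a minimal sketch of the two views of a block described above, the following Python code infers the projection of a source phrase from a word alignment and then converts the block to the center-and-width view. The function names, the representation of the alignment as (source index, target index) pairs, and the definition of a center as the midpoint of the phrase boundaries are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Iterable, Optional, Tuple


def project_block(
    alignment: Iterable[Tuple[int, int]], src_start: int, src_end: int
) -> Optional[Tuple[int, int]]:
    """Infer the (left, right) target boundaries of a source phrase.

    `alignment` holds (source_index, target_index) word links (assumed format).
    Returns None when no word inside the source phrase is aligned.
    """
    span = range(src_start, src_end + 1)  # source positions, inclusive
    targets = [t for s, t in alignment if s in span]
    if not targets:
        return None
    # Leftmost and rightmost aligned target positions.
    return min(targets), max(targets)


def to_center_width(
    src_start: int, src_end: int, left: int, right: int
) -> Tuple[float, float, int]:
    """Convert boundaries to (source center, target center, target width).

    Assumption: a center is the midpoint of the inclusive boundaries, and
    the width is the number of target words covered.
    """
    src_center = (src_start + src_end) / 2
    tgt_center = (left + right) / 2
    tgt_width = right - left + 1
    return src_center, tgt_center, tgt_width


# Toy alignment for a four-word sentence pair (hypothetical data).
toy_alignment = [(0, 0), (1, 2), (2, 1), (3, 3)]
toy_projection = project_block(toy_alignment, 1, 2)
```

For the toy alignment, the source phrase covering positions 1 to 2 is linked to target positions 2 and 1, so its projection spans target positions 1 to 2.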
    <Paragraph position="3"> Phrase extraction algorithms in general search for the left and right projected boundaries of each source phrase according to some score metric computed for the given parallel sentence pairs. We present three models here: a phrase-level fertility model that scores the length mismatch of a phrase pair, a simple center-based distortion model that scores the divergence of the phrase pair's relative positions, and a phrase-level translation model that approximates the phrase pair's translational equivalence. Given a source phrase, we can search for the best possible block, that is, the one with the highest combined score from the three models.</Paragraph>
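The search described above can be sketched as an exhaustive scan over candidate target boundaries. This is only an illustration under stated assumptions: the three toy scoring functions below are hypothetical placeholders and not the fertility, distortion, and translation models of the paper, and combining the scores by summation is likewise an assumption, since this section does not specify how the scores are combined.

```python
from typing import Callable, Optional, Sequence, Tuple

# A scorer maps (src_start, src_end, left, right) to a score; higher is better.
ScoreFn = Callable[[int, int, int, int], float]


def best_block(
    src_start: int,
    src_end: int,
    tgt_len: int,
    score_fns: Sequence[ScoreFn],
    max_width: Optional[int] = None,
) -> Tuple[Optional[Tuple[int, int]], float]:
    """Return the (left, right) boundaries with the highest summed score."""
    best = None
    best_score = float("-inf")
    for left in range(tgt_len):
        for right in range(left, tgt_len):
            # Widths only grow with `right`, so stop once the cap is exceeded.
            if max_width is not None and right - left + 1 > max_width:
                break
            # Assumption: the combined score is the plain sum of all scorers.
            total = sum(fn(src_start, src_end, left, right) for fn in score_fns)
            if total > best_score:
                best_score = total
                best = (left, right)
    return best, best_score


# Hypothetical stand-ins for the three models (NOT the paper's definitions).
def toy_length_score(s0: int, s1: int, left: int, right: int) -> float:
    # Penalize a mismatch between source and target phrase lengths.
    return -abs((s1 - s0 + 1) - (right - left + 1))


def toy_center_score(s0: int, s1: int, left: int, right: int) -> float:
    # Penalize the distance between phrase centers (equal-length sentences assumed).
    return -abs((s0 + s1) / 2 - (left + right) / 2)


def toy_translation_score(s0: int, s1: int, left: int, right: int) -> float:
    # Placeholder: a real model would score lexical equivalence here.
    return 0.0


toy_scorers = [toy_length_score, toy_center_score, toy_translation_score]
toy_best, toy_best_score = best_block(1, 2, 4, toy_scorers)
```

With these toy scorers, the source phrase covering positions 1 to 2 in a four-word target sentence selects the target span 1 to 2, the only candidate with matching length and matching center. The scan costs quadratic time in the target length; the optional `max_width` cap is one simple way to reduce it.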
  </Section>
</Paper>