<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0826">
  <Title>Combining Linguistic Data Views for Phrase-based SMT</Title>
  <Section position="3" start_page="0" end_page="146" type="metho">
    <SectionTitle>
2 System Description
</SectionTitle>
    <Paragraph position="0"> The LDV-COMBO system follows the SMT architecture suggested by the workshop organizers.</Paragraph>
    <Paragraph position="1"> First, training data are linguistically annotated for the two languages involved (See subsection 2.1).</Paragraph>
    <Paragraph position="2"> 10 different data views have been built. Notice that the two parallel counterparts of a bitext need not share the same data view, as long as they share the same granularity. However, in all our experiments we have annotated both sides with the same linguistic information. The token types are: (W) word, (WL) word and lemma, (WP) word and PoS, (WC) word and chunk label, (WPC) word, PoS and chunk label, (Cw) chunk of words, (Cwl) chunk of words and lemmas, (Cwp) chunk of words and PoS, (Cwc) chunk of words and chunk labels, and (Cwpc) chunk of words, PoS and chunk labels. By chunk label we refer to the IOB label associated with every word inside a chunk, e.g. 'I/B-NP declare/B-VP resumed/I-VP the/B-NP session/I-NP of/B-PP the/B-NP European/I-NP Parliament/I-NP ./O'. We build chunk tokens by explicitly connecting the words in the same chunk, e.g.</Paragraph>
    <Paragraph position="3"> '(I)NP (declare resumed)VP (the session)NP (of)PP (the European Parliament)NP'. See examples of some of these data views in Table 1.</Paragraph>
    <Paragraph position="4"> Then, by running GIZA++, we obtain token alignments for each of the data views. Combined phrase-based translation models are built on top of the Viterbi alignments output by GIZA++ (see details in subsection 2.2). Combo-models must then be post-processed in order to remove the additional linguistic annotation and split chunks back into words, so that they fit the format required by Pharaoh.</Paragraph>
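This post-processing step can be sketched as follows. The token formats assumed here ('word|ANNOTATION' for word-level views and '(w1_w2)LABEL' for chunk tokens) are our own illustrative conventions, not necessarily the system's actual ones:

```python
import re

# Sketch of the post-processing step: strip the linguistic annotation
# from a phrase so that only plain words remain, as required by the
# Pharaoh decoder. Assumed formats: 'word|ANNOTATION' for word tokens,
# '(w1_w2)LABEL' for chunk tokens.

CHUNK = re.compile(r"^\((?P<words>[^)]*)\)[A-Z]*$")

def to_words(token):
    m = CHUNK.match(token)
    if m:                                    # chunk token: split back into words
        return m.group("words").split("_")
    return [token.split("|", 1)[0]]          # word token: drop the annotation

def postprocess_phrase(phrase):
    """'(the_session)NP of|IN' -> 'the session of'."""
    return " ".join(w for tok in phrase.split() for w in to_words(tok))
```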
    <Paragraph position="5"> Moreover, we have used the Multilingual Central Repository (MCR), a multilingual lexical-semantic database (Atserias et al., 2004), to build a word-based translation model. We back off to this model in the case of unknown words, with the goal of improving system recall (see subsection 2.3).</Paragraph>
    <Section position="1" start_page="145" end_page="145" type="sub_section">
      <SectionTitle>
2.1 Data Representation
</SectionTitle>
      <Paragraph position="0"> In order to achieve robustness, the same tools have been used to linguistically annotate both languages.</Paragraph>
      <Paragraph position="1"> The SVMTool has been used for PoS-tagging (Giménez and Màrquez, 2004). The Freeling package (Carreras et al., 2004) has been used for lemmatizing. Finally, the Phreco software (Carreras et al., 2005) has been used for shallow parsing.</Paragraph>
      <Paragraph position="2"> No additional tokenization or pre-processing steps other than case lowering have been performed.</Paragraph>
      <Paragraph position="3"> Special treatment of named entities, dates, numbers,  currency, etc., should be considered so as to further enhance the system.</Paragraph>
    </Section>
    <Section position="2" start_page="145" end_page="145" type="sub_section">
      <SectionTitle>
2.2 Building Combined Translation Models
</SectionTitle>
      <Paragraph position="0"> Because data views capture different, possibly complementary, aspects of the translation process, it seems reasonable to combine them. We consider two different ways of building such combo-models: LPHEX (local phrase extraction): a separate phrase-based translation model is built for each data-view alignment, and the resulting models are then combined. There are, in turn, two ways of combining translation models: MRG (merging translation models): we work with a weighted linear interpolation of the models.</Paragraph>
      <Paragraph position="1"> These weights may be tuned, although a uniform weight selection already yields good results. Additionally, phrase pairs may be filtered out by setting a score threshold.</Paragraph>
      <Paragraph position="2"> noMRG (no merging): the translation models are passed directly to the Pharaoh decoder. However, we encountered many problems with phrase pairs that were not seen in all single models, which obliged us to apply arbitrary smoothing values to score these pairs.</Paragraph>
      <Paragraph position="3"> GPHEX (global phrase extraction): a single phrase-based translation model is built from the union of the alignments of several data views.</Paragraph>
      <Paragraph position="4"> In turn, any MRG operation performed on a combo-model results again in a valid combo-model. In all cases, phrase extraction is performed as described by Och (2002).</Paragraph>
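The MRG operation, a weighted linear interpolation of phrase-translation models with an optional score threshold, can be sketched as follows. Representing a phrase table as a dict from (source, target) phrase pairs to scores is our own simplification:

```python
def mrg(tables, weights=None, threshold=0.0):
    """Merge phrase tables by weighted linear interpolation (MRG sketch).

    Each table maps (src_phrase, tgt_phrase) -> score. A pair unseen in
    a given table simply contributes 0 from it; pairs whose interpolated
    score falls at or below `threshold` are filtered out.
    """
    if weights is None:                       # uniform weights by default
        weights = [1.0 / len(tables)] * len(tables)
    merged = {}
    for table, w in zip(tables, weights):
        for pair, score in table.items():
            merged[pair] = merged.get(pair, 0.0) + w * score
    return {p: s for p, s in merged.items() if s > threshold}
```

With uniform weights and a zero threshold this reproduces the default setting reported above; the output is again a valid table, so MRG can be reapplied to combo-models.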
    </Section>
    <Section position="3" start_page="145" end_page="146" type="sub_section">
      <SectionTitle>
2.3 Using the MCR
</SectionTitle>
      <Paragraph position="0"> External knowledge may be supplied to the Pharaoh decoder by annotating the input with alternative translation options via XML markup. We enrich every unknown word by looking up every possible translation of each of its senses in the MCR. These candidates are scored by relative frequency, according to the number of senses that lexicalize in the same manner. (In all our experiments, phrase extraction works with the union of alignments, with no heuristic refinement, and with phrases of up to 5 tokens; phrase pairs appearing only once are discarded; scoring is performed by relative frequency; no smoothing is applied.)</Paragraph>
      <Paragraph position="2"> Let wf and pf be the source word and its PoS, and let we be a candidate target word. We define a function Scount(wf, pf, we) which counts the number of senses of (wf, pf) that can lexicalize as we. A translation pair is then scored by relative frequency: score(wf, pf, we) = Scount(wf, pf, we) / Σ_w' Scount(wf, pf, w').</Paragraph>
      <Paragraph position="4"> Better results would be expected when working with word-sense-disambiguated text, but we are not at that point yet. A first approach could be to use the most-frequent-sense heuristic.</Paragraph>
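The relative-frequency scoring of MCR translation candidates can be sketched as follows. Representing the sense inventory of one (wf, pf) entry as a list of sets of target-language lexicalizations is a hypothetical simplification of the MCR lookup:

```python
from collections import Counter

def translation_scores(senses):
    """Score candidate translations of one (wf, pf) source entry.

    `senses` is a list of sets; each set holds the target words that
    lexicalize one sense of (wf, pf). Scount(wf, pf, we) is the number
    of senses containing we; scores are Scount normalized by the sum
    over all candidate target words, i.e. relative frequency.
    """
    scount = Counter(we for sense in senses for we in sense)
    total = sum(scount.values())
    return {we: c / total for we, c in scount.items()}
```

A candidate shared by many senses of the source word thus receives a higher score, which is what the Scount-based relative frequency expresses.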
    </Section>
  </Section>
  <Section position="4" start_page="146" end_page="147" type="metho">
    <SectionTitle>
3 Experimental Results
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="146" end_page="146" type="sub_section">
      <SectionTitle>
3.1 Data and Evaluation Metrics
</SectionTitle>
      <Paragraph position="0"> We have used the data sets and language model provided by the organizers. No extra training or development data were used in our experiments.</Paragraph>
      <Paragraph position="1"> We evaluate results with 3 different metrics: the GTM F1-measure (e = 1, 2), the BLEU score (n = 4) as provided by the organizers, and the NIST score (n = 5).</Paragraph>
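For concreteness, the shape of the BLEU score (n = 4) can be sketched at sentence level as follows. This is only an illustration of the metric's structure; the official evaluation script aggregates n-gram statistics over the whole test corpus, computes the brevity penalty at corpus level, and handles multiple references:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Minimal single-reference, sentence-level BLEU sketch: geometric
    mean of clipped n-gram precisions (n = 1..max_n) times the brevity
    penalty. Zero counts are not smoothed here."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```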
    </Section>
    <Section position="2" start_page="146" end_page="147" type="sub_section">
      <SectionTitle>
3.2 Experimenting with Data Views
</SectionTitle>
      <Paragraph position="0"> Table 2 presents MT results for the 10 elementary data views devised in Section 2. Default parameters are used for ltm, llm, and lw; no tuning has been performed. As expected, word-based views obtain significantly higher results than chunk-based ones. All data views at the same level of granularity obtain comparable results.</Paragraph>
      <Paragraph position="1"> Table 3 shows MT results for different data-view combinations. Merged model weights are set equiprobable, and no phrase-pair score filtering is performed. We refer to the W model, in which only words are used, as our baseline. The 5W-MRG and 5W-GPHEX models use a combination of the 5 word-based data views, as in MRG and GPHEX, respectively. The 5C-MRG and 5C-GPHEX systems use a combination of the 5 chunk-based data views, as in MRG and GPHEX, respectively. The 10-MRG system uses all 10 data views combined as in MRG. The 10-GPHEX/MRG system combines the 5 word-based views as in GPHEX and the 5 chunk-based views as in GPHEX, and then combines these two combo-models as in MRG.</Paragraph>
      <Paragraph position="2"> It can be seen that results improve by combining several data views. Furthermore, global phrase extraction (GPHEX) seems to work considerably better than local phrase extraction (LPHEX).</Paragraph>
      <Paragraph position="3"> Table 4 shows MT results after optimizing ltm, llm, lw, and the weights for the MRG operation by means of the Downhill Simplex Method in Multidimensions (Press et al., 2002). Observe that tuning the system improves performance considerably. The lw parameter is particularly sensitive to tuning.</Paragraph>
      <Paragraph position="4"> Even though the performance of chunk-based models is poor, the best results are obtained by combining the two levels of abstraction, which suggests that syntactically motivated phrases may help. The 10-MRG and 10-GPHEX models achieve a similar performance. The 10-MRG-bestWN system corresponds to the 10-MRG model using WordNet; the 10-MRG-subWN system is this same system at the time of submission. Taking into account that the number of unknown words in the development set was very small, the results using WordNet are very promising.</Paragraph>
    </Section>
  </Section>
</Paper>