<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0404">
  <Title>Translation by Machine of Complex Nominals: Getting it Right</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
BNC Reuters Mainichi
</SectionTitle>
    <Paragraph position="0"> Token coverage 2.6% 3.9% 2.9% Total no. types 265K 166K 889K Ave. token freq. 4.2 12.7 11.1  basic study of corpus occurrence in English and Japanese. For English, we based our analysis over: (1) the written portion of the British National Corpus (BNC, 84M words: Burnard (2000)), and (2) the Reuters corpus (108M words: Rose et al. (2002)). For Japanese, we focused exclusively on the Mainichi Shimbun Corpus (340M words: Mainichi Newspaper Co. (2001)). We identified NN compounds in each corpus using the method described in a45 2.2 below, and from this, derived the statistics of occurrence presented in Table 1. The token coverage of NN compounds in each corpus refers to the percentage of words which are contained in NN compounds; based on our corpora, we estimate this figure to be as high as 3-5%. If we then look at the average token frequency of each distinct NN compound type, we see that it is a relatively modest figure given the size of each of the corpora, the reason for which is seen in the huge number of distinct NN compound types. Combining these observations, we see that a translator or MT system attempting to translate one of these corpora will run across NN compounds with high frequency, but that each individual NN compound will occur only a few times (with around 45-60% occuring only once). The upshot of this for MT systems and translators is that NN compounds are too varied to be able to pre-compile an exhaustive list of translated NN compounds, and must instead be able to deal with novel NN compounds on the fly. This claim is supported by Tanaka and Baldwin (2003a), who found that static bilingual dictionaries had a type coverage of around 84% and 94% over the top250 most frequent English and Japanese NN compounds, respectively, but only 27% and 60%, respectively, over a random sample of NN compounds occurring more than 10 times in the corpus.</Paragraph>
    <Paragraph position="1"> We develop and test a method for translating NN compounds based on Japanesea0 English MT. The method can act as a standalone module in an MT Second ACL Workshop on Multiword Expressions: Integrating Processing, July 2004, pp. 24-31 system, translating NN compounds according to the best-scoring translation candidate produced by the method, and it is primarly in this context that we present and evaluate the method. This is congruent with the findings of Koehn and Knight (2003) that, in the context of statistical MT, overall translation performance improves when source language noun phrases are prescriptively translated as noun phrases in the target language. Alternatively, the proposed method can be used to generate a list of plausible translation candidates for each NN compound, for a human translator or MT system to select between based on the full translation context.</Paragraph>
    <Paragraph position="2"> In the remainder of the paper, we describe the translation procedure and resources used in this research (a45 2), and outline the translation candidate selection method, a benchmark selection method and pre-processors our method relies on (a45 3). We then evaluate the method using a variety of data sources (a45 4), and finally compare our method to related research (a45 5).</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Preliminaries
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Translation procedure
</SectionTitle>
      <Paragraph position="0"> We translate NN compounds by way of a two-phase procedure, incorporating generation and selection (similarly to Cao and Li (2002) and Langkilde and Knight (1998)).</Paragraph>
      <Paragraph position="1"> Generation consists of looking up word-level translations for each word in the NN compound to be translated, and running them through a set of constructional translation templates to generate translation candidates. In order to translate a22 a23 a3 a24 a26 kaNkeia3kaizeN &amp;quot;improvement in relations&amp;quot;, for example, possible word-level translations for a22 a23 are relation, connection and relationship, and translations for a24 a26 are improvement and betterment. Constructional templates are of the form [Na1</Paragraph>
      <Paragraph position="3"> (where Na1a2 indicates that the word is a noun (N) in English (a3 ) and corresponds to the a4 th-occurring noun in the original Japanese; see Table 3 for further example templates and Kageura et al. (2004) for discussion of templates of this type). Each slot in the translation template is indexed for part of speech (POS), and derivational morphology is optionally used to convert a given word-level translation into a form appropriate for a given template. Example translation candidates for a22 a23 a3 a24 a26 , therefore, are relation improvement, betterment of relationship, improvement connection and relational betterment.</Paragraph>
      <Paragraph position="4"> Generation fails in the instance that we are unable to find a word-level translation for Na9 and/or Na10 .</Paragraph>
      <Paragraph position="5"> Selection consists of selecting the most likely translation for the original NN compound from the generated translation candidates. Selection is performed based on a combination of monolingual target language and crosslingual evidence, obtained from corpus or web data.</Paragraph>
      <Paragraph position="6"> Ignoring the effects of POS constraints for the moment, the number of generated translations is a5a7a6a9a8a11a10a13a12a15a14 where a8 and a10 are the fertility of Japanese</Paragraph>
      <Paragraph position="8"> , respectively, and a12 is the number of translation templates. As a result, there is often a large number of translation candidates to select between, and the selection method crucially determines the efficacy of the method.</Paragraph>
      <Paragraph position="9"> This translation procedure has the obvious advantage that it can generate a translation for any NN compound input assuming that there are word-level translations for each of the component nouns; that is it has high coverage. It is based on the assumption that NN compounds translate compositionality between Japanese and English, which Tanaka and Baldwin (2003a) found to be the case 43.1% of the time for Japanese-English (JE) MT and 48.7% of the time for English-Japanese (EJ) MT. In this paper, we focus primarily on selecting the correct translation for those NN compounds which can be translated compositionally, but we also investigate what happens when non-compositional NN compounds are translated using a compositional method.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Translation data
</SectionTitle>
      <Paragraph position="0"> In order to generate English and Japanese NN compound testdata, we first extracted out all NN bi-grams from the Reuters Corpus and Mainichi Shimbun Corpus. The Reuters Copus was first tagged and chunked using fnTBL (Ngai and Florian, 2001), and lemmatised using morph (Minnen et al., 2001), while the Mainichi Shimbun was segmented and tagged using ChaSen (Matsumoto et al., 1999). For both English and Japanese, we took only those NN bigrams adjoined by non-nouns to ensure that they were not part of a larger compound nominal. We additionally measured the entropy of the left and right contexts for each NN type, and filtered out all compounds where either entropy value was a17a19a18 .2 This was done in an attempt to, once again, exclude NNs which were embedded in larger MWEs, such as service department in social service department.</Paragraph>
      <Paragraph position="1"> We next calculated the frequency of occurrence of each NN compound type identified in the English and Japanese corpora, and ranked the NN compound types in order of corpus frequency. Based on this ranking, we split the NN compound types into three partitions of equal token frequency, and from each partition, randomly selected 250 NN compounds. In doing so, we produced NN compound 2For the left token entropy, if the most-probable left context was the, a or a sentence boundary, the threshold was switched off. Similarly for the right token entropy, if the most-probable right context was a punctuation mark or sentence boundary, the threshold was switched off.</Paragraph>
      <Paragraph position="2">  data representative of three disjoint frequency bands of equal token size, as detailed in Table 2. This allows us to analyse the robustness of our method over data of different frequencies.</Paragraph>
      <Paragraph position="3"> Our motivation in testing the proposed method over NN compounds according to the three frequency bands is to empirically determine: (a) whether there is any difference in translation-compositionality for NN compounds of different frequency, and (b) whether our method is robust over NN compounds of different frequency. We return to these questions in a45 4.1.</Paragraph>
      <Paragraph position="4"> In order to evaluate basic translation accuracy over the test data, we generated a unique gold-standard translation for each NN compound to represent its optimally-general default translation.</Paragraph>
      <Paragraph position="5"> This was done with reference to two bilingual Japanese-English dictionaries: the ALTDIC dictionary and the on-line EDICT dictionary. The ALTDIC dictionary was compiled from the ALT-J/E MT system (Ikehara et al., 1991), and has approximately 400,000 entries including more than 200,000 proper nouns; EDICT (Breen, 1995) has approximately 150,000 entries. The existence of a translation for a given NN compound in one of the dictionaries does not guarantee that we used it as our gold-standard, and 35% of JE translations and 25% of EJ translations were rejected in favour of a manually-generated translation. In generating the gold-standard translation data, we checked the validity of each of the randomly-extracted NN compounds, and rejected a total of 0.5% of the initial random sample of Japanese strings, and 6.6% of the English strings, on the grounds of: (1) not being NN compounds, (2) being proper nouns, or (3) being part of a larger MWE. In each case, the rejected string was replaced with an alternate randomly-selected NN compound.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Translation templates
</SectionTitle>
      <Paragraph position="0"> The generation phase of translation relies on translation templates to recast the source language NN compound into the target language. The translation templates were obtained by way of word alignment over the JE and EJ gold-standard translation datasets, generating a total of 28 templates for the JE task and 4 templates for the EJ task. The reason for the large number of templates in the JE task is that they are used to introduce prepositions and possessive markers, as well as indicating word class conversions (see Table 3).</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Selection methodology
</SectionTitle>
    <Paragraph position="0"> In this section, we describe a benchmark selection method based on monoligual corpus data, and a novel selection method combining monolingual corpus data and crosslingual data derived from bilingual dictionaries. Each method takes the list of generated translation candidates and scores each, returning the highest-scoring translation candidate as our final translation.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Benchmark monolingual method
</SectionTitle>
      <Paragraph position="0"> The monolingual selection method we benchmark ourselves against is the corpus-based translation quality (CTQ) method of Tanaka and Baldwin (2003b). It rates a given translation candidate according to corpus evidence for both the fully-specified translation and its parts in the context of the translation template in question. This is calcu-</Paragraph>
      <Paragraph position="2"> are the word-level translations of the source language Na35 a9</Paragraph>
      <Paragraph position="4"> and a12 is the translation template.4 Each probability is calculated according to a maximum likelihood estimate based on relative corpus occurrence. The formulation of CTQ is based on linear interpolation over a38 and a39 , where a40a15a41a42a38a44a43a45a39a46a41 a18 and a38a48a47a49a39a51a50 a18 . We set a38 to a40a53a52a31a54 and a39 to a40a55a52 a18 throughout evaluation. The basic intuition behind decomposing the translation candidate into its two parts within the context of the translation template (a56 a6 a34 a35 a10</Paragraph>
      <Paragraph position="6"> were Bandersnatch and relation, respectively, and a56 a6 a34a28a35 a10</Paragraph>
      <Paragraph position="8"> would hope to score relation to (the) Bandersnatch as being more likely than relation on (the) Bandersnatch. We could hope to achieve this by virtue of the fact that relation occurs in the form relation to ... much more frequently than relation on ..., making the value of a56</Paragraph>
      <Paragraph position="10"> there is considerable variability in their applicatility. One example of this is the simplex a42a44a43 kiji which is translated as either article or item (in the sense of a newspaper) in ALTDIC, of which the former is clearly the more general translation. Lacking knowledge of this conditional probability, the method considers the two translations to be equally probable, giving rise to the preferred translation of related item for a22a46a45  a42a47a43 kaNreNa3kiji &amp;quot;related article&amp;quot; due to the markedly greater corpus occurrence of related item over related article. It is this aspect of selection that we focus on in our proposed method.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Proposed selection method
</SectionTitle>
      <Paragraph position="0"> The proposed method uses the corpus-based mono-lingual probability terms of CTQ above, but also mono- and crosslingual terms derived from bilingual dictionary data. In doing so, it attempts to preserve the ability of CTQ to model target language expressional preferences, while incorporating more direct translation preferences at various levels of lexical specification. For ease of feature expandability, and to avoid interpolation over excessively many terms, the backbone of the method is the TinySVM support vector machine (SVM) learner.5 The way we use TinySVM is to take all source language inputs where the gold-standard translation is included among the generated translation candidates, and construct a single feature vector for each translation candidate. We treat those feature vectors which correspond to the (unique) gold-standard translation as positive exemplars, and all other feature vectors as negative exemplars. We then run TinySVM over the training exemplars using the ANOVA kernel (the only kernel which was found to converge). Strictly speaking, SVMs produce a binary classification, by returning a continuous value and determining whether it is closest to a47 a18 (the positive class) or a48 a18 (the negative class). We treat this value as a translation quality rating, and rank the translation candidates accordingly. To select the best translation candidate, we simply take the best-scoring exemplar, breaking ties through random selection. null</Paragraph>
      <Paragraph position="2"> The selection method makes use of three basic feature types in generating a feature vector for each source language-translation candidate pair: corpus-based features, bilingual dictionary-based features and template-based features.</Paragraph>
      <Paragraph position="3"> Corpus-based features Each source language-translation pair is mapped onto a total of 8 corpus-based feature types, in line with the CTQ formulation above:</Paragraph>
      <Paragraph position="5"> used to estimate the frequency of occurrence of multiword expression (MWE) translations from that of the head. E.g., in generating translations for a88a63a89 a90 a3 a42a63a91 fudousaNa3gaisha &amp;quot;real estate company&amp;quot;, we get two word-level translations for a88a47a89 a90 : real estate and real property. In each case, we identify the final word as the head, and calculate the number of times the MWEs (i.e. real estate and real property) occur in the overall corpus as compared to the head (i.e. estate and property, respectively).</Paragraph>
      <Paragraph position="6"> In calculating the values of each of the frequency-based features involving these translations, we determine the frequency of the head in the given context, and multiply this by the normalisation parameter. The reason for doing this is for ease of calculation and, wherever possible, to avoid zero values for frequencies involving MWEs. The feature</Paragraph>
      <Paragraph position="8"> are set to 1.0 in the case that the translation is simplex) and intended to model the tendency to prefer simplex translations over MWEs when given a choice.</Paragraph>
      <Paragraph position="9"> We construct an additional feature from each of these values, by normalising (by simple division to generate a value in the range a92a40a55a43 a18a94a93 ) relative to the maximum value for that feature among the translation candidates generated for a given source language input. For each corpus, therefore, the total number of corpus-based features is a95a85a96a98a97 a50 a18a100a99 . In EJ translation, the corpus-based feature values were derived from the Mainichi Shimbun Corpus, whereas in JE translation, we used the BNC and Reuters Corpus, and concatenated the feature values from each.</Paragraph>
      <Paragraph position="10"> Bilingual dictionary-based features Bilingual dictionary data is used to generate 6 features: null</Paragraph>
      <Paragraph position="12"> a14 is the total number of times the given translation candidate occurs as a translation for the source language NN compound across all dictionaries. While this feature may seem to give our method an unfair advantage over CTQ, it is important to realise that only limited numbers of NN compounds are listed in the dictionaries (12% for English and 28% for Japanese), and that the gold-standard accuracy when the dictionary translation is selected is not as high as one would expect (65% for English and 75% for Japanese). a14a1a15 a86a31a17 a19a22a21a13a23 a23a25a24a27a26a32a28 a6 a34 a35 a10</Paragraph>
      <Paragraph position="14"> the total occurrences of the translation candidate across all dictionaries (irrespective of the source language expression it translates), and is considered to be an indication of conventionalisation of the candidate. null The remaining features are intended to capture word-level translation probabilities, optionally in the context of the template used in the translation candidate. Returning to our a22a46a45 a3 a42a47a43 kaNreNa3kiji &amp;quot;related article&amp;quot; example from above, of the translations article and item for a42a47a43 , article occurs as the translation of a42a47a43 for 42% of NN entries with a42a51a43 as the Na10 , and within 18% of translations for complex entries involving a42a47a43 (irrespective of the form or alignment between article and a42a63a43 ). For item, the respective statistics are 9% and 4%. From this, we can conclude that article is the more appropriate translation, particularly for the given translation template.</Paragraph>
      <Paragraph position="15"> As with the corpus-based features, we additionally construct a normalised variant of each feature value, such that the total number of bilingual dictionary-based features is a33 a96 a97 a50 a95 .</Paragraph>
      <Paragraph position="16"> In both JE and EJ translation, we derived bilingual dictionary-based features from the EDICT and ALTDIC dictionaries independently, and concatenated the features derived from each.</Paragraph>
      <Paragraph position="17"> Template-based features We use a total of two template-based features: the template type and the target language head (N1 or N2). For template [Na9 Na10 ]J</Paragraph>
      <Paragraph position="19"> e.g., the template type is N-N and the target language head is N1.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Corpus data
</SectionTitle>
      <Paragraph position="0"> The corpus frequencies were extracted from the same three corpora as were described in a45 1: the BNC and Reuters Corpus for English, and Mainichi Shimbun Corpus for Japanese. We chose to use the BNC and Reuters Corpus because of their complementary nature: the BNC is a balanced corpus and hence has a rounded coverage of NN compounds (see Table 1), whereas the Reuters Corpus contains newswire data which aligns relatively well in content with the newspaper articles in the Mainichi Shimbun Corpus.</Paragraph>
      <Paragraph position="1"> We calculated the corpus frequencies based on the tag and dependency output of RASP (Briscoe and Carroll, 2002) for English, and CaboCha (Kudo and Matsumoto, 2002) for Japanese. RASP is a tag sequence grammar-based stochastic parser which attempts to exhaustively resolve inter-word dependencies in the input. CaboCha, on the other hand, chunks the input into head-annotated &amp;quot;bunsetsu&amp;quot; or base phrases, and resolves only inter-phrase dependencies. We thus independently determined the intra-phrasal structure from the CaboCha output based on POS-conditioned templates.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Evaluation
</SectionTitle>
    <Paragraph position="0"> We evaluate the method over both JE and EJ translation selection, using the two sets of 750 NN compounds described in a45 2.2. In each case, we first evaluate system performance according to gold-standard accuracy, i.e. the proportion of inputs for which the (unique) gold-standard translation is ranked top amongst the translation candidates. For the method to have a chance at selecting the gold-standard translation, we clearly must be able to generate it. The first step is thus to identify inputs which have translation-compositional gold-standard translations, and generate the translation candidates for each. The translation-compositional data has the distribution given in Table 4. The over-all proportion of translation-compositional inputs is somewhat lower than suggested by Tanaka and Baldwin (2003a), although this is conditional on the coverage of the particular dictionaries we use. The degree of translation-compositionality appears to be relatively constant across the three frequency bands, a somewhat surprising finding as we had expected the lower frequency NN compounds to be less conventionalised and therefore have more straightforwardly compositional translations.</Paragraph>
    <Paragraph position="1"> We use the translation-compositional test data to evaluate the proposed method (SVMa35 a11a37a36a22a38 ) against CTQ and a simple baseline derived from CTQ, which takes the most probable fully-specified translation  candidate (i.e. is equivalent to setting a38 a50 a18 and a39 a50 a40 ). We additionally tested the proposed method using just corpus-based features (SVMa35 a11 ) and bilingual dictionary-based features (SVMa38 ) to get a better sense for the relative impact of each on overall performance. In the case of the proposed method and its derivants, evaluation is according to 10-fold stratified cross-validation, with stratification taking place across the three frequency bands. The average number of translations generated for the JE dataset was 205.6, and that for the EJ dataset was 847.5.</Paragraph>
    <Paragraph position="2"> We were unable to generate any translations for 17 (2.3%) and 57 (7.6%) of the NN compounds in the JE and EJ datasets, respectively, due to there being no word-level translations for Na9 and/or Na10 in the combined ALTDIC/EDICT dictionaries.</Paragraph>
    <Paragraph position="3"> The gold-standard accuracies are presented in Table 5, with figures in boldface indicating a statistically significant improvement over both CTQ and the baseline.6 Except for SVMa38 in the EJ task, all evaluated methods surpass the baseline, and all variants of SVM surpassed CTQ. SVMa35 a11 a36a22a38 appears to successfully consolidate on SVMa35 a11 and SVMa38 , indicating that our modelling of target language corpus and crosslingual data is complementary. Overall, the results for the EJ task are higher than those for the JE task. Part of the reason for this is that Japanese has less translation variability for a given pair of word translations, as discussed below.</Paragraph>
    <Paragraph position="4"> In looking through the examples where a gold-standard translation was not returned by the different methods, we often find that the uniqueness of gold-standard translation has meant that equally good translations (e.g. dollar note vs. the gold-standard translation dollar bill for a8a10a9 a3a12a11 a13 dorua3shihei) or marginally lower-quality but perfectly acceptable translations (e.g. territorial issue vs. the gold-standard translation of territorial dispute for a14a16a15 a3a18a17a20a19 ryoudoa3moNdai) are adjudged incorrect. To rate the utility of these near-miss translations, we rated each non-gold-standard firstranking translation according to source languagerecoverability (L1-recoverability). L1-recoverable  frequency bands translations are defined to be syntactically unmarked, capture the basic semantics of the source language expression and allow the source language expression to be recovered with reasonable confidence. While evaluation of L1-recoverability is inevitably subjective, we minimise bias towards any given system by performing the L1-recoverability annotation for all methods in a single batch, without giving the annotator any indication of which method selected which translation. The average number of English and Japanese L1-recoverable translations were 1.9 and 0.94, respectively. The principle reason for the English data being more forgiving is the existence of possessive- and PP-based paraphrases of NN gold-standard translations (e.g. ammendment of rule(s) as an L1-recoverable paraphrase of rule ammendment).</Paragraph>
    <Paragraph position="5"> We combine the gold-standard data and L1-recoverable translation data together into a single silver standard translation dataset, based upon which we calculate silver-standard translation accuracy. The results for the translation-compositional data are given in Table 6. Once again, we find that the proposed method is superior to the base-line and CTQ, and that the combination of crosslingual and target language corpus data is superior to the individual data sources. SVMa38 fares particularly badly under silver-standard evaluation as it is unable to capture the target language lexical and constructional preferences as are needed to generate syntactically-unmarked, natural-sounding translations. Unsurprisingly, the increment between gold-standard accuracy and silver-standard accuracy is greater for English than Japanese.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Accuracy over each frequency band
</SectionTitle>
      <Paragraph position="0"> We next analyse the breakdown in gold- and silver-standard accuracies across the three frequency bands. In doing this, we test the hypothesis that training over only translation data from the same frequency band will produce better results than  training over all the translation data. The results for the JE and EJ translation tasks are presented in Tables 7 and 8, respectively. The results based on training over data from all frequency bands are labelled All and those based on training over data from only the same frequency band are labelled Local; G is the gold-standard accuracy and S is the silver-standard accuracy.</Paragraph>
      <Paragraph position="1"> For each of the methods tested, we find that the gold- and silver-standard accuracies drop as we go down through the frequency bands, although the drop off is markedly greater for gold-standard accuracy. Indeed, silver-standard accuracy is constant between the high and medium bands for the JE task, and the medium and low frequency bands for the EJ task. SVMa35 a11a37a36a22a38 appears to be robust over low-frequency data for both tasks, with the absolute difference in silver-standard accuracy between the high and low frequency bands around only 0.10, and never dropping below 0.70 for either the EJ or JE task. There was very little difference between training over data from all frequency bands as compared to only the local frequency band, suggesting that there is little to be gained from conditioning training data on the relative frequency of the NN compound we are seeking to translate.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Accuracy over non-translation-compositional data
</SectionTitle>
      <Paragraph position="0"> Finally, we evaluate the performance of the methods over the non-translation-compositional data. We are unable to give gold-standard accuracies here as, by definition, the gold-standard translation is not amongst the translation candidates generated for any of the inputs. We are, however, able to evaluate according to silver-standard accuracy, constructing L1-recoverable translation data as for the translation-compositional case described above.</Paragraph>
      <Paragraph position="1"> The classifier is learned from all the translation-compositional data, treating the gold-standard translations as positive exemplars as before.</Paragraph>
      <Paragraph position="2"> The results are presented in Table 9. A large disparity is observable here between the JE and EJ accuracies, which is, once again, a direct result of Japanese being less forgiving when it comes to L1-recoverable translations. For the translation-compositional data, the EJ task displayed a similarly diminished accuracy increment when the L1-recoverable translation data was incorporated, but this was masked by the higher gold-standard accuracy for the task. The relative results for the JE task largely mirror those for the translationcompositonal data. In contrast, SVMa35 a11 a36 a38 actually performs marginally worse than CTQ over the EJ task, despite SVMa35 a11 performing above CTQ. That is, the addition of dictionary data diminishes overall accuracy, a slightly surprising result given the complementary of corpus and dictionary data in all other aspects of evaluation. It is possible that we could get better results by treating both L1-recoverable and gold-standard translations in the training data as positive exemplars, which we leave as an item for future research.</Paragraph>
      <Paragraph position="3"> Combining the results from Table 9 with those from Table 6, the overall silver-standard accuracy over the JE data is 0.671 for SVMa35 a11a37a36a22a38 (compared to 0.602 for CTQ), and that over the EJ data is 0.461 (compared to 0.419 for CTQ).</Paragraph>
      <Paragraph position="4"> In summary, we have shown our method to be superior to both the baseline and CTQ over EJ and JE translation tasks in terms of both gold- and silver-standard accuracy. We also demonstrated that the method successfully combines crosslingual and target language corpus data, and is relatively robust over low frequency inputs.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Related work
</SectionTitle>
    <Paragraph position="0"> One piece of research relatively closely related to our method is that of Cao and Li (2002), who use bilingual bootstrapping over Chinese and English web data in various forms to translate Chinese NN compounds into English. While we rely on bilingual dictionaries to determine crosslingual similarity, their method is based on contextual similarity in the two languages, without assuming parallelism or comparability in the corpus data. They report an impressive F-score of 0.73 over a dataset of 1000 instances, although they also cite a prior-based F-score (equivalent to our Baseline) of 0.70 for the task, such that the particular data set they are dealing with would appear to be less complex than that which we have targeted. Having said this, contextual similarity is an orthogonal data source to those used in this research, and has the potential to further improve the accuracy of our method.</Paragraph>
    <Paragraph position="1"> Nagata et al. (2001) use &amp;quot;partially bilingual&amp;quot; web pages, that is web pages which are predominantly Japanese, say, but interspersed with English words, to extract translation pairs. They do this by accessing web pages containing a given Japanese expression, and looking for the English expression which occurs most reliably in its immediate vicinity. The method achieves an impressive gold-standard accuracy of 0.62, at a recall of 0.68, over a combination of simplex nouns and compound nominals.</Paragraph>
    <Paragraph position="2"> Grefenstette (1999) uses web data to select English translations for compositional German and Spanish noun compounds, and achieves an impressive accuracy of 0.86-0.87. The translation task Grefenstette targets is intrinsically simpler than that described in this paper, however, in that he considers only those compounds which translate into NN compounds in English. It is also possible that the historical relatedness of languages has an effect on the difficulty of the translation task, although further research would be required to confirm this prediction. Having said this, the successful use of web data by a variety of researchers suggests an avenue for future research in comparing our results with those obtained using web data.</Paragraph>
  </Section>
class="xml-element"></Paper>