<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1803"> <Title>Noun-Noun Compound Machine Translation: A Feasibility Study on Shallow Processing</Title> <Section position="3" start_page="0" end_page="4" type="metho"> <SectionTitle> 2 Methods for translating NN compounds </SectionTitle> <Paragraph position="0"> Two basic paradigms exist for translating NN compounds: memory-based machine translation and dynamic machine translation. Below, we discuss these two paradigms in turn and representative instantiations of each.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Memory-based machine translation </SectionTitle> <Paragraph position="0"> Memory-based machine translation (MBMT) is a simple and commonly-used method for translating NN compounds, whereby translation pairs are stored in a static translation database indexed by their source language strings. MBMT has the ability to produce consistent, high-quality translations (conditioned on the quality of the original bilingual dictionary) and is therefore suited to translating compounds in closed domains. Its most obvious drawback is that the method can translate only those source language strings contained in the translation database.</Paragraph> <Paragraph position="1"> There are a number of ways to populate the translation database used in MBMT, the easiest of which is to take translation pairs directly from a bilingual dictionary (dictionary-driven MBMT or MBMTDICT).</Paragraph> <Paragraph position="2"> MBMTDICT offers an extremist solution to the idiomaticity problem, in treating all NN compounds as being fully lexicalised. 
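As a minimal sketch, MBMT amounts to a lookup in a static table keyed on source-language strings; the entries below are hypothetical, for illustration only:

```python
# Sketch of dictionary-driven MBMT (MBMT-DICT): translation pairs are
# stored in a static database indexed by their source-language strings.
# The sample entries are hypothetical.
translation_db = {
    "machine translation": "kikai honyaku",
    "interest rate": "riritsu",
}

def mbmt_translate(source_nn):
    """Return the stored translation, or None if the compound is unknown."""
    return translation_db.get(source_nn)
```

Unknown compounds simply return None, which is the coverage limitation noted above: only strings already in the database can be translated.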
Overgeneration is not an issue, as all translations are manually determined.</Paragraph> <Paragraph position="3"> As an alternative to a precompiled bilingual dictionary, translation pairs can be extracted from a parallel corpus (Fung, 1995; Smadja et al., 1996; Ohmori and Higashida, 1999), that is, a bilingual document set that is translation-equivalent at the sentence or paragraph level; we term this MT configuration alignment-driven MBMT (or MBMTALIGN). While this method alleviates the problem of limited scalability, it relies on the existence of a parallel corpus in the desired domain, which is often an unreasonable requirement.</Paragraph> <Paragraph position="4"> Whereas a parallel corpus assumes translation equivalence, a comparable corpus is simply a crosslingual pairing of corpora from the same domain (Fung and McKeown, 1997; Rapp, 1999; Tanaka and Matsuo, 1999; Tanaka, 2002). It is possible to extract translation pairs from a comparable corpus by way of the following process (Cao and Li, 2002): 1. extract NN compounds from the source language corpus; 2. generate translation candidates for each NN compound by accessing translations for each component word and slotting these into translation templates; example JE translation templates for source Japanese string [N1 N2]J are [N1 N2]E and [N2 of N1]E, where the numeric subscripts indicate word coindexation between Japanese and English (resulting in, e.g., machine translation and translation of machine); 3. use empirical evidence from the target language corpus to select the most plausible translation candidate. We term this process word-to-word compositional MBMT (or MBMTCOMP). While the coverage of MBMTCOMP is potentially higher than MBMTALIGN due to the greater accessibility of corpus data, it is limited to some degree by the coverage of the simplex translation dictionary used in Step 2 of the translation process.
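A minimal sketch of Steps 2 and 3 above, assuming a toy simplex dictionary, the two JE templates given in the text, and hypothetical target-corpus frequencies:

```python
# Word-to-word compositional translation (MBMT-COMP sketch):
# translate each component noun, slot the translations into templates
# (Step 2), then rank candidates by target-corpus evidence (Step 3).
# The dictionary and corpus counts below are hypothetical toy data.
simplex_dict = {"kikai": ["machine"], "honyaku": ["translation"]}

# Translation templates for a Japanese [N1 N2] compound, as in the text:
templates = [
    lambda n1, n2: f"{n1} {n2}",     # [N1 N2]_E
    lambda n1, n2: f"{n2} of {n1}",  # [N2 of N1]_E
]

corpus_freq = {"machine translation": 212, "translation of machine": 3}

def generate_candidates(n1, n2):
    """Step 2: slot component-word translations into each template."""
    cands = []
    for t1 in simplex_dict.get(n1, []):
        for t2 in simplex_dict.get(n2, []):
            for tmpl in templates:
                cands.append(tmpl(t1, t2))
    return cands

def select_translation(n1, n2):
    """Step 3: pick the candidate best attested in the target corpus."""
    cands = generate_candidates(n1, n2)
    if not cands:
        return None
    return max(cands, key=lambda c: corpus_freq.get(c, 0))
```

For the running example, both machine translation and translation of machine are generated, and corpus frequency selects the former.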
That is, only those NN compounds whose component nouns occur in the bilingual dictionary can be translated.</Paragraph> <Paragraph position="5"> Note that both MBMTALIGN and MBMTCOMP lead to a static translation database. MBMTCOMP is also subject to overgeneration as a result of dynamically generating translation candidates.</Paragraph> </Section> <Section position="2" start_page="0" end_page="4" type="sub_section"> <SectionTitle> 2.2 Dynamic machine translation </SectionTitle> <Paragraph position="0"> Dynamic machine translation (DMT) is geared towards translating arbitrary NN compounds. In this paper, we consider two methods of dynamic translation: word-to-word compositional DMT and interpretation-driven DMT.</Paragraph> <Paragraph position="1"> Word-to-word compositional DMT (or DMTCOMP) differs from MBMTCOMP only in that the source NN compounds are fed directly into the system rather than extracted out of a source language corpus. That is, it applies Steps 2 and 3 of the method for MBMTCOMP to an arbitrary source language string.</Paragraph> <Paragraph position="2"> Interpretation-driven DMT (or DMTINTERP) offers the means to deal with NN compounds where strict word-to-word alignment does not hold. It generally does this in two stages: 1. use semantics and/or pragmatics to carry out deep analysis of the source NN compound, and map it into some intermediate (i.e. interlingual) semantic representation (Copestake and Lascarides, 1997; Barker and Szpakowicz, 1998; Rosario and Hearst, 2001); 2. generate the translation directly from the semantic representation. DMTINTERP removes any direct source/target language interdependence, and hence solves the problem of overgeneration due to crosslingual bias. At the same time, it is forced into tackling idiomaticity head-on, by way of interpreting each individual NN compound.
Like DMTCOMP, DMTINTERP suffers from undergeneration.</Paragraph> <Paragraph position="3"> With DMTINTERP, context must often be called upon in interpreting NN compounds (e.g. apple juice seat (Levi, 1978; Bauer, 1979)), and minimal pairs with sharply-differentiated semantics such as colour/group photograph illustrate the fine-grained distinctions that must be made. It is interesting to note that, while these examples are difficult to interpret, in an MT context, they can all be translated word-to-word compositionally into Japanese. That is, apple juice seat translates most naturally as riNgo juusu no seki,4 which retains the same scope for interpretation as its English counterpart; similarly, colour photograph translates trivially as karaa shashiN &quot;colour photograph&quot; and group photograph as daNtai shashiN &quot;group photograph&quot;. In these cases, therefore, DMTINTERP offers no advantage over DMTCOMP, while incurring a sizeable cost in producing a full semantic interpretation.</Paragraph> </Section> </Section> <Section position="4" start_page="4" end_page="4" type="metho"> <SectionTitle> 3 Methodology </SectionTitle> <Paragraph position="0"> We selected the tasks of Japanese-to-English and English-to-Japanese NN compound MT for evaluation, and tested MBMTDICT and DMTCOMP on each task. Note that we do not evaluate MBMTALIGN as results would have been too heavily conditioned on the makeup of the parallel corpus and the particular alignment method adopted.
Below, we describe the data and method used in evaluation.</Paragraph> <Paragraph position="1"> 4Here, no is the genitive marker.</Paragraph> <Section position="1" start_page="4" end_page="4" type="sub_section"> <SectionTitle> 3.1 Testdata </SectionTitle> <Paragraph position="0"> In order to generate English and Japanese NN compound testdata, we first extracted out all NN bigrams from the BNC (90m word tokens, Burnard (2000)) and 1996 Mainichi Shimbun Corpus (32m word tokens, Mainichi Newspaper Co. (1996)), respectively.</Paragraph> <Paragraph position="1"> The BNC had been tagged and chunked using fnTBL (Ngai and Florian, 2001), and lemmatised using morph (Minnen et al., 2001), while the Mainichi Shimbun had been segmented and tagged using ALT-JAWS.5 For both English and Japanese, we took only those NN bigrams adjoined by non-nouns to ensure that they were not part of a larger compound nominal. In the case of English, we additionally measured the entropy of the left and right contexts for each NN type, and filtered out all compounds where either entropy value fell below a fixed threshold.6 This was done in an attempt to, once again, exclude NNs which were embedded in larger MWEs, such as service department in social service department.</Paragraph> <Paragraph position="2"> We next extracted out the 250 most common NN compounds from the English and Japanese data, and from the remaining data, randomly selected a further 250 NN compounds of frequency 10 or greater (out of 20,748 English and 169,899 Japanese NN compounds). In this way, we generated a total of 500 NN compounds for each of English and Japanese. For the Japanese NN compounds, any errors in segmentation were post-corrected.
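The context-entropy filter can be sketched as follows; the threshold value and the context lists are illustrative assumptions, not the paper's actual setting:

```python
# Sketch of the left/right context-entropy filter used to exclude NN
# bigrams embedded in larger MWEs: a low-entropy context distribution
# means the bigram nearly always occurs with the same neighbouring word
# (e.g. service department almost always preceded by social).
# The threshold value here is an assumption, not the paper's setting.
import math
from collections import Counter

def entropy(context_counts):
    """Shannon entropy (bits) of a distribution over context tokens."""
    total = sum(context_counts.values())
    h = 0.0
    for count in context_counts.values():
        p = count / total
        h -= p * math.log2(p)
    return h

def keep_compound(left_contexts, right_contexts, threshold=1.0):
    """Keep the NN type only if both context entropies reach the threshold."""
    h_left = entropy(Counter(left_contexts))
    h_right = entropy(Counter(right_contexts))
    return h_left >= threshold and h_right >= threshold
```

A bigram like service department seen only after social gets a left-context entropy of zero and is filtered out, while a bigram with varied contexts survives.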
Note that the top-250 NN compounds accounted for about 7.0% and 3.3% of the total token occurrences of English and Japanese NN compounds, respectively; for the random sample of 250 NN compounds, the relative occurrence of the English and Japanese compounds out of the total token sample was 0.5% and 0.1%, respectively.</Paragraph> <Paragraph position="3"> We next generated a unique gold-standard translation for each of the English and Japanese NN compounds. In order to reduce the manual translation overhead and maintain consistency with the output of MBMTDICT in evaluation, we first tried to translate each English and Japanese NN compound automatically by MBMTDICT. In this, we used the union of two Japanese-English dictionaries: the ALTDIC dictionary and the on-line EDICT dictionary (Breen, 1995). The ALTDIC dictionary was compiled from the ALT-J/E MT system (Ikehara et al., 1991), and has approximately 400,000 entries including more than 200,000 proper nouns; EDICT has approximately 150,000 entries. In the case that multiple translation candidates were found for a given NN compound, the most appropriate of these was selected manually, or in the case that the dictionary translations were considered to be sub-optimal or inappropriate, the NN compound was put aside for manual translation. Finally, all dictionary-based translations were manually checked for accuracy.</Paragraph> <Paragraph position="4"> 6If the most-probable left context was the, a or a sentence boundary, the threshold was switched off. Similarly for the right token entropy, if the most-probable right context was a punctuation mark or sentence boundary, the threshold was switched off.</Paragraph> <Paragraph position="5"> The residue of NN compounds for which a translation was not found was translated manually. Note that as we manually check all translations, the accuracy of MBMTDICT is less than 100%.
At the same time, we give MBMTDICT full credit in evaluation for containing an optimal translation, by virtue of using the dictionaries as our primary source of translations.</Paragraph> </Section> <Section position="2" start_page="4" end_page="4" type="sub_section"> <SectionTitle> 3.2 Upper bound accuracy-based evaluation </SectionTitle> <Paragraph position="0"> We use the testdata to evaluate MBMTDICT and DMTCOMP. Both methods potentially produce multiple translation candidates for a given input, from which a unique translation output must be selected in some way. So as to establish an upper bound on the feasibility of each method, we focus on the translation candidate generation step in this paper and leave the second step of translation selection as an item for further research.</Paragraph> <Paragraph position="1"> With MBMTDICT, we calculate the upper bound by simply checking for the gold-standard translation within the translation candidates. In the case of DMTCOMP, rather than generating all translation candidates and checking among them, we take a pre-determined set of translation templates and a simplex translation dictionary to test for word alignment. Word alignment is considered to have been achieved if there exists a translation template and set of word translations which lead to an isomorphic mapping onto the gold-standard translation.
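The alignment test for the DMTCOMP upper bound might be sketched as follows, with a hypothetical simplex dictionary (including a derivational variant) and a toy template set:

```python
# Upper-bound test for DMT-COMP: alignment succeeds if some template
# plus simplex word translations reproduces the gold-standard exactly.
# The dictionary and templates below are illustrative toy data.
simplex_dict = {
    "ryoudo": ["territory", "territorial"],  # derivational variant included
    "moNdai": ["problem", "dispute"],
}

templates = [
    lambda n1, n2: f"{n1} {n2}",     # [N1 N2]_E
    lambda n1, n2: f"{n2} of {n1}",  # [N2 of N1]_E
]

def aligns(n1, n2, gold):
    """True if some template and word translations map onto the gold."""
    for t1 in simplex_dict.get(n1, []):
        for t2 in simplex_dict.get(n2, []):
            for tmpl in templates:
                if tmpl(t1, t2) == gold:
                    return True
    return False
```

Under this sketch, a gold translation like territorial dispute aligns (via the derivational variant), while one containing an unlisted word does not.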
For ryoudo moNdai &quot;territorial dispute&quot;, for example, alignment is achieved through the word-level translations ryoudo &quot;territory&quot; and moNdai &quot;dispute&quot;, and the mapping conforms to the translation template taking [N1 N2]J onto [A1 N2]E; it is thus possible to translate ryoudo moNdai by way of DMTCOMP.</Paragraph> <Paragraph position="4"> Note here that derivational morphology is used to convert the nominal translation of territory into the adjective territorial.</Paragraph> <Paragraph position="5"> On the first word-alignment pass for DMTCOMP, the translation pairs in each dataset were automatically aligned using only ALTDIC. We then manually inspected the unaligned translation pairs for those which were not aligned simply because of patchy coverage in ALTDIC. In such cases, we manually supplemented ALTDIC with simplex translation pairs taken from the Genius Japanese-English dictionary (Konishi, 1997),7 resulting in an additional 178 simplex entries. We then performed a second pass of alignment using the supplemented ALTDIC (ALTDIC+). Below, we present the results for both the original ALTDIC and ALTDIC+.</Paragraph> </Section> <Section position="3" start_page="4" end_page="4" type="sub_section"> <SectionTitle> 3.3 Learning translation templates </SectionTitle> <Paragraph position="0"> DMTCOMP relies on translation templates to map the source language NN compound onto different constructions in the target language and generate translation candidates. For the JE task, the question of what templates are used becomes particularly salient due to the syntactic diversity of the gold standard English translations (see below). Rather than assuming a manually-specified template set for the EJ and JE NN compound translation tasks, we learn the templates from NN compound translation data.
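One way to sketch this template learning, under the simplifying assumption that alignment is plain string matching against a toy simplex dictionary (the closed-class word list is illustrative):

```python
# Sketch of learning a translation template from one aligned pair:
# if both source nouns align to words in the gold translation and the
# residue consists only of closed-class function words, the mapping
# schema is kept as a coindexed template. Toy data; illustrative only.
CLOSED_CLASS = {"of", "the", "a", "an", "for", "in", "on"}

simplex_dict = {"kikai": {"machine"}, "honyaku": {"translation"}}

def extract_template(n1, n2, gold_translation):
    """Return a coindexed template string, or None if alignment fails."""
    slots = []
    for word in gold_translation.split():
        if word in simplex_dict.get(n1, set()):
            slots.append("[N1]")
        elif word in simplex_dict.get(n2, set()):
            slots.append("[N2]")
        elif word in CLOSED_CLASS:
            slots.append(word)   # function-word residue is allowed
        else:
            return None          # open-class residue: no template
    # require both source nouns to align exactly once
    if slots.count("[N1]") == 1 and slots.count("[N2]") == 1:
        return " ".join(slots)
    return None
```

A pair aligned as machine translation yields the template "[N1] [N2]", and translation of machine yields "[N2] of [N1]"; a gold translation with open-class residue yields no template.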
Given that the EJ and JE testdata is partitioned equally into the top-250 and random-250 NN compounds, we cross-validate the translation templates. That is, we perform two iterations over each of the JE and EJ datasets, taking one dataset of 250 NN compounds as the test set and the remaining dataset as the training set in each case. We first perform word-alignment on the training dataset, and in the case that both source language nouns align leaving only closed-class function words in the target language, extract out the mapping schema as a translation template (with word coindices). We then use this extracted set of translation templates as a filter in analysing word alignment in the test set.</Paragraph> <Paragraph position="1"> A total of 23 JE and 3 EJ translation templates were learned from the training data in each case, a sample of which is shown in Table 1.8 Here, the count for each template is the combined number of activations over the combined dataset of 500 compounds.</Paragraph> </Section> <Section position="4" start_page="4" end_page="4" type="sub_section"> <SectionTitle> 3.4 Evaluation measures </SectionTitle> <Paragraph position="0"> The principal evaluatory axes we consider in comparing the different methods are coverage and accuracy: coverage is the relative proportion of a given set of NN compounds that the method can generate some translation for, and accuracy describes the proportion of translated NN compounds for which the gold-standard translation is reproduced (irrespective of how many other translations are generated).
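These two measures, together with their harmonic mean (the F-score used below), can be sketched as follows; the input format is an assumption for illustration:

```python
# Coverage, accuracy, and their harmonic mean (F-score), as defined in
# the text. `outputs` maps each source compound either to the set of
# generated translation candidates or to None when nothing is produced;
# `gold` maps each source compound to its gold-standard translation.
def evaluate(outputs, gold):
    translated = {s for s, cands in outputs.items() if cands}
    coverage = len(translated) / len(gold)
    correct = sum(1 for s in translated if gold[s] in outputs[s])
    accuracy = correct / len(translated) if translated else 0.0
    f = (2 * coverage * accuracy / (coverage + accuracy)
         if coverage + accuracy > 0 else 0.0)
    return coverage, accuracy, f
```

Note that accuracy is computed only over the compounds the method actually translates, so a brittle high-precision method can score high accuracy at low coverage.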
These two tend to be in direct competition, in that more accurate methods tend to have lower coverage, and conversely higher-coverage methods tend to have lower accuracy.</Paragraph> <Paragraph position="1"> So as to make cross-system comparison simple, we additionally combine these two measures into an F-score, that is, their harmonic mean.</Paragraph> </Section> </Section> <Section position="5" start_page="4" end_page="4" type="metho"> <SectionTitle> 4 Results </SectionTitle> <Paragraph position="0"> We first present the individual results for MBMTDICT and DMTCOMP, and then discuss a cascaded system combining the two.</Paragraph> <Section position="1" start_page="4" end_page="4" type="sub_section"> <SectionTitle> 4.1 Dictionary-driven MBMT </SectionTitle> <Paragraph position="0"> The source of NN compound translations for MBMTDICT was the combined ALTDIC and EDICT dictionaries. Recall that this is the same dictionary as was used in the first pass of generation of gold standard translations (see § 3.1), but that the gold-standard translations were manually selected in the case of multiple dictionary entries, and an alternate translation manually generated in the case that a more appropriate translation was considered to exist.</Paragraph> <Paragraph position="1"> The results for MBMTDICT are given in Table 2, for both translation directions.
In each case, we carry out evaluation over the 250 most-commonly occurring NN compounds (TOP 250), the random sample of 250 NN compounds (RAND 250) and the combined 500-element dataset (ALL).</Paragraph> <Paragraph position="2"> The accuracies (Acc) are predictably high, although slightly lower for the random-250 than the top-250.</Paragraph> <Paragraph position="3"> The fact that they are below 100% indicates that the translation dictionary is not infallible and contains a number of sub-optimal or misleading translations.</Paragraph> <Paragraph position="4"> One such example is kyuusai kikiN &quot;relief fund&quot;, for which the dictionary provides the unique, highly-specialised translation lifeboat.</Paragraph> <Paragraph position="5"> Coverage (Cov) is significantly lower than accuracy, but still respectable, particularly for the random-250 datasets. This is a reflection of the inevitable emphasis by lexicographers on more frequent expressions, and underlines the brittleness of MBMTDICT. An additional reason for coverage being generally lower than accuracy is that dictionaries tend not to contain transparently compositional compounds, an observation which applies particularly to ALTDIC as it was developed for use with a full MT system. Coverage is markedly lower for the JE task, largely because ALTJAWS--which uses ALTDIC as its system dictionary--tends to treat the compound nouns in ALTDIC as single words. As we used ALTJAWS to pre-process the corpus we extracted the Japanese NN compounds from, a large component of the compounds in the translation dictionary was excluded from the JE data. One cause of a higher coverage for the EJ task is that many English compounds are translated into single Japanese words (e.g. interest rate vs. riritsu) and thus reliably recorded in bilingual dictionaries.
There are 127 single word translations in the EJ dataset, but only 31 in the JE dataset.</Paragraph> <Paragraph position="6"> In summary, MBMTDICT offers high accuracy but mid-range coverage in translating NN compounds, with coverage dropping off appreciably for less-frequent compounds.</Paragraph> </Section> <Section position="2" start_page="4" end_page="4" type="sub_section"> <SectionTitle> 4.2 Word-to-word compositional DMT </SectionTitle> <Paragraph position="0"> In order to establish an upper bound on the performance of DMTCOMP, we word-aligned the source language NN compounds with their translations, using the extracted translation templates as described in § 3.3. The results of alignment are classified into four mutually-exclusive classes, as detailed below: (A) Completely aligned All component words align according to one of the extracted translation templates.</Paragraph> <Paragraph position="1"> (B) No template The translation does not correspond to a known translation template (irrespective of whether component words align in the source compound).</Paragraph> <Paragraph position="2"> (C) Partially aligned Some but not all component words align. We subclassify instances of this class into: C1 compounds, where there are unaligned words in both the source and target languages; C2 compounds, where there is an unaligned word in the source language only; and C3 compounds, where there are unaligned words in the target language only.</Paragraph> <Paragraph position="3"> (D) No alignment No component words align between the source NN compound and translation. We subclassify D instances into: D1 compounds, where the translation is a single word; and D2 compounds, where no word pair aligns.</Paragraph> <Paragraph position="4"> The results of alignment are shown in Table 3, for each of the top-250, random-250 and combined 500-element datasets. The alignment was carried out using both the basic ALTDIC and ALTDIC+ (ALTDIC with 178 manually-added simplex entries).
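The four-way classification above can be sketched as a decision procedure over alignment counts; the function signature is a hypothetical simplification of the actual bookkeeping:

```python
# Sketch of the mutually-exclusive alignment classes (A, B, C1-C3,
# D1-D2) as a decision over per-pair alignment counts. The argument
# names are assumptions introduced for this illustration.
def classify(src_unaligned, tgt_unaligned, n_aligned, tgt_len, has_template):
    if not has_template:
        return "B"                        # no known translation template
    if src_unaligned == 0 and tgt_unaligned == 0:
        return "A"                        # completely aligned
    if n_aligned == 0:
        return "D1" if tgt_len == 1 else "D2"  # no alignment at all
    if src_unaligned > 0 and tgt_unaligned > 0:
        return "C1"                       # unaligned words on both sides
    return "C2" if src_unaligned > 0 else "C3"
```

The ordering of the checks enforces mutual exclusivity: the template test precedes everything else, mirroring the definition of class B.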
Around 40% of the data align completely using ALTDIC+ in both translation directions. Importantly, DMTCOMP is slightly more robust over the random-250 dataset and partially aligned instances. This contrasts with MBMTDICT, which was found to be brittle over the less-frequent random-250 dataset.</Paragraph> </Section> <Section position="3" start_page="4" end_page="4" type="sub_section"> <SectionTitle> 4.3 Combination of MBMTDICT and DMTCOMP </SectionTitle> <Paragraph position="0"> We have demonstrated MBMTDICT to have high accuracy but relatively low coverage (particularly over lower-frequency NN compounds), and DMTCOMP to have medium accuracy but high coverage. To combine the relative strengths of the two methods, we test a cascaded architecture, whereby we first attempt to translate each NN compound using MBMTDICT, and, failing this, resort to DMTCOMP.</Paragraph> <Paragraph position="1"> Table 4 shows the results for MBMTDICT and DMTCOMP in isolation, and when cascaded (Cascade). For both translation directions, cascading results in a sharp increase in F-score, with coverage consistently above 95% and accuracy dropping only marginally to just under 90% for the EJ task. The cascaded method represents the best shallow translation upper bound achieved in this research.</Paragraph> </Section> </Section> </Paper>