<?xml version="1.0" standalone="yes"?> <Paper uid="C00-2135"> <Title>Acquisition of Phrase-level Bilingual Correspondence using Dependency Structure</Title> <Section position="3" start_page="0" end_page="933" type="metho"> <SectionTitle> 2 Overview of Our Approach </SectionTitle> <Paragraph position="0"> Our approach presupposes a sentence-aligned parallel corpus. The task is divided into two steps: a monolingual step in which candidate patterns are generated by use of dependency relations, and a bilingual step in which these candidate patterns from each language are paired with their translations. Figure 1 shows the flow of our method.</Paragraph> <Paragraph position="1"> Our primary aim is to investigate the effectiveness of dependency structures in the monolingual candidate generation step. For this reason, the bilingual step borrows the weighted Dice coefficient and greedy determination from (Kitamura and Matsumoto, 1996).</Paragraph> <Paragraph position="2"> In the following sections, we explain each step in detail.</Paragraph> </Section> <Section position="4" start_page="933" end_page="934" type="metho"> <SectionTitle> 3 Dependency-Preserving Candidate Patterns </SectionTitle> <Paragraph position="0"> Dependency grammar and related paradigms (Hudson, 1984) focus on individual words and their relationships. In this framework, every phrase is regarded as consisting of a governor and dependants, where dependants may optionally be classified further. The syntactically dominating word is selected as the governor, with modifiers and complements acting as dependants. Dependency structures are suitably depicted as a directed acyclic graph (DAG), where arrows point from dependants to governors.</Paragraph> <Paragraph position="1"> We use the maximum likelihood model proposed in (Fujio and Matsumoto, 1998), where the dependency probability between segments is determined based on their co-occurrence and distance. It has the constraints that (a) dependencies do not cross, and (b) each segment has at least one governor, except for the 'root' segment (for Japanese, the 'root' segment is the rightmost segment; for English, the segment that contains the main verb is regarded as the 'root' segment). Furthermore, the model has an option to allow multiple dependencies whose probabilities are above a certain confidence. This is useful for cases where phrasal dependencies cannot be determined correctly using only syntactic information. It has the effect of improving recall by sacrificing precision, and may retain partially correct results useful for our candidate pattern generation.</Paragraph> <Paragraph position="2"> We apply the following notions as units of segments. For English, (a) a preposition or conjunction is grouped into the succeeding baseNP, and (b) auxiliary verbs are grouped into the succeeding main verb. For Japanese, a segment is one (or a sequence of) content word(s) optionally followed by function words.</Paragraph> <Paragraph position="3"> Having chunked sentences into suitable segments, the sentences are parsed to obtain dependency relations. We have set up the following three models (a sketch contrasting their selection strategies follows the list):</Paragraph> <Paragraph position="4"> 1. best-one model: uses only the most likely (statistically best) dependency relations. At most one dependency is allowed for each segment.</Paragraph> <Paragraph position="5"> 2. ambiguous model: uses dependency relations above a certain confidence score (0.5). Multiple dependencies may be considered for each segment.</Paragraph> <Paragraph position="6"> 3. adjacent model: uses only adjacency relations between segments. Each segment is regarded as depending on the immediately preceding segment.</Paragraph>
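<Paragraph position="7"> The three models differ only in how dependency links are selected from the parser's scored output. The following minimal sketch contrasts them; the scored-candidate representation and all names are illustrative assumptions of ours, not details given in the paper.</Paragraph>

```python
# Illustrative sketch (not the authors' code): link selection under the
# three models. Each segment index maps to scored governor candidates
# represented as (governor_index, probability) pairs from some parser.

def best_one(candidates):
    """Keep only the single most likely governor per segment."""
    return {i: [max(cands, key=lambda c: c[1])[0]]
            for i, cands in candidates.items() if cands}

def ambiguous(candidates, confidence=0.5):
    """Keep every governor whose probability meets the confidence score."""
    return {i: [g for g, p in cands if p >= confidence]
            for i, cands in candidates.items()}

def adjacent(n_segments):
    """Ignore the parser: each segment depends on the preceding one."""
    return {i: [i - 1] for i in range(1, n_segments)}

# "[I] [saw] [a girl] [in the park]" with hypothetical parser scores;
# segment 1 ("saw") is the root and has no governor candidates.
cands = {0: [(1, 0.90)],             # "I" -> "saw"
         2: [(1, 0.80)],             # "a girl" -> "saw"
         3: [(1, 0.60), (2, 0.55)]}  # "in the park" -> "saw" or "a girl"
print(best_one(cands))    # {0: [1], 2: [1], 3: [1]}
print(ambiguous(cands))   # segment 3 keeps both governors
print(adjacent(4))        # {1: [0], 2: [1], 3: [2]}
```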
<Paragraph position="8"> In the ambiguous model, we expect that the more likely dependency relations will appear frequently in a large corpus, thereby increasing the correlation score. Hence, ambiguity at the parsing phase will hopefully be resolved in the following bilingual pairing phase. As for the adjacent model, only chunking and adjacency are used.</Paragraph> <Paragraph position="9"> Finally, the dependency relations between segments are used to generate candidate patterns. In this paper, the dependency size of a candidate pattern designates the number of segments connected through dependency relations. Figures 2, 3, and 4 illustrate examples of English candidate patterns of dependency sizes 1, 2 and 3 for the proposed dependency models, using the sentence "[I] [saw] [a girl] [in the park]":</Paragraph>

Figure 2: best-one model. Size 1: {I, saw, girl, park}; size 2: {I_saw, girl_saw, in-park_saw}; size 3: {I_girl_saw(T), I_in-park_saw(T)}.
Figure 3: ambiguous model. Size 1: {I, saw, girl, park}; size 2: {I_saw, girl_saw, in-park_saw, in-park_girl}; size 3: {I_girl_saw(T), I_in-park_saw(T), in-park_girl_saw(L)}.
Figure 4: adjacent model. Size 1: {I, saw, girl, park}; size 2: {saw_I, girl_saw, in-park_girl}; size 3: {girl_saw_I(L), in-park_girl_saw(L)}.

<Paragraph position="10"> In a dependency-connected candidate pattern, the function words of the governor segment are dropped. This is to cope with data sparseness in the generated candidate patterns. Moreover, two types of DAGs can be generated from patterns of size 3, and we use DAG-type tags ('L' and 'T') to distinguish them: 'T' marks two dependants sharing a governor, and 'L' marks a chain of dependencies. We also note that candidate patterns do not necessarily follow the word ordering of the original sentences.</Paragraph> <Paragraph position="11"> The algorithm is as follows (a sketch of step 4 is given after the Output specification):

Input: a corpus, the minimum occurrence threshold fmin, and the dependency size dw.

For each sentence in the corpus, process the following:
1. Part-of-Speech Tagging
2. Chunking: rules are written as regular expressions defined over POS/word sequences.
3. Dependency Analysis
4. Candidate Pattern Generation: candidate patterns are generated and stored with their sentence ID. Dependency-connected patterns of size less than or equal to dw are extracted.

Output: a hash-table that maps candidate patterns appearing at least fmin times to the sentence IDs where they are found in the corpus.</Paragraph>
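<Paragraph position="12"> As a concrete reading of step 4, the sketch below enumerates dependency-connected patterns up to size 3 for one parsed sentence under the best-one model. The representation (content strings per segment, a dependant-to-governor map) and the tagging rule are our illustrative assumptions; the paper does not give implementation details.</Paragraph>

```python
# Illustrative sketch (assumed representation, not the authors' code):
# enumerate dependency-connected candidate patterns of size <= 3.
from collections import defaultdict

def candidate_patterns(contents, gov):
    """contents[i]: content part of segment i (function words dropped);
    gov[d] = g means segment d depends on segment g (best-one model)."""
    pats = set(contents)                                    # size 1
    for d, g in gov.items():                                # size 2: dep_gov
        pats.add(f"{contents[d]}_{contents[g]}")
    deps_of = defaultdict(list)
    for d, g in gov.items():
        deps_of[g].append(d)
    for g, ds in deps_of.items():                           # size 3, T-type:
        for i in range(len(ds)):                            # two dependants
            for j in range(i + 1, len(ds)):                 # share a governor
                pats.add(f"{contents[ds[i]]}_{contents[ds[j]]}_{contents[g]}(T)")
    for d, g in gov.items():                                # size 3, L-type:
        if g in gov:                                        # chain d -> g -> g'
            pats.add(f"{contents[d]}_{contents[g]}_{contents[gov[g]]}(L)")
    return pats

# "[I] [saw] [a girl] [in the park]": I, a girl, in the park -> saw
print(candidate_patterns(["I", "saw", "girl", "in-park"], {0: 1, 2: 1, 3: 1}))
# yields I_girl_saw(T), I_in-park_saw(T), girl_in-park_saw(T), ...; a
# corpus-level driver would collect these into a pattern -> sentence-ID table.
```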
</Section> <Section position="5" start_page="934" end_page="938" type="metho"> <SectionTitle> 4 Phrase-level Correspondence Acquisition </SectionTitle> <Paragraph position="0"> Pairing of candidate patterns is a combinatorial problem, and we take the following tactics to reduce the search space. First, our algorithm works in a greedy manner. This means that a translation pair determined in an early stage of the algorithm will never be reconsidered. Secondly, a filtering process is incorporated.</Paragraph> <Paragraph position="1"> Figure 5 illustrates filtering for the sentence pair "I saw a girl in the park" and its Japanese translation. The set of candidate patterns derived from English is depicted on the left, while that from Japanese is depicted on the right. Once the pair of "I_girl_saw(T)" and its Japanese counterpart is determined as a translation pair, the algorithm assumes that the Japanese pattern will not be paired with the other candidate patterns related to "I_girl_saw(T)" (cancelled by diagonal lines in Figure 5) for this sentence pair. The operation effectively discards the found pairs and causes recalculation of correlation scores in the succeeding iterations.</Paragraph> <Paragraph position="2"> As mentioned in Section 2, our correlation score is calculated by the weighted Dice coefficient, defined as

sim(p_e, p_j) = \log_2 f_{ej} \cdot \frac{2 f_{ej}}{f_j + f_e}

where f_j and f_e are the numbers of occurrences in the Japanese and English corpora respectively, and f_{ej} is the number of co-occurrences.</Paragraph> <Paragraph position="3"> The algorithm is as follows (a sketch of the scoring and the mutual-best test follows):

Input: hash-tables of candidate patterns for each language, the initial frequency threshold fcurr, and the final frequency threshold fmin.

Repeat the following until fcurr reaches fmin:
1. For each pair of an English candidate pe and a Japanese candidate pj appearing at least fcurr times, identify the most likely correspondences according to the correlation scores.
* For an English pattern pe, obtain the correspondence candidate set PJ = {pj1, pj2, ..., pjn} such that sim(pe, pjk) > log2 fmin for all k. Similarly, obtain the correspondence candidate set PE for a Japanese pattern pj.
* Register (pe, pj) as a translation pair if pj = argmax_{pjk in PJ} sim(pe, pjk) and pe = argmax_{pek in PE} sim(pj, pek), i.e., the correlation score of (pe, pj) is the highest among PJ for pe and among PE for pj.
2. Filter out the co-occurrence positions for pe, pj, and the related candidate patterns.
3. Lower the frequency threshold if no more pairs are found with fcurr.</Paragraph>
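<Paragraph position="4"> The following is a minimal sketch of step 1, assuming plain occurrence counters per language and a co-occurrence counter over pattern pairs; the position filtering of step 2 and the threshold loop are elided, and all names are illustrative.</Paragraph>

```python
import math

def sim(f_e, f_j, f_ej):
    """Weighted Dice coefficient for an English/Japanese pattern pair."""
    return math.log2(f_ej) * (2 * f_ej) / (f_e + f_j) if f_ej else float("-inf")

def mutual_best_pairs(e_freq, j_freq, cofreq, f_curr, f_min):
    """Register (pe, pj) only when each is the other's highest-scoring
    candidate and the score exceeds log2(f_min)."""
    floor = math.log2(f_min)
    active_e = [pe for pe, f in e_freq.items() if f >= f_curr]
    active_j = [pj for pj, f in j_freq.items() if f >= f_curr]
    pairs = []
    for pe in active_e:
        scored = [(pj, sim(e_freq[pe], j_freq[pj], cofreq.get((pe, pj), 0)))
                  for pj in active_j]
        scored = [(pj, s) for pj, s in scored if s > floor]
        if not scored:
            continue
        pj, s = max(scored, key=lambda x: x[1])
        # mutual-best test: pe must also be pj's best English candidate
        back = max(active_e,
                   key=lambda qe: sim(e_freq[qe], j_freq[pj],
                                      cofreq.get((qe, pj), 0)))
        if back == pe:
            pairs.append((pe, pj, s))
    return pairs

# Hypothetical counts for one frequent pattern pair.
print(mutual_best_pairs({"not_hesitate_contact(T)": 12},
                        {"jp_counterpart": 11},
                        {("not_hesitate_contact(T)", "jp_counterpart"): 10},
                        f_curr=10, f_min=2))
```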
<SectionTitle> 5 Experiment and Result </SectionTitle> <Section position="1" start_page="935" end_page="935" type="sub_section"> <SectionTitle> 5.1 Experimental Setting </SectionTitle> <Paragraph position="0"> We use a business expression corpus (Takubo and Hashimoto, 1995) containing 10,000 sentence pairs which are pre-aligned. The NLP tools are summarised in Table 1.</Paragraph> <Paragraph position="1"> The parameter settings are as follows. The dependency size dw is set to 3. Initially, fcurr and fmin are set to 100 and 2 respectively. As the algorithm proceeds, fcurr is adjusted to half of its previous value if it is greater than 10; otherwise fcurr is decremented by 1. If the number of registered translation pairs is less than 10, then fcurr is lowered in the next iteration. All parameters are empirically chosen. (A sketch of this threshold schedule follows.)</Paragraph>
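<Paragraph position="2"> Stated compactly, the schedule proceeds as in the sketch below, which is our reading of the text above; integer halving is our assumption, and in the full algorithm the next value is taken only once an iteration registers fewer than 10 new pairs.</Paragraph>

```python
def threshold_schedule(f_init=100, f_min=2):
    """Successive values of f_curr: halve while above 10, then step by 1."""
    f_curr = f_init
    while f_curr >= f_min:
        yield f_curr
        f_curr = f_curr // 2 if f_curr > 10 else f_curr - 1

print(list(threshold_schedule()))  # [100, 50, 25, 12, 6, 5, 4, 3, 2]
```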
</Section> <Section position="2" start_page="935" end_page="936" type="sub_section"> <SectionTitle> 5.2 Result </SectionTitle> <Paragraph position="0"> Our approach is evaluated by the metrics defined below. Precision measures the correctness of the extracted translation pairs, while coverage measures the proportion of correct translation pairs in the parallel corpora.</Paragraph> <Paragraph position="1"> Let X be a pattern. count(X) gives the number of X returned, occur(X) gives the number of occurrences of X in each corpus, length(X) gives the dependency size of X, and cofreq(X) gives the number of co-occurrences in the parallel corpora. P_X denotes the extracted patterns, of which the correct patterns are designated as P_+. P_C denotes the candidate patterns generated from each side of the parallel corpora. Coverage is calculated for English and Japanese separately and then their mean is taken.</Paragraph> <Paragraph position="2"> Precision for each model is summarised in Tables 2, 3, and 4, while coverage is shown in Table 5. To examine the characteristics of each model, we expand the correspondence candidate sets PE and PJ so that patterns with a correlation score above log2 2 (i.e., above 1) are also considered; these are marked by asterisks "*" in the tables. Random samples of correct and near-correct translation pairs are shown in Tables 6 and 7 respectively (in Table 7, segments that must be dropped or added to obtain the correct pattern are marked with brackets). Extracted translation pairs are matched against the original corpora to restore their word ordering. This restoration is done manually this time, but can be automated with little modification to our algorithm.</Paragraph> </Section> <Section position="3" start_page="936" end_page="938" type="sub_section"> <SectionTitle> 5.3 Discussion </SectionTitle> <Paragraph position="0"> As we see from Tables 2 and 3, the best-one model achieves better precision than the adjacent model. Upon inspecting the results, nearly the same translation patterns are extracted at the higher thresholds. This is because our dependency parsers use the distance feature in determining dependency; consequently, nearer segments are likely to be dependency-related. The experimental data shows that exact overlaps are found in 9348 out of 14705 (63.55%) candidate patterns for English and 6625 out of 11566 (57.27%) for Japanese.</Paragraph> <Paragraph position="1"> However, a difference appears when the threshold reaches 3: patterns such as "not hesitate to contact" with its Japanese counterpart, which are not found in the adjacent model, are extracted. Moreover, the best-one model is better in terms of coverage. These results support the view that dependency relations provide more useful clues than mere linear order.</Paragraph> <Paragraph position="2"> Comparing the best-one model with the ambiguous model, the ambiguous model achieves a higher precision except for *2. This indicates that the accuracy currently achieved by dependency parsers is insufficient, and that it is therefore better to expand the possibilities of candidate patterns by allowing redundant dependency relations. As dependency parsers improve, the best-one model should come to outperform the ambiguous model. However, as the result of *2 shows, candidates from redundant dependency relations are mostly extracted at the low threshold. The overall trend reveals that redundant relations act as noise at low thresholds, but help to scale up the correlation score at higher thresholds.</Paragraph> <Paragraph position="3"> As shown in Table 6, a domain-specific disambiguation sample ("Thank you" vs. "Thank you in advance", each paired with a distinct Japanese expression) is found. As for long-distance dependency-related translation patterns, nominative ("ga"-case) and verb patterns ("consultations include" with its Japanese counterpart) are extracted. (A typical Japanese sentence follows S-O-V structure, while the English counterpart follows S-V-O structure.) Other types of long-distance translation patterns, such as accusative ("wo"-case) and verb patterns ("be held at X" with its Japanese counterpart), are not extracted even though candidate patterns are generated from each corpus.</Paragraph> <Paragraph position="4"> Generally speaking, acquiring long-distance translation patterns is a hard problem. We still require further investigation examining under what circumstances dependency relations are really effective. So far, we have used a relatively "clean" business expression corpus, which is a collection of standard usage. In a real-world setting, however, more repetitions and variations will be observed: adjuncts can be placed in less constrained ways, and the adjacent model cannot deal with them when they are apart. In such cases, the availability of robust dependency parsers becomes essential, and dependency relations play a key role in finding long-distance translation patterns.</Paragraph> </Section> </Section> <Section position="6" start_page="938" end_page="938" type="metho"> <SectionTitle> 6 Related Works </SectionTitle> <Paragraph position="0"> Smadja et al. (1996) find rigid and flexible collocations. They first identify candidate collocations in English and subsequently find the corresponding French collocations by gradually expanding the candidate word sequences. Kitamura et al. (1996) enumerate word sequences of arbitrary length (n-grams of content words) that appear more often than a minimum threshold in English and Japanese, and attempt to find the correspondences based on the prepared candidate lists.</Paragraph> <Paragraph position="1"> The difference from Smadja et al. (1996) is that our method is bi-directional; the difference from Kitamura et al. (1996) is that we use dependency relations, which leads to "structured" phrasal correspondence as opposed to "flat" adjacent correspondence.</Paragraph> <Paragraph position="2"> On the other hand, Matsumoto et al. (1993), Kitamura et al. (1995) and Meyers et al. (1996) use dependency structure for structural matching of sentences to acquire translation rules. Their methods employ grammar-based parsers and only work for declarative sentences. Their objective is complete matching of the dependency trees of the two languages.</Paragraph> <Paragraph position="3"> Instead, our method uses statistical dependency parsers and is not restricted to simple sentences as input. Furthermore, we are concerned with partial matching of dependency trees, so that the overall robustness and coverage are improved.</Paragraph> </Section> </Paper>