<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1043">
  <Title>Mixed Language Query Disambiguation</Title>
  <Section position="3" start_page="333" end_page="336" type="intro">
    <SectionTitle>
2 Methodology
</SectionTitle>
    <Paragraph position="0"> Mixed language query translation is halfway between query translation and query disambiguation in that not all words in the query need to be translated.</Paragraph>
    <Paragraph position="1"> There are two ways to use the disambiguated mixed language queries. In one scenario, all secondary language words are translated unambiguously into the primary language, and the resulting monolingual query is processed by a general IR system. In another scenario, the primary language words are converted into secondary language and the query is passed to another IR system in the secondary language.</Paragraph>
    <Paragraph position="2"> Our methods allows for both general and cross-language IR from a mixed language query.</Paragraph>
    <Paragraph position="3"> To draw a parallel to the three problems of query translation, we suggest that the three main problems of mixed language disambiguation are:  1. generating translation candidates in the primary language, 2. weighting translation candidates, and 3. pruning translation alternatives for query translation.</Paragraph>
    <Paragraph position="4">  Co-occurrence information between neighboring words and words in the same sentence has been used in phrase extraction (Smadja, 1993; Fung and Wu, 1994), phrasal translation (Smadja et al., 1996; Kupiec, 1993; Wu, 1995; Dagan and Church, 1994), target word selection (Liu and Li, 1997; Tanaka and Iwasaki, 1996), domain word translation (Fung and Lo, 1998; Fung, 1998), sense disambiguation (Brown et al., 1991; Dagan et al., 1991; Dagan and Itai, 1994; Gale et al., 1992a; Gale et al., 1992b; Gale et al., 1992c; Shiitze, 1992; Gale et al., 1993; Yarowsky, 1995), and even recently for query translation in cross-language IR as well (Ballesteros and Croft, 1998). Co-occurrence statistics is collected from either bilingual parallel and  non-parallel corpora (Smadja et al., 1996; Kupiec, 1993; Wu, 1995; Tanaka and Iwasaki, 1996; Fung and Lo, 1998), or monolingual corpora (Smadja, 1993; Fung and Wu, 1994; Liu and Li, 1997; Shiitze, 1992; Yarowsky, 1995). As we noted in (Fung and Lo, 1998; Fung, 1998), parallel corpora are rare in most domains. We want to devise a method that uses only mono-lingual data in the primary language to train co-occurrence information.</Paragraph>
    <Section position="1" start_page="334" end_page="334" type="sub_section">
      <SectionTitle>
2.1 Translation candidate generation
</SectionTitle>
      <Paragraph position="0"> Without loss of generality, we suppose the mixed language sentence consists of the words S = (E1,E2,...,C,...,En}, where C is the only secondary language word 1. Since in our method we want to find the co-occurrence information between all Ei and C from a mono-lingual corpus, we need to translate the latter into the primary language word Ec. This corresponds to the first problem in query translation--translation candidate generation.</Paragraph>
      <Paragraph position="1"> We generate translation candidates of C via an online bilingual dictionary. All translations of secondary language word C, comprising of multiple senses, are taken together as a set {Eci }.</Paragraph>
    </Section>
    <Section position="2" start_page="334" end_page="334" type="sub_section">
      <SectionTitle>
2.2 Translation candidate weighting
</SectionTitle>
      <Paragraph position="0"> Problem two in query translation is to weight all translation candidates for C. In our method, the weights are based on co-occurrence information. The hypothesis is that the correct translations of C should co-occur frequently with the contextual words Ei and incorrect translation of C should co-occur rarely with the contextual words. Obviously, other information such as syntactical relationship between words or the part-of-speech tags could be used as weights too.</Paragraph>
      <Paragraph position="1"> However, it is difficult to parse and tag a mixed language sentence. The only information we can use to disambiguate C is the co-occurrence information between its translation candidates { Ec, } and El, E2, . . . , En.</Paragraph>
      <Paragraph position="2"> Mutual information is a good measure of the co-occurrence relationship between two words (Gale and Church, 1993). We first compute the mutual information between any word pair from a monolingual corpus in the primary language 2  as the testing data using the following formula, where E is a word and f (E) is the frequency of word E.</Paragraph>
      <Paragraph position="3"> MI(Ei, Ej) = log f(Ei, Ej) f(Ei) * f(Sj) (1) Ei and Ej can be either neighboring words or any two words in the sentence.</Paragraph>
    </Section>
    <Section position="3" start_page="334" end_page="336" type="sub_section">
      <SectionTitle>
2.3 Translation candidate pruning
</SectionTitle>
      <Paragraph position="0"> The last problem in query translation is selecting the target translation. In our approach, we need to choose a particular Ec from Ec~. We call this pruning process translation disambiguation. null We present and compare three unsupervised statistical methods in this paper. The first base-line method is similar to (Dagan et al., 1991; Dagan and Itai, 1994; Ballesteros and Croft, 1998; Smadja et al., 1996), where we use the nearest neighboring word of the secondary language word C as feature for disambiguation.</Paragraph>
      <Paragraph position="1"> In the second method, we chQose all contextual words as disambiguating feature. In the third method, the most discriminative contextual word is selected as feature.</Paragraph>
      <Paragraph position="2"> 2.3.1 Baseline: single neighboring word as disambiguating feature The first disambiguating feature we present here is similar to the statistical feature in (Dagan et al., 1991; Smadja et al., 1996; Dagan and Itai, 1994; Ballesteros and Croft, 1998), namely the co-occurrence with neighboring words. We do not use any syntactic relationship as in (Dagan and Itai, 1994) because such relationship is not available for mixed-language sentences. The assumption here is that the most powerful word for disambiguating a word is the one next to it.</Paragraph>
      <Paragraph position="3"> Based on mutual information, the primary language target word for C is chosen from the set {Ec~}. Suppose the nearest neighboring word for C in S is Ey, we select the target word Ecr, such that the mutual information between Ec~ and Ev is maximum.</Paragraph>
      <Paragraph position="5"> Ev is taken to be either the left or the right neighbor of our target word.</Paragraph>
      <Paragraph position="6"> This idea is illustrated in Figure 1. MI1, represented by the solid line, is greater than MI2,  represented by the dotted line. Ey is the neighboring word for C. Since MI1 is greater than MI2, Ecl is selected as the translation of C.  words as disambiguating feature The baseline method uses only the neighboring word to disambiguate C. Is one or two neighboring word really sufficient for disambiguation? null The intuition for choosing the nearest neighboring word Ey as the disambiguating feature for C is based on the assumption that they are part of a phrase or collocation term, and that there is only one sense per collocation (Dagan and Itai, 1994; Yarowsky, 1993). However, in most cases where C is a single word, there might be some other words which are more useful for disambiguating C. In fact, such long-distance dependency occurs frequently in natural language (Rosenfeld, 1995; Huang et al., 1993). Another reason against using single neighboring word comes from (Gale and Church, 1994) where it is argued that as many as 100,000 context words might be needed to have high disambiguation accuracy. (Shfitze, 1992; Yarowsky, 1995) all use multiple context words as discriminating features. We have also demonstrated in our domain translation task that multiple context words are useful (Fung and Lo, 1998; Fung and McKeown, 1997).</Paragraph>
      <Paragraph position="7"> Based on the above arguments, we enlarge the disambiguation window to be the entire sentence instead of only one word to the left or right. We use all the contextual words in the query sentence. Each contextual word &amp;quot;votes&amp;quot; by its mutual information with all translation candidates.</Paragraph>
      <Paragraph position="8"> Suppose there are n primary language words in S = E1,E2,...,C,...,En, as shown in Figure 2, we compute mutual information scores between all Ec~ and all Ej where Eci is one of the translation candidates for C and Ej is one of all n words in S. A mutual information score matrix is shown in Table 1. whereMIjc~ is the mutual information score between contextual word Ej and translation candidate Eel.  For each row j in Table 1, the largest scoring MIjci receives a vote. The rest of the row get zero's. At the end, we sum up all the one's in each column. The column i receiving the highest vote is chosen as the one representing the real translation.</Paragraph>
      <Paragraph position="9">  To illustrate this idea, Table 2 shows that candidate 2 is the correct translation for C. There are four candidates of C and four contextual words to disambiguate C.</Paragraph>
      <Paragraph position="11"> disambiguating feature In the above voting scheme, a candidate receives either a one vote or a zero vote from all contex- null tual words equally no matter how these words axe related to C. As an example, in the query &amp;quot;Please show me the latest dianying/movie of Jacky Chan&amp;quot;, the and Jacky are considered to be equally important. We believe however, that if the most powerful word is chosen for disambiguation, we can expect better performance. This is related to the concept of &amp;quot;trigger pairs&amp;quot; in (Rosenfeld, 1995) and Singular Value Decomposition in (Shfitze, 1992).</Paragraph>
      <Paragraph position="12"> In (Dagan and Itai, 1994), syntactic relationship is used to find the most powerful &amp;quot;trigger word&amp;quot;. Since syntactic relationship is unavailable in a mixed language sentence, we have to use other type of information. In this method, we want to choose the best trigger word among all contextual words. Referring again to Table 1, Mljci is the mutual information score between contextual word Ej and translation candidate Ec~.</Paragraph>
      <Paragraph position="13"> We compute the disambiguation contribution ratio for each context word Ej. For each row j in Table 1, the largest MI score Mljc~ and the second largest MI score Mljc~ are chosen to yield the contribution for word Ej, which is the ratio between the two scores</Paragraph>
      <Paragraph position="15"> If the ratio between MIjc/and MIjc~ is close to one, we reason that Ej is not discriminative enough as a feature for disambiguating C. On the other hand, if the ratio between MIie/i and MIie.~ is noticeably greater than one, we can use Ej as the feature to disambiguate {Ec~} with high confidence. We choose the word Ey with maximum contribution as the disambiguating feature, and select the target word Ecr , whose mutual information score with Ey is the highest, as the translation for C.</Paragraph>
      <Paragraph position="17"> This method is illustrated in Figure 3. Since E2 is the contextual word with highest contribution score, the candidate Ei is chosen that the mutual information between E2 and Eci is the largest.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>