<?xml version="1.0" standalone="yes"?>
<Paper uid="P00-1054">
  <Title>Lexical transfer using a vector-space model</Title>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1 Our proposal
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
1.1 Our problem and approach
</SectionTitle>
      <Paragraph position="0"> In this paper, we concentrate on lexical transfer, i.e., target word selection. In other words, the mapping of structures between source and target expressions is not dealt with here. We assume that this structural transfer can be solved on top of lexical transfer.</Paragraph>
      <Paragraph position="1"> We propose an approach that differs from the studies mentioned in the introduction section in that: I) It use not structural representations like case frames but vector-space representations.</Paragraph>
      <Paragraph position="2"> II) The weight of each element for constraining the ambiguity of target words is determined automatically by following the term frequency and inverse document frequency in information retrieval research.</Paragraph>
      <Paragraph position="3"> III) A word alignment that does not rely on parsing is utilized.</Paragraph>
      <Paragraph position="4"> IV) Bilingual corpora are clustered in terms of target equivalence.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
1.2 Background
</SectionTitle>
      <Paragraph position="0"> The background for the decisions made in our approach is as follows: A) We would like to reduce human interaction to prepare the data necessary for building lexical transfer rules.</Paragraph>
      <Paragraph position="1"> B) We do not expect that mature parsing systems for multi-languages and/or spoken languages will be available in the near future.</Paragraph>
      <Paragraph position="2"> C) We would like the determination of the importance of each feature in the target selection to be automated.</Paragraph>
      <Paragraph position="3"> D) We would like the problem caused by errors in the corpora and data sparseness to be reduced.</Paragraph>
    </Section>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Vector-space model
</SectionTitle>
    <Paragraph position="0"> This section explains our trial for applying a vector-space model to lexical transfer starting from a basic idea.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Basic idea
</SectionTitle>
      <Paragraph position="0"> We can select an appropriate target word for a given source word by observing the environment including the context, world knowledge, and target words in the neighborhood. The most influential elements in the environment are of course the other words in the source sentence surrounding the concerned source word.</Paragraph>
      <Paragraph position="1"> Suppose that we have translation examples including the concerned source word and we know in advance which target word corresponds to the source word.</Paragraph>
      <Paragraph position="2"> By measuring the similarity between (1) an unknown sentence that includes the concerned source word and (2) known sentences that include the concerned source word, we can select the target word which is included in the most similar sentence.</Paragraph>
      <Paragraph position="3"> This is the same idea as example-based machine translation (Sato and Nagao, 1990 and Furuse et. al., 1994).</Paragraph>
      <Paragraph position="4"> Group1: (not sweet) source sentence 1: This beer is drier and full-bodied. target sentence 1: source sentence 2: Would you like dry or sweet sherry? target sentence 2: source sentence 3: A dry red wine would go well with it. target sentence 3: Group2: (not wet) source sentence 4: Your skin feels so dry. target sentence 4: source sentence 5: You might want to use some cream to protect your skin against the dry air. target sentence 5: Table 1 Portions of English &amp;quot;dry&amp;quot; into Japanese for an aligned corpus Listed in Table 1 are samples of English-Japanese sentence pairs of our corpus including the source word &amp;quot;dry.&amp;quot; The upper three samples of group 1 are translated with the target word &amp;quot; (not sweet)&amp;quot; and the lower two samples of group 2 are translated with the target word &amp;quot; (not wet).&amp;quot; The remaining portions of target sentences are hidden here because they do not relate to the discussion in the paper. The underlined words are some of the cues used to select the target words. They are distributed in the source sentence with several different grammatical relations such as subject, parallel adjective, modified noun, and so on, for the concerned word &amp;quot;dry.&amp;quot;</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Sentence vector
</SectionTitle>
      <Paragraph position="0"> We propose representing the sentence as a sentence vector, i.e., a vector that lists all of the words in the sentence. The sentence vector of the first sentence of Table 1 is as follows: &lt;this, beer, is, dry, and, full-body&gt;  that we have the sentence vector of an input sentence I and the sentence vector of an example sentence E from a bilingual corpus. We measure the similarity by computing the cosine of the angle between I and E.</Paragraph>
      <Paragraph position="1"> We output the target word of the example sentence whose cosine is maximal.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Modification of sentence vector
</SectionTitle>
      <Paragraph position="0"> The naive implementation of a sentence vector that uses the occurrence of words themselves suffers from data sparseness and unawareness of relevance.</Paragraph>
      <Paragraph position="1">  To reduce the adverse influence of data sparseness, we count occurrences by not only the words themselves but also by the semantic categories of the words given by a thesaurus. For example, the &amp;quot; (not sweet)&amp;quot; sentences of  The most similar vector Table 1 have the different cue words of &amp;quot;beer,&amp;quot; &amp;quot;sherry,&amp;quot; and &amp;quot;wine,&amp;quot; and the cues are merged into a single semantic category alcohol in the sentence vectors.</Paragraph>
      <Paragraph position="2">  The previous subsection does not consider the relevance to the target selection of each element of the vectors; therefore, the selection may fail due to non-relevant elements. We exploit the term frequency and inverse document frequency in information retrieval research. Here, we regard a group of sentences that share the same target word as a document.&amp;quot; Vectors are made not sentence-wise but group-wise. The relevance of each dimension is the term frequency multiplied by the inverse document frequency. The term frequency is the frequency in the document (group). A repetitive occurrence may indicate the importance of the word. The inverse document frequency corresponds to the discriminative power of the target selection. It is usually calculated as a logarithm of N divided by df where N is the number of the documents (groups) and df is the frequency of documents (groups) that include the word.</Paragraph>
      <Paragraph position="3"> Cluster 1: a piece of paper money, C() source sentence 1: May I have change for a ten dollar bill? target sentence 1: source sentence 2: Could you change a fifty dollar bill? target sentence 2: ssss Cluster 2: an account, C() source sentence 3: I've already paid the bill. target sentence 3: source sentence 4: Isn't my bill too high? target sentence 4: source sentence 5: I'm checking out. May I have the bill, please? target sentence 5: q-q-q-q-Table 2 Samples of groups clustered by target equivalence</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Pre-processing of corpus
</SectionTitle>
    <Paragraph position="0"> Before generating vectors, the given bilingual corpus is pre-processed in two ways (1) words are aligned in terms of translation; (2) sentences are clustered in terms of target equivalence to reduce problems caused by data sparseness.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Word alignment
</SectionTitle>
      <Paragraph position="0"> We need to have source words and target words aligned in parallel corpora. We use a word alignment program that does not rely on parsing (Sumita, 2000). This is not the focus of this paper, and therefore, we will only describe it briefly here.</Paragraph>
      <Paragraph position="1"> First, all possible alignments are hypothesized as a matrix filled with occurrence similarities between source words and target words.</Paragraph>
      <Paragraph position="2"> Second, using the occurrence similarities and other constraints, the most plausible alignment is selected from the matrix.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Clustering by target words
</SectionTitle>
      <Paragraph position="0"> We adopt a clustering method to avoid the sparseness that comes from variations in target words.</Paragraph>
      <Paragraph position="1"> The translation of a word can vary more than the meaning of the target word. For example, the English word &amp;quot;bill&amp;quot; has two main meanings: (1) a piece of paper money, and (2) an account. In Japanese, there is more than one word for each meaning. For (1), &amp;quot;s&amp;quot; and &amp;quot; &amp;quot; can correspond, and for (2), &amp;quot;,&amp;quot; &amp;quot;q -,&amp;quot; and &amp;quot;&amp;quot; can correspond.</Paragraph>
      <Paragraph position="2"> The most frequent target word can represent the cluster, e.g., &amp;quot;&amp;quot; for (1) a piece of paper money; &amp;quot;&amp;quot; for (2) an account. We assume that selecting a cluster is equal to selecting the target word.</Paragraph>
      <Paragraph position="3"> If we can merge such equivalent translation variations of target words into clusters, we can improve the accuracy of lexical transfer for two reasons: (1) doing so makes the mark larger by neglecting accidental differences among target words; (2) doing so collects scattered pieces of evidence and strengthens the effect.</Paragraph>
      <Paragraph position="4"> Furthermore, word alignment as an automated process is incomplete. We therefore need to filter out erroneous target words that come from alignment errors. Erroneous target words are considered to be low in frequency and are expected to be semantically dissimilar from correct target words based on correct alignment. Clustering example corpora can help filter out erroneous target words.</Paragraph>
      <Paragraph position="5"> By calculating the semantic similarity between the semantic codes of target words, we perform clustering according to the simple algorithm in subsection 3.2.2.</Paragraph>
      <Paragraph position="6">  Suppose each target word has semantic codes for all of its possible meanings. In our thesaurus, for example, the target word &amp;quot;s&amp;quot; has three decimal codes, 974 (label/tag), 829 (counter) and 975 (money) and the target word &amp;quot;&amp;quot; has a single code 975 (money). We represent this as a code vector and define the similarity between the two target words by computing the cosine of the angle between their code vectors.</Paragraph>
      <Paragraph position="7">  We adopt a simple procedure to cluster a set of n target words X = {X</Paragraph>
      <Paragraph position="9"> sorted in the descending order of the frequency</Paragraph>
      <Paragraph position="11"> in a sub-corpus including the concerned source word.</Paragraph>
      <Paragraph position="12"> We repeat (1) and (2) until the set X is empty.</Paragraph>
      <Paragraph position="13">  The threshold of semantic similarity T is determined empirically. T in the experiment was 1/2.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="1" type="metho">
    <SectionTitle>
4 Experiment
</SectionTitle>
    <Paragraph position="0"> To demonstrate the feasibility of our proposal, we conducted a pilot experiment as explained in this section.</Paragraph>
    <Paragraph position="1"> Number of sentence pairs (English-Japanese) 19,402 Number of source words (English) 156,128 Number of target words (Japanese) 178,247 Number of source content words (English) 58,633 Number of target content words (Japanese) 64,682 Number of source different content words (English) 4,643 Number of target different content words (Japanese) 6,686 Table 3 Corpus statistics</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Experimental conditions
</SectionTitle>
      <Paragraph position="0"> For our sentence vectors and code vectors, we used hand-made thesauri of Japanese and English covering our corpus (for a travel arrangement task), whose hierarchy is based on that of the Japanese commercial thesaurus Kadokawa Ruigo Jiten (Ohno and Hamanishi, 1984).</Paragraph>
      <Paragraph position="1"> We used our English-Japanese phrase book (a collection of pairs of typical sentences and their translations) for foreign tourists. The statistics of the corpus are summarized in Table 3. We word-aligned the corpus before generating the sentence vectors.</Paragraph>
      <Paragraph position="2"> We focused on the transfer of content words such as nouns, verbs, and adjectives. We picked out six polysemous words for a preliminary evaluation: bill, dry, call in English and &amp;quot;,&amp;quot; &amp;quot;qM,&amp;quot; &amp;quot;&amp;quot; in Japanese.</Paragraph>
      <Paragraph position="3"> We confined ourselves to a selection between two major clusters of each source word using the method in subsection 3.2</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="1" type="sub_section">
      <SectionTitle>
4.2 Selection accuracy
</SectionTitle>
      <Paragraph position="0"> We compared the accuracy of our proposal using the vector-space model (vsm system) with that of a decision-by-majority model  Here, the accuracy of the baseline system is #1 (the number of target sentences of the most major cluster) divided by #1&amp;2 (the number of target sentences of clusters 1 &amp; 2). The accuracy of the vsm system is #correct (the number of vsm answers that match the target sentence) divided by #1&amp;2.</Paragraph>
      <Paragraph position="1"> #all #1&amp;2 Coverage bill [noun] 63 47 74% call [verb] 226 179 79% dry [adjective] 8 6 75%  Judging was done mechanically by assuming that the aligned data was 100% correct.</Paragraph>
      <Paragraph position="2">  Our vsm system achieved an accuracy from about 60% to about 80% and outperformed the baseline system by about 5% to about 20%.  This does not necessarily hold, therefore, performance degrades in a certain degree.</Paragraph>
    </Section>
    <Section position="3" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
4.3 Coverage of major clusters
</SectionTitle>
      <Paragraph position="0"> One reason why we clustered the example database was to filter out noise, i.e., wrongly aligned words. We skimmed the clusters and we saw that many instances of noise were filtered out. At the same time, however, a portion of correctly aligned data was unfortunately discarded. We think that such discarding is not fatal because the coverage of clusters 1&amp;2 was relatively high, around 70% or 80% as shown in Table 5. Here, the coverage is #1&amp;2 (the number of data not filtered) divided by #all (the number of data before discarding).</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="1" end_page="1" type="metho">
    <SectionTitle>
5 Discussion
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
5.1 Accuracy
</SectionTitle>
      <Paragraph position="0"> An experiment was done for a restricted problem, i.e., select the appropriate one cluster (target word) from two major clusters (target words), and the result was encouraging for the automation of the lexicography for transfer.</Paragraph>
      <Paragraph position="1"> We plan to improve the accuracy obtained so far by exploring elementary techniques: (1) Adding new features including extra linguistic information such as the role of the speaker of the sentence (Yamada et al., 2000) (also, the topic that sentences are referring to) may be effective; and (2) Considering the physical distance from the concerned input word, which may improve the accuracy. A kind of window function might also be useful; (3) Improving the word alignment, which may contribute to the overall accuracy.</Paragraph>
    </Section>
    <Section position="2" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
5.2 Data sparseness
</SectionTitle>
      <Paragraph position="0"> In our proposal, deficiencies in the naive implementation of vsm are compensated in several ways by using a thesaurus, grouping, and clustering, as explained in subsections 2.3 and 3.2.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="1" end_page="1" type="metho">
    <SectionTitle>
5.3 Future work
</SectionTitle>
    <Paragraph position="0"> We showed only the translation of content words. Next, we will explore the translation of function words, the word order, and full sentences.</Paragraph>
    <Paragraph position="1"> Our proposal depends on a handcrafted thesaurus. If we manage to do without craftsmanship, we will achieve broader applicability. Therefore, automatic thesaurus construction is an important research goal for the future.</Paragraph>
    <Paragraph position="2"> Conclusion In order to overcome a bottleneck in building a bilingual dictionary, we proposed a simple mechanism for lexical transfer using a vector space.</Paragraph>
    <Paragraph position="3"> A preliminary computational experiment showed that our basic proposal is promising. Further development, however, is required: to use a window function or to use a better alignment program; to compare other statistical methods such as decision trees, maximal entropy, and so on.</Paragraph>
    <Paragraph position="4"> Furthermore, an important future work is to create a full translation mechanism based on this lexical transfer.</Paragraph>
  </Section>
class="xml-element"></Paper>