<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-1016">
  <Title>Synonymous Collocation Extraction Using Translation Information</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> This paper addresses the problem of automatically extracting English synonymous collocation pairs using translation information. A synonymous collocation pair includes two collocations which are similar in meaning, but not identical in wording.</Paragraph>
    <Paragraph position="1"> Throughout this paper, the term collocation refers to a lexically restricted word pair with a certain syntactic relation. For instance, &lt;turn on, OBJ, light&gt; is a collocation with a syntactic relation verb-object, and &lt;turn on, OBJ, light&gt; and &lt;switch on, OBJ, light&gt; are a synonymous collocation pair.</Paragraph>
    <Paragraph position="2"> In this paper, translation information means translations of collocations and their translation probabilities. Synonymous collocations can be considered an extension of the concept of synonymous expressions, which conventionally include synonymous words, phrases and sentence patterns. Synonymous expressions are very useful in a number of NLP applications. They are used in information retrieval and question answering (Kiyota et al., 2002; Dragomia et al., 2001) to bridge the expression gap between the query space and the document space. For instance, &quot;buy book&quot; extracted from a user's query should also in some way match &quot;order book&quot; indexed in the documents. In addition, synonymous expressions are important in language generation (Langkilde and Knight, 1998) and computer-assisted authoring to produce vivid texts.</Paragraph>
    <Paragraph position="3"> Up to now, few studies have directly addressed the problem of extracting synonymous collocations. However, a number of studies investigate the extraction of synonymous words from monolingual corpora (Carolyn et al., 1992; Grefenstette, 1994; Lin, 1998; Gasperin et al., 2001). These methods use the contexts around the investigated words to discover synonyms. Their problem is that the precision of the extracted synonymous words is low, because they extract many word pairs such as &quot;cat&quot; and &quot;dog&quot;, which are similar but not synonymous. In addition, some studies investigate the extraction of synonymous words and/or patterns from bilingual corpora (Barzilay and McKeown, 2001; Shimohata and Sumita, 2002). However, these methods can only extract synonymous expressions that occur in the bilingual corpus. Because the bilingual corpus is limited in size, the coverage of the extracted expressions is very low.</Paragraph>
    <Paragraph position="4"> Given that we usually have large monolingual corpora (unlimited in some sense) and very limited bilingual corpora, this paper proposes a method that tries to make full use of these different resources to reach an optimal compromise between precision and coverage for synonymous collocation extraction. We first obtain candidate synonymous collocation pairs based on a monolingual corpus and a word thesaurus. We then select the appropriate candidates using their translations in a second language. Each translation of a candidate is assigned a probability by a statistical translation model that is trained with a small bilingual corpus and a large monolingual corpus. The similarity of two collocations is estimated by computing the similarity of the vectors constructed from their corresponding translations. Candidates with larger similarity scores are extracted as synonymous collocations. The basic assumption behind this method is that two collocations are synonymous if their translations are similar. For example, &lt;turn on, OBJ, light&gt; and &lt;switch on, OBJ, light&gt; are synonymous because both of them are translated into &lt;开, OBJ, 灯&gt; (&lt;kai1, OBJ, deng1&gt;) and &lt;打开, OBJ, 灯&gt; (&lt;da3 kai1, OBJ, deng1&gt;) in Chinese.</Paragraph>
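    The selection step described above can be illustrated with a minimal sketch. It is not the implementation used in this paper: it assumes cosine similarity as the vector similarity measure, and the translation probabilities, the threshold value, and the function names cosine_similarity and select_synonymous are all hypothetical, chosen only for illustration. Translations are written in pinyin.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity of two sparse vectors stored as dicts.

    Assumption: cosine is used here only as one plausible similarity measure.
    """
    shared = set(vec_a).intersection(vec_b)
    dot = sum(vec_a[key] * vec_b[key] for key in shared)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_synonymous(candidates, translations, threshold):
    """Keep candidate pairs whose translation vectors are similar enough."""
    selected = []
    for col_a, col_b in candidates:
        score = cosine_similarity(
            translations.get(col_a, {}), translations.get(col_b, {})
        )
        if score >= threshold:
            selected.append((col_a, col_b, score))
    return selected


# Hypothetical translation probabilities (made up for this sketch).
TURN_ON_LIGHT = ("turn on", "OBJ", "light")
SWITCH_ON_LIGHT = ("switch on", "OBJ", "light")
TURN_ON_RADIO = ("turn on", "OBJ", "radio")

translations = {
    TURN_ON_LIGHT: {
        ("kai1", "OBJ", "deng1"): 0.6,
        ("da3 kai1", "OBJ", "deng1"): 0.4,
    },
    SWITCH_ON_LIGHT: {
        ("kai1", "OBJ", "deng1"): 0.5,
        ("da3 kai1", "OBJ", "deng1"): 0.5,
    },
    TURN_ON_RADIO: {
        ("da3 kai1", "OBJ", "shou1 yin1 ji1"): 1.0,
    },
}

# Candidate pairs, as they might come from a corpus plus a thesaurus.
candidates = [
    (TURN_ON_LIGHT, SWITCH_ON_LIGHT),
    (TURN_ON_LIGHT, TURN_ON_RADIO),
]

result = select_synonymous(candidates, translations, threshold=0.5)
for col_a, col_b, score in result:
    print(col_a, col_b, round(score, 3))
```

    In this toy setting, the pair whose translation vectors overlap is kept, while the pair with no shared translation scores zero and is filtered out. The threshold trades precision against coverage and would need to be tuned on real data.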
    <Paragraph position="5"> In order to evaluate the performance of our method, we conducted experiments on extracting three typical types of synonymous collocations.</Paragraph>
    <Paragraph position="6"> Experimental results indicate that our approach achieves 74% average precision and 64% recall, considerably outperforming methods that use only monolingual corpora or only bilingual corpora.</Paragraph>
    <Paragraph position="7"> The remainder of this paper is organized as follows. Section 2 describes our synonymous collocation extraction method. Section 3 evaluates the proposed method, and the last section draws our conclusions and presents future work.</Paragraph>
  </Section>
</Paper>