<?xml version="1.0" standalone="yes"?> <Paper uid="N03-1015"> <Title>Word Sense Acquisition from Bilingual Comparable Corpora</Title> <Section position="7" start_page="2" end_page="2" type="evalu"> <SectionTitle> 5 Discussion </SectionTitle> <Paragraph position="0"> Our method has several practical advantages. First, it produces a corpus-dependent inventory of word senses. That is, the resulting inventory covers most senses relevant to a domain, while excluding senses irrelevant to that domain.</Paragraph> <Paragraph position="1"> Second, our method unifies word sense acquisition with word sense disambiguation. The sense-vs.-clue correlation matrix was originally used for word sense disambiguation. Therefore, our method guarantees that the acquired senses can be distinguished by machines, and it further demonstrates the possibility of automatically optimizing the granularity of word senses.</Paragraph> <Paragraph position="2"> Some limitations of the present method are discussed below, together with possible future extensions. First, our method produces a hierarchy of clusters but cannot produce a set of disjoint clusters. It is very important to terminate the merging of senses autonomously at an appropriate cycle. Comparing distribution patterns (not subordinate ones) may be useful for deciding when to terminate merging; senses characterized by complementary distribution patterns should not be merged. Second, the present method assumes that each translation equivalent represents one and only one sense of the target word, but this is not always the case. A Japanese Katakana word resulting from transliteration of an English word sometimes represents multiple senses of the English word. It is necessary to detect and split translation equivalents that represent more than one sense of the target word.</Paragraph> <Paragraph position="3"> Third, the acquired senses are rather coarse-grained, and generic senses are difficult to acquire. 
One reason for this may be that we rely on co-occurrence within a window. The fact that most distributional word clustering methods use syntactic co-occurrence suggests that syntactic co-occurrence is the most effective tool for extracting pairs of related words.</Paragraph> </Section> </Paper>