<?xml version="1.0" standalone="yes"?> <Paper uid="C94-2114"> <Title>A Best-Match Algorithm for Broad-Coverage Example-Based Disambiguation</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Improvement of coverage in practical domains is one of the most important issues in the area of example-based systems. The example-based approach [6] has become a common technique for natural language processing applications such as machine translation and disambiguation (e.g. [5, 10]). However, few existing systems can cover a practical domain or handle a broad range of phenomena.</Paragraph> <Paragraph position="1"> The most serious obstacle to robust example-based systems is the coverage of example-bases. It is an open question how many examples are required for disambiguating sentences in a specific domain.</Paragraph> <Paragraph position="2"> The Sentence Analyzer (SENA) was developed in order to resolve attachment, word-sense, and conjunctive ambiguities by using constraints and example-based preferences [11]. It lists about 57,000 disambiguated head-modifier relationships and about 300,000 synonym and is-a binary relationships. Even so, lack of examples (no relevant examples) accounted for 46.1% of failures in an experiment with SENA [12].</Paragraph> <Paragraph position="3"> Previously, it was believed to be easier to collect examples than to develop rules for resolving ambiguities. However, the coverage of each example is much more local than that of a rule, and therefore a huge number of examples is required in order to resolve realistic problems. 
There has been some corpus-based research on how to acquire large-scale knowledge automatically in order to cover the domain to be disambiguated, but there are still major problems to be overcome.</Paragraph> <Paragraph position="4"> First, semantic knowledge such as word-sense cannot be extracted by automatic corpus-based knowledge acquisition. The example-base in SENA is developed by using a bootstrapping method.</Paragraph> <Paragraph position="5"> However, the results of word-sense disambiguation must be checked by a human, and word-senses are tagged for only about half of all the examples, since the task is very time-consuming.</Paragraph> <Paragraph position="6"> A second difficulty in the example-based approach is the algorithm itself, namely, the best-match algorithm, which was used in earlier systems built around a thesaurus that consisted of a hierarchy of is-a or synonym relationships between words (word-senses). This paper proposes two methods for improving the coverage of example-bases. The selected domain is that of sentences in computer manuals.</Paragraph> <Paragraph position="7"> First, knowledge that represents a type of similarity other than synonym or is-a relationships is acquired. As one measurement of the similarity, interchangeability between words can be used. In this paper, two types of relationship reflect such interchangeability. First, the elements of coordinated structures are good clues to the interchangeability of words. Words can be extracted easily from a domain-specific corpus, and therefore the example-base can be adapted to the specific domain by using the domain-specific relationships.</Paragraph> <Paragraph position="8"> If there are no examples and relations in the thesaurus, the example-base gives no information for disambiguation. However, the text to be disambiguated provides useful knowledge for this purpose [7, 3]. 
The relationships between words in the example-base and an unknown word can be guessed by comparing that word's usage in extracted examples and in the text.</Paragraph> </Section> </Paper>