File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/99/w99-0509_intro.xml
Size: 7,136 bytes
Last Modified: 2025-10-06 14:06:55
<?xml version="1.0" standalone="yes"?> <Paper uid="W99-0509"> <Title>An Overt Semantics with a Machine-guided Approach for Robust LKBs</Title> <Section position="3" start_page="0" end_page="63" type="intro"> <SectionTitle> 2 Application-driven Acquisition </SectionTitle> <Paragraph position="0"> The semantics of an entry is an underspecified Text Meaning Representation (TMR) fragment (e.g., Defrise and Nirenburg, 1991). This TMR fragment can be a concept from the ontology or some interlingua structures such as attitudes, modalities, aspects, sets and TMR relations (addition, enumeration, comparison). Concepts and interlingua structures can appear together or independently. The ontology, to which lexemes are mapped, consists of concepts (named sets of property-value pairs) organized hierarchically along subsumption links, with an average of 14 relational links (such as ISA, SUBCLASS, AGENT, THEME-OF, HEADED-BY, HAS-MEMBER) per concept (Mahesh, 1996). In a multilingual environment, the main practical advantage of connecting the lexicon to an ontology is cost-effectiveness, as only the &quot;language-dependent&quot; properties have to be acquired when adding new natural languages to the system. The mapping between a word and the ontology is the most difficult task of lexicon acquisition, and requires developing the most cost-effective approach in terms of training and strategies. (Footnote 2: Synsets represent WordNet's building blocks, which are words, synonyms or near-synonyms that can be used to refer to a given concept (Miller, 1990).) 2.1 Importance of Training. The experiment reported below shows that training is essential to determine the &quot;computational&quot; meaning of a word. A native speaker of Spanish, who had not taken part in the lexicon training process, was asked to add some senses to entries in the Spanish lexicon. This was mainly done for testing the analyzer, as only 23 out of 167 words were ambiguous in one text we were analyzing. But we also discovered this was a
very useful exercise for testing the quality of a semantic lexicon. The list of added senses was reviewed by two computational linguists, one in charge of supervising the training and the other with proficiency in our framework, who had seen entries as they were used by the analyzer but had not taken part in the training process either. The untrained acquirer, hereafter UNACQ, added a total of 111 senses to 55 open class words or so. Among these 55 words where ambiguity had been added, 33 were already ambiguous in the Spanish lexicon. After a closer look at the Spanish lexicon and at the senses retrieved by the semantic analyzer, and after doing an on-line corpora search, the computational linguists accepted fewer than 20 new senses among the 111 suggested. This &quot;overgeneration&quot; of senses by UNACQ had different origins: i) the analyzer did not present all the senses from the Spanish lexicon to UNACQ; it only presented the ones that were accepted after syntactic binding; ii) the senses added by UNACQ were &quot;equivalent&quot; to the senses already in the Spanish lexicon, but not recognized by UNACQ, as they were acquired as &quot;unspecified&quot; in the Spanish lexicon; iii) UNACQ hard-coded non-literal meanings of the words; iv) the addition of senses was MRD-driven: UNACQ acquired the list of meanings provided by the Spanish-English Larousse and Collins, adopting an enumeration approach. Such a task is not superficial: it ensures that the quality of the core lexicon is good enough to serve as a basis for lexicon expansion techniques, some of which we develop below (see Viegas (1999) for the choices an acquirer faces when working out the semantic mapping of a word).</Paragraph> <Section position="1" start_page="62" end_page="62" type="sub_section"> <SectionTitle> 2.2 Strategies </SectionTitle> <Paragraph position="0"> There are mainly two approaches to word sense assignment: corpus-driven and mental-driven. The former is better adapted to building lexicons used in analysis,
whereas the latter better suits lexicons to be used in generation. We refer to Kilgarriff (1997) for the corpus-driven approach, and discuss in this paper the mental-driven approach. A mental-driven or thesaurus-driven approach consists in grouping together lexemes which share the same meaning. In order to ensure consistency among acquirers' mappings, we have divided the process of acquiring a computational semantic lexicon into two phases: pre-acquisition and acquisition. There is still time to revise a pre-acquired mapping at acquisition time, if needed.</Paragraph> </Section> <Section position="2" start_page="62" end_page="63" type="sub_section"> <SectionTitle> 2.3 The Pre-Acquisition Phase </SectionTitle> <Paragraph position="0"> For a generation lexicon, the method of preparing the pre-acquisition files can be as follows: i) extract all concepts from the ontology; ii) lexicalize them using on-line thesauri, dictionaries and native speakers' intuitions; iii) order pre-acquisition files according to the semantic Mapping-Tag (see below). A pre-acquisition record includes 7 fields: Semantics, Mapping-Tag, Lexeme, POS, Translations, Frequency, and Polysemy-Count. The Semantics field includes only the ontological head concept, in which the word sense should be anchored (no selectional restrictions or other properties are specified at this stage). The Mapping-Tag field (see below) describes the type of connection between the word sense and its conceptual meaning: some word senses are directly mapped (&quot;dim&quot; map) to a single concept in the ontology, whereas the meaning of some other word senses is described through the combination of concepts linked via properties (relations or attributes). We defined seven tags which flag the entry for a specific task. For instance, &quot;devb&quot; (deverbal) is used primarily for nouns and adjectives when their meaning is a composition of a filler and an event (e.g. bombing, readable); &quot;asp&quot; (aspectual) is used for true aspectuals (e.g. begin) and
also with actions expressing aspectuality (e.g. stare, duration: prolonged). The Translations field includes an English translation (for languages other than English). Frequency, POS and polysemy count are extracted automatically, using on-line large corpora for frequency, and WordNet for</Paragraph> <Paragraph position="2"> the part of speech and the polysemy count. Bilingual dictionaries, filtered by a native speaker of the foreign language, are used for the translations into English (for the acquisition of languages other than English). In order to increase speed at acquisition time, each acquirer works on one type of Mapping-Tag at a time. For instance, some acquirers work on type OBJECT. Type OBJECT can only be lexicalized into nouns, e.g. DEVICE → device, instrument, tool, appliance. Others work on the type EVENT. EVENTs can be lexicalized into nouns, e.g. EXPLODE → bombing, bombardment, or into verbs: bomb, bombard, drop_bombs_on, throw_bombs_at. In order to increase consistency, acquirers go through specially designed training sessions.</Paragraph> </Section> </Section> </Paper>
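Illustrative sketch (not part of the original paper): the pre-acquisition record of Section 2.3 and the practice of letting each acquirer work on one Mapping-Tag at a time could be modelled roughly as follows. The seven field names, the tags "devb" and "asp", and the EXPLODE example come from the text; the class name, the function name, the POS labels, and the concept names BEGIN-EVENT and READ are hypothetical, and Frequency and Polysemy-Count are left empty because the text says they are extracted automatically from corpora and WordNet.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PreAcquisitionRecord:
    """One pre-acquisition record with the seven fields named in Section 2.3."""
    semantics: str                         # ontological head concept only
    mapping_tag: str                       # e.g. "devb" or "asp"
    lexeme: str
    pos: str
    translations: List[str] = field(default_factory=list)  # English glosses, non-English lexemes only
    frequency: Optional[int] = None        # would come from on-line corpora
    polysemy_count: Optional[int] = None   # would come from WordNet


def group_by_mapping_tag(
    records: List[PreAcquisitionRecord],
) -> Dict[str, List[PreAcquisitionRecord]]:
    """Batch records by Mapping-Tag so one acquirer handles one tag at a time."""
    groups: Dict[str, List[PreAcquisitionRecord]] = defaultdict(list)
    for record in records:
        groups[record.mapping_tag].append(record)
    return dict(groups)


records = [
    PreAcquisitionRecord("EXPLODE", "devb", "bombing", "noun"),
    PreAcquisitionRecord("BEGIN-EVENT", "asp", "begin", "verb"),     # hypothetical concept name
    PreAcquisitionRecord("READ", "devb", "readable", "adjective"),   # hypothetical concept name
]

batches = group_by_mapping_tag(records)
for tag, batch in batches.items():
    print(tag, [r.lexeme for r in batch])
```

Grouping on the tag rather than on the lexeme reflects the stated goal of increasing speed at acquisition time; any real pre-acquisition file format would of course depend on the tools the authors actually used, which the text does not describe.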