<?xml version="1.0" standalone="yes"?> <Paper uid="W91-0217"> <Title>PROPERTY NATURE: STRUCTURE: ORIGIN: STATE: TASTE: SMELL:</Title> <Section position="3" start_page="189" end_page="192" type="metho"> <SectionTitle> 3 Semantic Information </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="189" end_page="190" type="sub_section"> <SectionTitle> 3.1 Towards standardization at the taxonomic level </SectionTitle> <Paragraph position="0"> The procedural methodology for acquiring taxonomic information can be considered rather well established (see e.g. Byrd et al. 1987, Calzolari 1988). In this respect, the dictionary is considered as a &quot;classificatory device&quot;, i.e. an empirical means of instantiating concepts.</Paragraph> <Paragraph position="1"> The machine-readable dictionary (MRD) in fact gives one possible way of learning a concept (where the &quot;learning&quot; process assumes an inductive form): linking a concept to all its instances. All the instances of the same category/class are in fact extracted and connected together.</Paragraph> <Paragraph position="2"> What is of interest with respect to taxonomies are not the leaf nodes, representing rather specific words, but middle- and top-level nodes in the IS-A hierarchy. These represent the core concepts by means of which the other words are defined via taxonomic relationship. An attempt at normalization is therefore being made in the project at the level of these core nodes, in order both a) to give a more consistent structure to the hierarchy deriving from definitions, and b) to make possible the linking and merging of taxonomies extracted from different dictionaries and from different languages.</Paragraph> <Paragraph position="3"> Analyses have already been performed which lead to the grouping of subsets of nodes under the same &quot;conceptual label&quot; representing the generalization over specific lexicalizations of a similar lexical meaning. 
These conceptual labels are obtained through a comparative analysis of the different taxonomies for the different dictionaries, in order to create links and mappings between them, and constitute a first simple attempt at standardization at the semantic level. They should be analysed and compared with semantic primitives or features stated in other semantic systems.</Paragraph> </Section> <Section position="2" start_page="190" end_page="191" type="sub_section"> <SectionTitle> 3.2 More complex semantic and world-knowledge information: &quot;types&quot; and representation in terms of common feature structures </SectionTitle> <Paragraph position="0"> The aim of ACQUILEX in its second year is to extract, in addition to the simple IS-A links, more complex -- and so far not really thoroughly analysed -- semantic information hidden in the 'differentia' of the lexicographic definitions. In a project with several partners it is necessary to work with similar (global) strategies of knowledge acquisition in order to reach the same result, i.e. a common lexical knowledge base (LKB). We do not rely on random extraction, but apply a knowledge acquisition strategy which, in our view, must be guided by: a) empirical observations, b) theoretical hypotheses.</Paragraph> <Paragraph position="1"> What is meant by a) is rather simple: the often observed systematic regularities and similarities of lexical items and definitional patterns. By b) we mean the use of the theoretical approach to lexical semantics put forward by Pustejovsky (see Pustejovsky 1989, Pustejovsky and Boguraev forthcoming) in his 'qualia structures'. This approach makes use of a knowledge representation framework to express different aspects of knowledge structures concerning words. The qualia structure for a noun defines its essential attributes and &quot;is in essence an argument structure for nouns&quot;. 
In ACQUILEX we use a similar approach and similar types of structures, but in a broader sense than the qualia structures used by Pustejovsky, which are made up of four main Roles (Constitutive, Formal, Telic, Agentive). These four main roles on the one hand do not cover the whole range of lexical notions which characterize nominals, and on the other hand do not include other pertinent world-knowledge information which can be useful in many NLP tasks or applications and can be found in MRDs.</Paragraph> <Paragraph position="2"> We therefore take the underlying hypothesis of having &quot;meaning types&quot; and use the notion of &quot;template&quot; as the main structuring device for semantic information, but enlarge this notion of template to include and represent: i) other semantic information not covered by the four main roles, and ii) more general or encyclopedic information concerning the concepts. An example of the template derived from the analysis of the definitions of three monolingual dictionaries (two Italian, one English) is given in Figure 1 for the concept of SUBSTANCE.</Paragraph> <Paragraph position="3"> Dictionary definitions are suitable for an explicit representation in terms of feature structures as data types, which reproduce (at least partially) in an explicit way the original linear textual data (obviously not all of the definitions of a dictionary, and often not the entire definition).</Paragraph> <Paragraph position="4"> This feature structure can be seen as a &quot;meaning type&quot;, representing a maximal frame for a class of words (e.g. all the words defined by the word 'substance' or by its hyponyms). 
This frame, with all the potential attributes which in the definitions are found to be most relevant for this subset, is inherited (as a &quot;potential meaning type&quot;) by all the hyponyms of a &quot;Top Node&quot; (SUBSTANCE) and will be filled in some of its slots for each individual hyponym.</Paragraph> <Paragraph position="5"> If we consider the &quot;meaning structure&quot; of LIQUID (see Figure 2), it is constituted by a subset of the attributes of SUBSTANCE; the same holds for GAS, and so on.</Paragraph> <Paragraph position="6"> There will obviously be different &quot;meaning types&quot; for different categories of words. Attributes for verbs usually represent thematic roles which are relevant for a given &quot;Action Type&quot; and, where possible, aspectual information is also being semi-automatically extracted from the definitions (see Alonge 1991). An example of the structures which result from dictionary definitions for the verbs of &quot;hitting&quot; and &quot;dividing&quot; is given in specified argument (e.g. in &quot;to hammer&quot; the instrument role is lexically specified). However, as seen above, the view of assigning to nouns descriptions in terms of &quot;frames&quot; is also taken, with attributes or slots (and fillers), which are, at least partly, acquired from a procedural analysis of the definitions.</Paragraph> <Paragraph position="7"> Different types of templates with different attributes are typical of &quot;derivatives&quot;, which constitute a very large portion of a lexicon. They exhibit very special patterns and relationships with respect to their bases, with very interesting properties from a linguistic point of view. 
Their conceptual templates contain, among many others, attributes such as: AGENT, ACT_OF, PROPERTY_NAME, LOCATION, SET_OF, etc., but also attributes of a more encyclopedic nature such as: INHABITANT_OF, FOLLOWER_OF, etc.</Paragraph> <Paragraph position="8"> We can associate with each of these relational patterns, which contribute to defining a very large number of lexical items, conceptual templates (sets of properties) which are then inherited by default by all their defined words. As an example, we can associate with the AGENT attribute, among others, the following set of attributes:</Paragraph> <Paragraph position="10"> where 'verb' is a variable which takes as value the base verb for each derivative. E.g.: lavoratore: [AGENT: lavorare] by default also inherits: [IS_A: human, TELIC: lavorare].</Paragraph> <Paragraph position="11"> It is with data of these types that we are beginning to feed the common LKB, a network consisting not only of IS_A relations, but of all the different types of semantic relations and semantic features implicitly present in the &quot;differentia&quot; part of all the definitions from all the available sources.</Paragraph> </Section> <Section position="3" start_page="191" end_page="192" type="sub_section"> <SectionTitle> 3.3 The Templates </SectionTitle> <Paragraph position="0"> The templates, or feature structures, in which we represent the semantic information, will serve many purposes in the whole process of knowledge acquisition, organization and representation.</Paragraph> <Paragraph position="1"> They can be used in the following tasks: a) As a guide both in the automatic or semi-automatic parsing of all the definitions of the same lexical field, from the top to leaf nodes in the taxonomy, and in the interpretation of the results of the parsing process (e.g. to predict and constrain the interpretation of certain types of structures, or the proper attachment of PPs, etc.). 
In this syntactic/semantic parsing process the appropriate attributes/slots in the template are filled with the pertinent values.</Paragraph> <Paragraph position="2"> b) As a common structure to be filled independently of the actual lexical/surface realizations of the semantic features, both in different dictionaries for the same language and in different languages. They therefore act as a scheme which makes the interpretation process uniform across all the different sources.</Paragraph> <Paragraph position="3"> c) As a tool for comparing information coming from different sources, while making possible a semi-automatic and partial mapping of word senses. When a link or a mapping is established, the data coming from different dictionaries are combined, according to the results of the comparison, either: i) by merging, when different surface lexicalizations are found for the same underlying concept (and in this case a score can be given to reinforce that information), or ii) by integrating different types of information on the same lexical item, e.g. filling different attributes.</Paragraph> <Paragraph position="4"> d) As a tool for correcting errors or inconsistencies, e.g. in the use of superordinates in the dictionary definitions.</Paragraph> <Paragraph position="5"> These tasks can be summarized as follows: These templates are inherited as potential &quot;meaning types&quot; by all the hyponyms, and the taxonomies are the vehicles by means of which this information is inherited. Obviously some of the values can be overridden by specific information.</Paragraph> </Section> </Section> <Section position="4" start_page="192" end_page="193" type="metho"> <SectionTitle> 4 Conclusions </SectionTitle> <Paragraph position="0"> With this common method of representing the information, the goal of sharing data and establishing correspondences among different sources is achieved. 
In this approach taxonomies and conceptual templates constitute in fact the point of convergence between different sources and languages, and between the empirical and the theoretical approaches.</Paragraph> <Paragraph position="1"> The taxonomies and the templates -- as developed within ACQUILEX -- already constitute a first degree of normalization or standardization in the representation of semantic and world-knowledge information, both across many (about 10) dictionaries and (4) languages, and between the lexicographic approach to semantics and theoretical approaches.</Paragraph> <Paragraph position="2"> This is the first time that a project of semantic and world-knowledge information encoding for a very large part of the lexicon has been carried out in such an extensive way.</Paragraph> </Section> </Paper>