<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0805">
  <Title>Lexical Discrimination with the Italian Version of WORDNET</Title>
  <Section position="4" start_page="0" end_page="32" type="metho">
    <SectionTitle>
2 The Italian WORDNET Prototype
</SectionTitle>
    <Paragraph position="0"> The Italian version of WORDNET is based on the assumption that a large part of the conceptual relations defined for English (about 72,000 ISA relations and 5,600 PART-OF relations) can be shared with Italian.</Paragraph>
    <Paragraph position="1"> WORDNET can be described as a lexical matrix with two dimensions: the lexical relations, which hold among words and so are language specific, and the conceptual relations, which hold among senses and which, at least in part, we consider independent of any particular language. The Italian version of WORDNET aims to realize a multilingual lexical matrix by adding a third dimension for the language. Figure 1 shows the three dimensions of the matrix: (a) words in a language, indicated by W_i; (b) meanings, indicated by M_j; (c) languages, indicated by L_k. From an abstract point of view, developing the multilingual matrix requires re-mapping the Italian lexical forms to the corresponding meanings (M_j), building the set of synsets for Italian (that is, making explicit the values of the matrix intersections for the Italian language). The result will be a complete redefinition of the lexical relations, while for the semantic relations, those originally defined for English will be used as much as possible.</Paragraph>
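The three-dimensional matrix described above can be illustrated with a small data structure. This is only a sketch under stated assumptions: the class name, its methods, the meaning identifiers ("write.01", "write.02") and the word lists are hypothetical and are not taken from the Italian WORDNET implementation.

```python
from collections import defaultdict


class MultilingualLexicalMatrix:
    """Toy matrix: meaning id -> language -> set of word forms (hypothetical design)."""

    def __init__(self):
        self._synsets = defaultdict(lambda: defaultdict(set))

    def add(self, meaning, language, word):
        """Record that `word` expresses `meaning` in `language`."""
        self._synsets[meaning][language].add(word)

    def synset(self, meaning, language):
        """Words that express `meaning` in `language` (empty set if none)."""
        return set(self._synsets.get(meaning, {}).get(language, set()))

    def senses(self, word, language):
        """Meaning ids that `word` can express in `language`."""
        return {m for m, langs in self._synsets.items()
                if word in langs.get(language, ())}

    def translations(self, word, source, target):
        """Target-language words sharing at least one meaning with `word`."""
        result = set()
        for meaning in self.senses(word, source):
            result |= self.synset(meaning, target)
        return result


# Illustrative entries only; the sense inventory is invented for the example.
matrix = MultilingualLexicalMatrix()
matrix.add("write.01", "en", "write")
matrix.add("write.01", "it", "scrivere")
matrix.add("write.02", "en", "write")
matrix.add("write.02", "en", "compose")
matrix.add("write.02", "it", "scrivere")
matrix.add("write.02", "it", "comporre")

print(matrix.translations("scrivere", "it", "en"))
```

Because the meanings are shared across languages, adding a further language in this sketch only requires new calls to add; the conceptual relations among meaning ids would not need to change.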
    <Paragraph position="2"> An implementation of the multilingual lexical matrix has been realized; it allows complete integration with the English version and makes all the translations of the Italian lemmas available. The architecture is easily extendable to other languages. The integration with the computational lexicon ILEX is under development: it will give access to other levels of lexical information, such as morphological classes, syntactic categories and subcategorization frames. In December 1996 the Italian version of WORDNET included about 10,000 lemmas (7,000 nouns, 700 verbs, 1,500 adjectives, 600 adverbs).</Paragraph>
    <Paragraph position="3"> So far, data acquisition has been mostly manual, with the help of a graphical interface; however, a basic goal of the project is to experiment with techniques for the (semi)automatic acquisition of data. Algorithms for resolving the ambiguities in the coupling with the English WORDNET have been developed. Automatically created versions are then tested against manually acquired data, with the aim of incrementally improving the precision level. A final manual check is performed on all the automatically acquired data. The use of corpora to extract contextual information for the disambiguation process is also foreseen.</Paragraph>
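As a minimal sketch of how an automatically created version might be scored against manually acquired data, the function below computes precision over (lemma, synset) couplings. The function name, the pair representation and the synset labels are assumptions made for illustration; the source does not describe its evaluation procedure in this detail.

```python
def coupling_precision(automatic, manual):
    """Fraction of automatically proposed couplings that the manual data confirms."""
    automatic = set(automatic)
    if not automatic:
        return 0.0
    return len(automatic.intersection(manual)) / len(automatic)


# Hypothetical couplings between an Italian lemma and English synset labels.
automatic = {
    ("scrivere", "write.v.01"),
    ("scrivere", "publish.v.01"),
    ("scrivere", "spell.v.01"),
    ("scrivere", "type.v.01"),
}
manual = {
    ("scrivere", "write.v.01"),
    ("scrivere", "publish.v.01"),
    ("scrivere", "spell.v.01"),
}

precision = coupling_precision(automatic, manual)
print(precision)
```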
  </Section>
  <Section position="5" start_page="32" end_page="34" type="metho">
    <SectionTitle>
3 Adding Selectional Restrictions to
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="32" end_page="34" type="sub_section">
      <SectionTitle>
Verbs
</SectionTitle>
      <Paragraph position="0"> A number of steps were followed to add selectional restrictions to Italian WORDNET. First, Italian verb senses were extracted from a paper version of an Italian dictionary and checked against a corpus of generic Italian texts. Each verb sense was then coupled with one or more English WORDNET synsets (footnote 2). This phase was performed manually with the help of a graphical interface (see figure 2) that includes four integrated working tools: (i) a bilingual dictionary with more than 30,000 lemmas; (ii) a graph that visualizes the coupling with the English WORDNET; (iii) the bilingual WORDNET, which behaves exactly like the English version with the additional possibility of browsing the Italian semantic network; (iv) the working cards, which allow the data for a synset to be inserted, modified and checked. The result of this phase is the extension of the English WORDNET with the Italian synsets.</Paragraph>
      <Paragraph position="1"> Figure 3 shows the correspondence between English and Italian synsets for the verb Scrivere (Write).</Paragraph>
      <Paragraph position="2"> The next step is the definition of the subcategorization frame for each sense. This includes both syntactic information (i.e., argumental positions, prepositions on indirect objects, category type) and semantic information, such as thematic roles and selectional restrictions. Syntactic information is associated with single verbs, while semantic information is associated with the whole synset, i.e., semantic participants are shared among all the verbs belonging to the synset.</Paragraph>
      <Paragraph position="3"> We built selectional restrictions using the synsets of the noun hierarchy. Two different possibilities for defining selectional restrictions are considered: 1. Selectional restrictions obtained from the frames currently provided by WORDNET.</Paragraph>
      <Paragraph position="4"> 2 As for figurative uses, they can also be coupled with WORDNET provided that an appropriate synset exists.</Paragraph>
      <Paragraph position="5"> 2. Selectional restrictions taken from the whole WORDNET noun hierarchy.</Paragraph>
      <Paragraph position="6"> As far as the first hypothesis is concerned, WORDNET describes all the English verbs using a set of 35 different syntactic frames, which in turn include only two restrictions: Something and Somebody. For example, the frames provided for the verb Write in the synset {Publish, Write} are "Somebody ...s" and "Somebody ...s Something". The problem with these two restrictions is that they are completely unrelated to the noun synsets, so they have to be matched with the proper synsets in the noun hierarchy. The concept Somebody includes not only the synset Person but also all the synsets denoting groups of people that could hold the agent thematic role. We defined Somebody using the following boolean combination of synsets:</Paragraph>
      <Paragraph position="8"> Something is defined as the complement of Somebody.</Paragraph>
      <Paragraph position="9"> In the second hypothesis, selectional restrictions are taken from the whole noun hierarchy. As an example, figure 4 illustrates the senses of the Italian verb Scrivere (Write) found in Italian WORDNET. For each sense we report a conventional name, which unambiguously identifies the synset, and the argumental positions admitted for that sense, together with their selectional restrictions. The appropriate combination of synsets for an argumental position has to be both general enough to preserve all the human readings and restrictive enough to discriminate among different senses of both verb and noun.</Paragraph>
      <Paragraph position="10"> Finding the appropriate selectional restrictions proved difficult and time consuming. The process required a deep search of the WORDNET noun hierarchy. In order to achieve a good trade-off between discrimination power and precision level, we adopted an empirical process with successive refinement steps. We started with general selectional restrictions and then validated them against experimental results. This iterative process ended with complex selectional restrictions for verbs, as figure 4 shows.</Paragraph>
      <Paragraph position="11"> The WORDNET verb taxonomy is based on the troponymy relation, which is defined as the co-occurrence of both lexical implication and temporal co-extension between two verbs. We note that every time a troponymy relation between two verbs holds, an ISA rela-</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="34" end_page="35" type="metho">
    <SectionTitle>
4 Coupling WORDNET and a TFS Parser
</SectionTitle>
    <Paragraph position="0"> In this section we describe the architecture we used to check the usability of WORDNET in parsing. Italian WORDNET has been used in two different phases of the linguistic analysis. In the first phase, we use Italian WORDNET as a lexicon repository to carry out lexical analysis.</Paragraph>
    <Paragraph position="1"> During the semantic analysis, Italian WORDNET is used as a kind of Knowledge Base (KB), exploiting the structural relationships among synsets. In particular, we used the supertype/subtype-like hierarchy of synsets during the parsing process in order to discard implausible constituents on semantic grounds.</Paragraph>
    <Paragraph position="2"> The parser used is a CYK chart parser embedded in the GEPPETTO environment [Ciravegna et al., 1996] and coupled with a proper unification algorithm. GEPPETTO is based on a Typed Feature Logic [Carpenter, 1992] for the specification of linguistic data. The GEPPETTO environment supports editing and debugging grammars and lexica, linking linguistic data to a parser and/or a generator, integrating various forms of KBs, and using specialized processors (e.g., morphological analyzers). In particular, we integrated the hierarchical structure of WORDNET as an external KB, while an ISA function uses the WORDNET hierarchy to check subsumption relationships between WORDNET synsets.</Paragraph>
    <Paragraph position="3"> The grammar is written in an HPSG-like style, and each rule is regarded as a Typed Feature Structure (TFS). For the current experiment the grammar coverage is limited to very simple verbal sentences formed by a subject, a main verb together with its internal arguments and, possibly, an adjunct phrase. Note that the syntactic analysis does not take PP attachment into account: we excluded the possibility of capturing such complex nominal phrases. Indeed, the object of the experiment is to disambiguate among WORDNET senses of both verbs and nouns on the basis of the lexical semantic restrictions on the arguments of the verb and the lexical semantics associated with the noun.</Paragraph>
    <Paragraph position="4"> A condition for using WORDNET together with the GEPPETTO environment is to bring it into an effectively usable format. The idea was to rebuild the WORDNET hierarchy in CLOS, the object-oriented part of COMMON LISP. The advantage of this approach is the possibility of implementing fast and flexible access to the synset hierarchy and, in particular, an efficient ISA functionality, as required for the semantic checking during parsing. The argument to the ISA function may be a complex boolean combination of synsets (e.g., see the selectional restrictions in figure 4).</Paragraph>
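The following sketch illustrates an ISA check whose restriction may be a boolean combination of synsets. It is written in Python rather than CLOS, and everything in it is an assumption made for illustration: the toy hypernym table, the tuple encoding of restrictions, and in particular the SOMEBODY combination, which is not the definition used in the paper. Only the treatment of Something as the complement of Somebody follows the text.

```python
# Hypothetical toy hierarchy: child synset -> list of parent synsets.
HYPERNYMS = {
    "writer": ["person"],
    "person": ["organism"],
    "organism": ["entity"],
    "committee": ["social_group"],
    "social_group": ["group"],
    "group": ["entity"],
    "letter": ["document"],
    "document": ["artifact"],
    "artifact": ["entity"],
}


def ancestors(synset):
    """Return the synset together with all of its transitive hypernyms."""
    seen = set()
    stack = [synset]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(HYPERNYMS.get(current, []))
    return seen


def isa(synset, restriction):
    """Check a synset against a restriction.

    A restriction is either a synset name or a tuple of the form
    ("and", r1, r2, ...), ("or", r1, r2, ...) or ("not", r).
    """
    if isinstance(restriction, str):
        return restriction in ancestors(synset)
    op, *args = restriction
    if op == "and":
        return all(isa(synset, r) for r in args)
    if op == "or":
        return any(isa(synset, r) for r in args)
    if op == "not":
        (arg,) = args
        return not isa(synset, arg)
    raise ValueError(f"unknown operator: {op!r}")


# Illustrative only: a stand-in for the paper's boolean combination.
SOMEBODY = ("or", "person", "social_group")
# Something is the complement of Somebody.
SOMETHING = ("not", SOMEBODY)

print(isa("committee", SOMEBODY), isa("letter", SOMETHING))
```

Encoding restrictions as nested tuples keeps the check recursive and short; a production version would probably cache the ancestor sets, since the same synsets are tested repeatedly during parsing.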
    <Paragraph position="5"> The parser controls the overall processing. Whenever it tries to build a (partially recognized) constituent, it incrementally verifies the admissibility of the semantic part of that constituent using the WORDNET hierarchy. In particular, whenever a noun is associated with a verbal argument, the ISA function is triggered to check whether the synset of the noun is subsumed by the selectional restriction of the corresponding verbal argument. Because of the large number of analyses, it is useful to discard implausible constituents as soon as possible in order to cut the search space. This is obtained by interleaving the syntactic and semantic processes: as soon as the semantic test fails, the constituent is rejected.</Paragraph>
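The early rejection described above can be sketched as follows. This is a deliberate simplification, not the CYK chart parser of GEPPETTO: there is no grammar or chart, restrictions are single synsets rather than boolean combinations, and the hierarchy, verb sense names and role names are all hypothetical.

```python
# Hypothetical single-parent hierarchy: child synset -> parent synset.
HYPERNYMS = {
    "writer": "person",
    "person": "entity",
    "letter": "document",
    "document": "entity",
}


def subsumed(synset, restriction):
    """True if `restriction` is the synset itself or one of its hypernyms."""
    while synset is not None:
        if synset == restriction:
            return True
        synset = HYPERNYMS.get(synset)
    return False


def attach(noun_senses, restriction):
    """Return the noun senses that satisfy the restriction, or None to reject."""
    surviving = [s for s in noun_senses if subsumed(s, restriction)]
    return surviving or None


def build_constituents(verb_senses, arguments):
    """Keep only the verb senses whose argument restrictions are all satisfied.

    verb_senses: sense name -> {role: restriction synset}
    arguments:   role -> candidate noun synsets found in the sentence
    """
    analyses = []
    for sense, frame in verb_senses.items():
        bindings = {}
        for role, restriction in frame.items():
            kept = attach(arguments.get(role, []), restriction)
            if kept is None:
                break  # reject the constituent as soon as the semantic test fails
            bindings[role] = kept
        else:
            analyses.append((sense, bindings))
    return analyses


verb_senses = {
    "scrivere_compose": {"subject": "person", "object": "document"},
    "scrivere_other": {"subject": "person", "object": "person"},
}
arguments = {"subject": ["writer"], "object": ["letter"]}

analyses = build_constituents(verb_senses, arguments)
print(analyses)
```

In this sketch the second verb sense is dropped at its object argument, so no further work is spent on it; the same filtering also narrows the noun senses that remain bound to each role.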
  </Section>
</Paper>