<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2108">
  <Title>Dictionaries merger for text expansion in question answering</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 What the QA method needs
</SectionTitle>
    <Paragraph position="0"> The QA system (Jacquemin, 2003) is based on a matching procedure between a query and a text segment. Like most other approaches, my methodology handles the different ways of expressing the same idea by adding 'enrichments' to the utterance: synonyms, derivatives or words belonging to the same taxonomy.</Paragraph>
    <Paragraph position="1"> My method entails two new features. First, it uses semantic disambiguation in order to choose the right meaning for each word in the sentences.</Paragraph>
    <Paragraph position="2"> I notice that most QA systems try to give as many enrichments as possible to a word rather than to a meaning. The answers therefore often correspond to a sense different from the original one. But if each enrichment has the same sense as the original word, the noise decreases.</Paragraph>
    <Paragraph position="3"> The second feature follows from the fact that a semantic disambiguator needs a large context around the word to be disambiguated (Weaver, 1949), whereas the query generally comprises few words. I therefore decided to process the documents to build an enriched informative structure (Jacquemin, 2004). But this feature falls outside the scope of this paper. My semantic disambiguator (Jacquemin et al., 2002) is an evolution of a tool previously developed for both French and English at XRCE (Brun, 2000; Brun et al., 2001). The idea is to use a dictionary as a tagged corpus from which semantic disambiguation rules are extracted. The contextual data (syntactic, lexico-syntactic and semantico-syntactic) for a given sense of a word are seen as differential indications. So when the schema is found in the context of this word in a sentence, the corresponding sense is assigned.</Paragraph>
    <Paragraph position="4"> In figure 1, we can see how a disambiguation rule is extracted from an informative field of Dubois' French dictionary (Dubois and Dubois-Charlier, 1997). From the example field of the entry remporter in its second sense gagner (to win), the XIP parser (Aït-Mokhtar et al., 2002) extracts a lexico-syntactic schema, VARG[DIR]. [Figure 1] Example from Dubois' dictionary (entry: remporter, sense nb 2: gagner): On remporte la victoire sur ses adversaires (We win a victory over our adversaries).</Paragraph>
    <Paragraph position="5"> The extracted dependency VARG[DIR] means that the argument victoire is a direct object of the argument remporter. The rule built from this dependency indicates that the sense of the word remporter, in a context where victoire (victory) is the direct object, is the second sense gagner (to win). Two other types of rules exist: the first type generalizes the lexical rules by replacing lexical arguments with the corresponding semantic classes. The other one uses syntactic schemas stipulated by the dictionary (for instance: transitive, reflexive, etc.).</Paragraph>
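    The lexical rule described above can be illustrated with a minimal sketch. It assumes that a parsed sentence is available as a list of (relation, head, argument) triples; the Rule type, the RULES list and the disambiguate function are hypothetical names chosen for this illustration, not part of XIP or of the disambiguator itself.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    relation: str  # dependency label, e.g. "VARG[DIR]"
    head: str      # lemma to disambiguate
    argument: str  # lemma required in the dependent position
    sense: int     # sense number in the reference dictionary


# Rule read off the dictionary example
# "On remporte la victoire sur ses adversaires" (sense nb 2: gagner).
RULES = [Rule("VARG[DIR]", "remporter", "victoire", 2)]


def disambiguate(head, dependencies, rules=RULES):
    """Return the sense assigned by the first matching rule, or None."""
    for relation, dep_head, argument in dependencies:
        if dep_head != head:
            continue
        for rule in rules:
            if (rule.relation, rule.head, rule.argument) == (relation, dep_head, argument):
                return rule.sense
    return None
```

    Under these assumptions, a sentence in which victoire is the direct object of remporter receives sense 2, and a sentence with any other direct object is left undecided by this single rule.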
    <Paragraph position="6"> Both the QA system and the semantic disambiguator place two kinds of requirements on the dictionary.</Paragraph>
    <Paragraph position="7"> First, the dictionary is required to organize its data by sense and not by lemma: the data are differential indications. Second, the dictionary is required to contain as much contextual information as possible: examples or collocations (lexical rules), semantic classes or application domains (generalized rules), subcategorisation... Dubois' dictionary meets these demands, and moreover it contains some data that could be helpful to enrich an utterance: synonyms, instructions for derivations...</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Enrichment problems
</SectionTitle>
    <Paragraph position="0"> Several expansion solutions are proposed by the QA approaches: use of synonyms or of members of a taxonomy, stemming or use of derivatives...</Paragraph>
    <Paragraph position="1"> Dubois' dictionary contains some synonyms linked with a sense of the word they are synonymous with. But these synonyms are too few to provide sufficient enrichments. The system needs one or more synonym dictionaries to fill Dubois' gaps. No synonym dictionary distributes the synonyms by sense of the entry, except EuroWordNet (Catherin, 1999). But EuroWordNet's division into senses does not match Dubois' senses. Thus the question is how to distribute the available synonyms of each word to the right sense in Dubois'.</Paragraph>
    <Paragraph position="2"> Stemming, which treats two words with the same stem as near-synonyms, is too unpredictable to be used in a methodology that tries to avoid noise. As Dubois' provides instructions to form derivatives from a lemma and suffixes for some senses, derivation is preferred to stemming. But the instructions are often vague, and indicate only the suffix to use and the new part of speech. This is not sufficient for automatic use. Thus the derivation procedure needs an extra tool able to propose candidate derivatives, including the right one. Dubois' information is sufficient to filter and classify them. Finally, Dubois' does not provide a taxonomy, and the French resources containing a semantic hierarchy do not supply contextual information. The taxonomy has to be found in another resource, which is not consistent with the reference dictionary. The objective is to make the senses of all these resources compatible.</Paragraph>
    <Paragraph position="3"> 4 How to make the dictionaries compatible. The main difficulty is to distribute the information collected from the extra dictionaries. These dictionaries are incompatible with Dubois', but the new data have to be distributed according to the senses of the entries of the reference dictionary.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Synonyms
</SectionTitle>
      <Paragraph position="0"> Three resources are at my disposal: Bailly's dictionary (Bailly, 1947), an electronic dictionary designed by Memodata, and the French EuroWordNet (Catherin, 1999). The expansion methods commonly use all the available synonyms for a word, but my approach has to keep only the synonyms corresponding to the current sense of the word. For each sense of a word, Dubois' provides semantic features: a semantic class and an application domain.</Paragraph>
      <Paragraph position="1"> The synonyms from the extra dictionaries are proposals. A proposal for a lemma in Dubois' dictionary is kept for a given sense only if at least one sense of the Dubois' entry corresponding to the proposal matches the semantic features of the given sense. If no sense of the proposal matches the semantic features of the given sense, the proposal is rejected for this sense.</Paragraph>
      <Paragraph position="2"> In figure 2, the problem is to determine which proposal matches the word ravir in the sense nb 2. The semantic features of this sense are the application domain SOC (sociology) and the semantic class S4 (to grip, to own). The proposal charmer, whose features are PSY (psychology) and P2 (psychological verb), does not match the features of ravir 2. The proposal dérober in its second and fourth senses has the same features. This proposal is confirmed for ravir in sense nb 2. It will be used as an enrichment when the sense nb 2 of ravir is detected in an utterance by the semantic disambiguator. This procedure is applied to all the proposed synonyms for all the senses of each entry in Dubois'.</Paragraph>
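      The filtering of synonym proposals can be sketched as follows. This is only an illustration under stated assumptions: the DUBOIS mapping is a hypothetical in-memory stand-in for the dictionary, it lists only the features quoted in the example, and the sense number given to charmer is assumed.

```python
# lemma: {sense number: (application domain, semantic class)}
DUBOIS = {
    "ravir": {2: ("SOC", "S4")},
    "charmer": {1: ("PSY", "P2")},  # sense number assumed for illustration
    "dérober": {2: ("SOC", "S4"), 4: ("SOC", "S4")},
}


def keep_synonyms(lemma, sense, proposals, inventory=DUBOIS):
    """Keep the proposals having at least one sense whose semantic
    features equal those of the given sense of `lemma`."""
    target = inventory[lemma][sense]
    kept = []
    for proposal in proposals:
        features = inventory.get(proposal, {}).values()
        if target in features:
            kept.append(proposal)
    return kept
```

      With this toy inventory, the proposals charmer and dérober for ravir in sense 2 reduce to dérober alone. Proposals missing from the inventory are simply dropped here, which is a simplification of this sketch.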
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Derivatives
</SectionTitle>
      <Paragraph position="0"> The derivation field in Dubois' provides sufficient indications to recognize the stipulated derivatives of an entry in a given sense.</Paragraph>
      <Paragraph position="1"> Thus, what is needed is a resource or a tool providing all the potential derivatives of a word.</Paragraph>
      <Paragraph position="2"> Resources are rare and incomplete for French, but I have at my disposal a tool (Gaussier et al., 2000) able to construct suffixal derivatives from a word. If the only constraint is that the derivatives belong to the lexicon, all the correct suffixal forms are provided, along with incorrect proposals. Once all the proposals are produced, the suffix of each proposal is compared with the instructions in the dictionary. When they match, the derivative is kept for the current sense. If not, the derivative is rejected.</Paragraph>
      <Paragraph position="3"> Figure 3 (derivatives for the verb couper) shows how the method works.</Paragraph>
      <Paragraph position="4"> For the verb couper in the sense trancher (to cut, to slice), Dubois' indicates derivatives with the suffixes -ure: coupure (break), -ant: coupant (sharp) and -eur: coupeur (cutter). But no instruction is given for a suffix -able. The wrong derivative coupable (guilty) is rejected.</Paragraph>
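      The suffix check on the couper example can be sketched in a few lines. The function name filter_derivatives is hypothetical, the candidate list stands in for the output of the derivation tool, and the sketch ignores the part-of-speech indication that the dictionary also gives.

```python
def filter_derivatives(proposals, allowed_suffixes):
    """Keep the proposed derivatives that end with a suffix stipulated
    by the dictionary for the current sense."""
    suffixes = tuple(allowed_suffixes)
    return [word for word in proposals if word.endswith(suffixes)]


# Candidates for couper in the sense trancher, with the suffixes
# quoted in the text (-ure, -ant, -eur).
candidates = ["coupure", "coupant", "coupeur", "coupable"]
kept = filter_derivatives(candidates, ["ure", "ant", "eur"])
```

      Here kept contains coupure, coupant and coupeur, while coupable is rejected because no instruction mentions the suffix -able.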
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.3 Taxonomy
</SectionTitle>
      <Paragraph position="0"> Only two resources containing a taxonomy exist for French. AlethDic (Gsi, 1993) is known for its very poor quality. The hierarchy is neither very deep, nor very large. The semantic relations are not strictly defined inside the hierarchy. Because of this, I rejected AlethDic.</Paragraph>
      <Paragraph position="1"> The other resource is EuroWordNet. Two kinds of taxonomic relations are defined: hyperonymy (and hyponymy), and meronymy (and holonymy). The other semantic relations of this resource fall outside the scope of this paper.</Paragraph>
      <Paragraph position="2"> The taxonomic relations link synsets together. The synsets contain words that are synonymous in at least one of their senses. The taxonomy is usable by the QA system only if the sense of the whole synset can be identified, and if that sense matches at least one of the senses of the word under consideration in Dubois' dictionary.</Paragraph>
      <Paragraph position="3"> So each word in Dubois' has to be linked with a synset to be inserted into a taxonomic hierarchy. That amounts to matching senses in Dubois' with synsets in EuroWordNet. We already have some senses in Dubois' matching sets of synonyms in EuroWordNet. It is easy to use the additional synonyms from EuroWordNet to set up a correspondence between the senses of Dubois' dictionary and the synsets of EuroWordNet.</Paragraph>
      <Paragraph position="4"> The procedure is to examine all the synsets where a considered word appears. For each of its senses, if the majority of the synonyms obtained from EuroWordNet are contained in a synset, the meaning illustrated by the synset and this sense of the word are considered to be equivalent. In this case, the word under consideration is inserted at this place in the taxonomic hierarchy. Otherwise, the synset is not considered to match the sense, and it is rejected.</Paragraph>
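      The majority test can be sketched as follows, reading "majority" as strictly more than half of the synonyms retained for the sense, which is an assumption of this sketch. The synset identifiers and the members of TOY_SYNSETS are invented toy data, not EuroWordNet content, and matching_synsets is a hypothetical name.

```python
# Toy data for illustration only.
TOY_SYNSETS = {
    "s1": {"ravir", "dérober", "voler"},
    "s2": {"ravir", "charmer", "enchanter"},
}


def matching_synsets(word, sense_synonyms, synsets=TOY_SYNSETS):
    """Return the synsets containing `word` that also contain more than
    half of the synonyms retained for the sense under consideration."""
    matches = []
    for synset_id, members in synsets.items():
        if word not in members:
            continue
        shared = sense_synonyms.intersection(members)
        if 2 * len(shared) > len(sense_synonyms):
            matches.append(synset_id)
    return matches
```

      With the toy data, a sense of ravir whose retained synonyms are dérober and voler is linked to s1 only. A sense with no retained synonym matches nothing.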
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Experienced problems
</SectionTitle>
    <Paragraph position="0"> The difficulties encountered differ for each kind of processed data. When distributing the synonyms, the system cannot automatically determine the meaning of a multiword expression. Dubois' only lists single words, and no semantic feature can be allocated to a multiword expression. A multiword proposal is therefore considered to have all the meanings of the word of which it is a synonym.</Paragraph>
    <Paragraph position="1"> I have no real evaluation of this procedure: the division into senses of the reference dictionary is, as always, open to doubt. Considering the result, the examiners never agreed on every synonym for a sense. But when they agreed (three examiners were consulted), they were satisfied with more than 80% of the synonyms.</Paragraph>
    <Paragraph position="2"> The derivation tool provides nearly all the derivatives of a word when no constraint is defined. Most of the wrong derivatives (about 97%) are screened out by the instructions supplied by Dubois' dictionary. However, these figures are not valid for short words: the tool is designed in such a way that derivatives with a radical shorter than 3 letters are generally wrong.</Paragraph>
    <Paragraph position="3"> Moreover, the instructions in the dictionary are often incomplete, above all for nominal entries.</Paragraph>
    <Paragraph position="4"> The promising procedure using the taxonomy, presented above, is still a suggestion. I am facing the problem that EuroWordNet covers only a small part of the French lexicon. A more proper trial should use WordNet (Fellbaum, 1998), which covers a huge part of the English lexicon, in an English QA application.</Paragraph>
  </Section>
</Paper>