<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1007">
  <Title>Structural properties of Lexical Systems: Monolingual and Multilingual Perspectives</Title>
  <Section position="5" start_page="51" end_page="52" type="metho">
    <SectionTitle>
LS
</SectionTitle>
    <Paragraph position="0"> ), do not pose this type of problem for two reasons.</Paragraph>
    <Paragraph position="1"> First, they are not oriented towards the modeling of just a few specific lexical phenomena, but originate from a global vision of the lexicon as central component of linguistic knowledge.</Paragraph>
    <Paragraph position="2"> Second, they have a very simple, flat organization, that does not impose any hierarchical or classifying structure on the lexicon. Let us explain how it works.</Paragraph>
    <Paragraph position="3"> The design of any given LS has to follow four basic principles that cannot be tampered with: LSs are 1) pure directed graphs, 2) non-hierarchical, 3) heterogeneous and 4) equipped for modeling the fuzziness of lexical knowledge. We will briefly examine each of these principles.</Paragraph>
    <Paragraph position="4"> Pure directed graph.</Paragraph>
    <Paragraph position="5"> An LS is a directed graph, and just that. This means that, from a formal point of view, it is uniquely made up of nodes and oriented links connecting these nodes.</Paragraph>
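The structure just described can be sketched in a few lines. This is an illustrative encoding only, not the authors' actual implementation; the node names and link types below are invented for the example.

```python
# A minimal sketch of an LS as a pure directed graph: nothing but nodes
# and oriented links connecting them.
nodes = {"HORSE", "horse", "horses"}

# Each link is an oriented triple (source, link_type, target).
links = [
    ("horse", "signifier_of", "HORSE"),   # link types are illustrative
    ("horses", "signifier_of", "HORSE"),
]

def successors(node, link_type=None):
    """Nodes reachable from `node` in one step, optionally filtered by link type."""
    return [t for (s, lt, t) in links
            if s == node and (link_type is None or lt == link_type)]
```

Because the formalism is only nodes and oriented links, any richer structuring (hierarchies, entries, perspectives) has to be expressed as more links of this same kind, which is exactly what the principles below rely on.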
    <Paragraph position="6"> Non-hierarchical.</Paragraph>
    <Paragraph position="7"> An LS is a non-hierarchical structure, although it can contain sets of nodes that are hierarchically connected. For instance, we will see later that the DiCo LS contains nodes that correspond to a hierarchically organized set of semantic labels. The hierarchy of DiCo semantic labels can be used to project a structured perspective on the LS; but the LS itself is by no means organized according to one or more specific hierarchies.</Paragraph>
    <Paragraph position="8"> Heterogeneous.</Paragraph>
    <Paragraph position="9"> An LS is a potentially heterogeneous collection of nodes. Three main families of nodes can be found: * genuine lexical entities, such as lexemes, idioms, wordforms, etc.; * quasi-lexical entities, such as collocations, lexical functions, free expressions worth storing in the lexicon (e.g. &quot;canned&quot; linguistic examples), etc.; * lexico-grammatical entities, such as syntactic patterns of expression of semantic actants, grammatical features, etc.</Paragraph>
    <Paragraph position="11"> Prototypical LS nodes are first of all lexical entities, but we have to expect LSs to contain as nodes entities that do not strictly belong to the lexicon: they can belong to the interface between the lexicon and the grammar of the language. Such is the case of subcategorization frames, called government patterns in Explanatory Combinatorial Lexicology. As rules specifying patterns of syntactic structures, they belong to the grammar of the language. However, as preassembled constructs on which lexemes &quot;sit&quot; in sentences, they are clearly closer to the lexical realm of the language than, for instance, rules for building passive sentences or handling agreement.</Paragraph>
    <Paragraph position="12"> With fuzziness. Each component of an LS, whether node or link, carries a trust value, i.e. a measure of its validity. Clearly, there are many ways of attributing and handling trust values in order to implement fuzziness in knowledge structures. For instance, in our experiments with the DiCo LS, we have adopted a simplistic approach that was satisfactory for our present needs but should become more elaborate as we proceed with developing and using LSs. In our present implementation, we make use of only three possible trust values: &quot;  &quot; means that--as far as we can tell, i.e. trusting what is explicitly asserted in the DiCo--the corresponding information is valid; &quot;  &quot; means that the corresponding information is the result of an inference made from the input data and was not explicitly asserted by lexicographers; &quot;  &quot; means that the information ought to be incorrect--for instance, in case we identified a bogus lexical pointer in data imported from the DiCo. Fuzziness encoding is an essential feature of LSs, as structures on which inference can take place or as structures that are, at least partially, inferred from others (in case of generation of LSs from existing lexical databases). Of course, no trust value is absolute: &quot;  &quot; does not mean that the information is valid no matter what, nor &quot;0&quot; that it is necessarily false. Information in LSs, and the rating of this information, is no more absolute than any information that may be stored in someone's mental lexicon. However, if we want to compute on LSs' content, it is essential to be able to distinguish between data we have all reasons to believe to be true and data we have all reasons to believe to be false. As a matter of fact, this feature of LSs has helped us in two ways while compiling the DiCo LS: (i) we were able to infer new descriptions from data contained in the original DiCo while keeping track of the inferred nature of this new information (which ought to be validated); (ii) we kept a record of the incoherences found in the DiCo by attributing a trust value of &quot;0&quot; to the corresponding elements in the LS.</Paragraph>
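The trust-value mechanism can be sketched as follows. The three-value scheme used here (1 = explicitly asserted, 0.5 = inferred, 0 = believed incorrect) is an assumption based on the values the text mentions; the data is illustrative.

```python
# Sketch: every LS link carries a trust value. The concrete numeric
# scheme below (1 / 0.5 / 0) is an assumption for illustration.
ASSERTED, INFERRED, BOGUS = 1.0, 0.5, 0.0

links = [
    # (source, link_type, target, trust)
    ("Oper12(RANCUNE)", "arg", "RANCUNE", ASSERTED),   # stated by lexicographers
    ("COUP", "used_in", "COUP DE SOLEIL", INFERRED),   # inferred during compilation
    ("X", "value", "???", BOGUS),                      # bogus pointer found in input data
]

def with_trust_at_least(min_trust):
    """Select links whose validity rating reaches `min_trust`."""
    return [l for l in links if l[3] >= min_trust]
```

A process computing on the LS can thus restrict itself to fully asserted data, or widen its scope to inferred data that still awaits validation.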
    <Paragraph position="13"> It is now high time to give concrete examples of LS data. But before we proceed, let us emphasize the fact that no other formal devices than those that have just been introduced are allowed in LSs. Anything else we may want to add must be relevant to other components of the linguistic model, to the grammar for instance. Notice, however, that we do not exclude the need to add a measure of the relative &amp;quot;weight&amp;quot; of nodes and links. This measure, different from the trust value, would reflect the degree of activation of each LS element. For instance, the DiCo entry for</Paragraph>
  </Section>
  <Section position="6" start_page="52" end_page="52" type="metho">
    <SectionTitle>
DEFAITE
</SectionTitle>
    <Paragraph position="0"> 'defeat' lists quite a few support verbs that take this noun as complement, among which</Paragraph>
  </Section>
  <Section position="7" start_page="52" end_page="52" type="metho">
    <SectionTitle>
CONNAITRE
</SectionTitle>
    <Paragraph position="0"> 'to know' and</Paragraph>
  </Section>
  <Section position="8" start_page="52" end_page="52" type="metho">
    <SectionTitle>
SUBIR
</SectionTitle>
    <Paragraph position="0"> 'to suffer.' Weight values could indicate that the former verb is much less commonly used than the latter in this context. Another advantage of weight is that it could help optimize navigation through the LS graph when several paths can be taken.</Paragraph>
    <Paragraph position="1"> 3 Examples borrowed from the DiCo LS The DiCo is a French lexical database that focuses on the modeling of paradigmatic and syntagmatic lexical links controlled by lexical units. Paradigmatic links correspond to so-called semantic derivations (synonymy, antonymy, nominalization, verbalization, names for actants or typical circonstants, etc.). Syntagmatic links correspond to collocations controlled by lexical units (intensifiers, support verbs, etc.). These lexical properties are encoded by means of a system of metalexical entities known as lexical functions.</Paragraph>
    <Paragraph position="2"> (For a presentation of the system of lexical functions, see Mel'cuk (1996) and Kahane and Polguere (2001).) Although it does not contain actual definitions, the DiCo partially describes the semantic content of each lexical unit with two formal tools: (i) a semantic label, which corresponds to the genus (core component) of the lexical unit's definition, and (ii) a &quot;propositional formula,&quot; which states the predicative nature of the unit (non-predicative meaning, or predicate with one, two or more arguments). Each entry also gives the government pattern (roughly, the subcategorization frame) of the unit and lists idioms (phrasal lexical units) that contain the unit under description. Finally, each entry contains a set of examples retrieved from corpora or the Internet. As one can see, the DiCo covers a fairly large range of lexical properties; for more information on the DiCo, one can refer to Polguere (2000) and Lareau (2002).</Paragraph>
    <Paragraph position="3"> Presently, the DiCo is developed as a FileMaker database. Each DiCo entry corresponds to a record in the database, and the core of each record is the field that contains lexical function links controlled by the headword (i.e. the lexical unit described in the entry). Data in (1) below is one item in the lexical function field of the DiCo  encoding of the content of the lexical function application is for the benefit of users who do not master the system of lexical functions.</Paragraph>
    <Paragraph position="6"> * Following the name of the lexical function is the list of values of the lexical function application, each of which is a specific lexical entity. In this case, they are all collocates of the headword, due to the syntagmatic nature of Oper12.</Paragraph>
    <Paragraph position="7"> * Finally, the expression between square brackets is the description of the syntactic structure controlled by the collocates. It corresponds to a special case of lexico-grammatical entities mentioned earlier in section 2.2. These entities have not been processed yet in our LS and they will be ignored in the discussion below.</Paragraph>
    <Paragraph position="8"> Data in (1) corresponds to a very small sub-graph in the generated LS, which is visualized in Figure 1 below. Notice that the graphical representations used here have been automatically generated in GraphML format from the LS and then displayed with the yEd graph editor/viewer. This graph shows how the DiCo data given in (1) have been modeled in terms of lexical entities and links. We see that lexical function applications are lexical entities: something to be communicated, pointing to actual means of expressing it. The argument (arg link) of the lexical function application, the lexical unit</Paragraph>
  </Section>
  <Section position="9" start_page="52" end_page="55" type="metho">
    <SectionTitle>
RANCUNE
</SectionTitle>
    <Paragraph position="0"> , is of course also a lexical entity (although of a different nature). The same holds for the values</Paragraph>
    <Paragraph position="2"> links). None of these values, however, has been diagnosed as possessing a corresponding entry in the DiCo. Consequently, the compilation process has given them the (temporary) status of simple wordforms, with a trust value of  , visualized here by boxes with hashed borders. (Continuous lines for links or boxes indicate  .) Lexical functions themselves appear as lexical entities in our LS, despite their very &quot;abstract&quot; nature. Two facts justify this approach. First, lexical units too are rather abstract entities. While the wordforms horse and horses could be considered more &quot;concrete,&quot; their grouping under a label HORSE lexical unit is not a trivial abstraction. Second, lexical functions are not only descriptive tools in Explanatory Combinatorial Lexicology. They are also conceptualized as generalizations of lexical units that play an important role in text production, in general rules of paraphrase for instance. This first illustration demonstrates how the LS version of the DiCo reflects its true relational nature, contrary to its original dictionary-like format as a FileMaker database. It also shows how varied lexical entities can be and how trust values can help keep track of the distinction between what has been explicitly stated by lexicographers and what can be inferred from what they stated. The next illustration will build on the first one and show how so-called non-standard lexical functions are integrated into the LS. Until now, we have been referring only to standard lexical functions, i.e. lexical functions that belong to the small universal core of lexical relations identified in Explanatory Combinatorial Lexicology (or, more generally, in Meaning-Text theory). However, not all paradigmatic and syntagmatic links are standard. Here is an illustration, borrowed from the DiCo entry for  . No gloss has been introduced, because non-standard lexical functions are already explicit, non-formal encodings of lexical relations. The LS interpretation of (2) is therefore a simpler structure than the  . Our last illustration will show how it is possible to project a hierarchical structuring on the DiCo LS when, and only when, it is needed.</Paragraph>
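The graph modeling described above, with lf, arg and value links radiating from a lexical function application node, can be sketched as follows. The actual value list of Oper12(RANCUNE) from (1) is not reproduced in this excerpt, so the collocates 'avoir' and 'garder' below are illustrative placeholders.

```python
# Sketch of the subgraph built for one lexical function application.
links = [
    ("Oper12(RANCUNE)", "lf", "Oper12"),      # application -> lexical function
    ("Oper12(RANCUNE)", "arg", "RANCUNE"),    # application -> its argument
    ("Oper12(RANCUNE)", "value", "avoir"),    # application -> each value element
    ("Oper12(RANCUNE)", "value", "garder"),   # (placeholder collocates)
]

def values(application):
    """Value elements (collocates) returned by a lexical function application."""
    return [t for (s, lt, t) in links if s == application and lt == "value"]
```

Treating the application itself as a node is what makes it possible to hang the argument, the values and (for standard lexical functions) the gloss on a single point of the graph.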
    <Paragraph position="3"> The hierarchy of semantic labels used to semantically characterize lexical units in the DiCo has been compiled into the DiCo LS together with the lexical database proper. Each semantic label is connected to its more generic label or labels (as this hierarchy allows for multiple inheritance) with an is_a link. Additionally, it is connected to the lexical units it labels by label links. It is thus possible to simply pull the hierarchy of semantic labels out of the LS and it will &amp;quot;fish out&amp;quot; all lexical units of the LS, hierarchically organized through hypernymy. Notice that this is different from extracting from the DiCo all lexical units that possess a specific semantic label: we extract all units whose semantic label belongs to a given subhierarchy in the system of semantic labels. Figure 3 below is the graphical result of pulling the accessoire ('accessory') subhierarchy.</Paragraph>
    <Paragraph position="4"> To avoid using labels on links, we have programmed the generation of this class of GraphML structures with links encoded as follows: is_a links (between semantic labels) appear as thick continuous arrows and label links (between semantic labels and lexical units they label) as thin dotted arrows.</Paragraph>
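The "pulling" operation just described can be sketched as a simple closure over is_a links followed by a collection step over label links. The link names follow the text; all label and lexical unit names below are invented for the example.

```python
# Sketch: pull a subhierarchy of semantic labels out of the LS, then
# "fish out" the lexical units those labels are attached to.
is_a = [("bijou", "accessoire"), ("chapeau", "accessoire")]   # child is_a parent
label = [("bijou", "COLLIER"), ("accessoire", "CEINTURE")]    # label -> lexical unit

def pull(root):
    """Labels below `root` (inclusive) and all lexical units they label."""
    labels = {root}
    changed = True
    while changed:                      # transitive closure over is_a links
        changed = False
        for child, parent in is_a:
            if parent in labels and child not in labels:
                labels.add(child)
                changed = True
    units = {u for (l, u) in label if l in labels}
    return labels, units
```

Note that this extracts every unit whose label falls anywhere in the subhierarchy, not just units carrying the root label itself, which is exactly the distinction made in the text.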
    <Paragraph position="5">  The &amp;quot;beauty&amp;quot; of LSs' structuring does not lie in the fact that it allows us to automatically generate fancy graphical representations. Such representations are just a convenient way to make explicit the internal structure of LSs. What really interests us is what can be done with LSs once we consider them from a functional perspective. The main functional advantage of LSs lies in the fact that these structures are both cannibal and prone to be cannibalized. Let us explain the two facets of this somehow gruesome metaphor.</Paragraph>
    <Paragraph position="6"> First, directed graphs are powerful structures that can encode virtually any kind of information and are particularly well suited to lexical knowledge. If one believes that a lexicon is above all a relational entity, we can postulate that all information present in any form of dictionary and database can eventually be compiled into LS structures. The experiment we did in compiling the DiCo (see details in section 4) demonstrates this property of LS structures well enough.</Paragraph>
    <Paragraph position="7"> Second, because of their extreme simplicity, LS structures can conversely always be &quot;digested&quot; by other, more specific types of structures, such as XML versions of dictionary- or net-like databases. For instance, we have regenerated from our LS a DiCo in HTML format, with hyperlinks for entry cross-references and color-coding for trust values of linguistic information. Interestingly, this HTML by-product of the LS contains entries that do not exist in the original DiCo. They are produced for each value of lexical function applications that does not correspond to an entry in the DiCo. The content of these entries is made up of &quot;inverse&quot; lexical function relations: pointers to lexical function applications for which the lexical entity is a value. These new entries can be seen as rough drafts that lexicographers can use to write new entries. We will provide more details of this at the end of the next section.</Paragraph>
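The inversion step behind these generated entries can be sketched as follows: a lexical entity that has no entry of its own receives pointers to every lexical function application for which it is a value. The data below is illustrative (the lexical function attached to each support verb is not specified in this excerpt).

```python
# Sketch: compute "inverse" lexical function relations by inverting
# value links, to fill draft entries for value-only entities.
links = [
    ("Oper12(RANCUNE)", "value", "avoir"),
    ("Oper12(DEFAITE)", "value", "subir"),      # illustrative applications
    ("Oper12(DEFAITE)", "value", "connaitre"),
]

def inverse_links(entity):
    """Applications that return `entity` as one of their values."""
    return [s for (s, lt, t) in links if lt == "value" and t == entity]
```

Running `inverse_links` over every value element that lacks an entry yields exactly the draft descriptions discussed at the end of section 4.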
  </Section>
  <Section position="10" start_page="55" end_page="56" type="metho">
    <SectionTitle>
4 Compiling the DiCo (dictionary-like) database into a lexical system
</SectionTitle>
    <Paragraph position="0"> The DiCo is presently available both in FileMaker format and as SQL tables, accessible through the DiCouebe interface.</Paragraph>
    <Paragraph position="1">  It is these tables that are used as input for the generation of LSs.  They present the advantage of being the result of an extensive processing of the DiCo that splits its content into elementary pieces of lexicographic information (Steinlin et al., 2005). It is therefore quite easy to analyze them further in order to perform a restructuring in terms of LS modeling. The task of inferring new information, information that is not explicitly encoded in the DiCo, is the delicate part of the compilation process, due to the richness of the database. Until now, we have only implemented a small subset of all inferences that can be made. For instance, we have inferred individual lexemes from idioms that appear inside DiCo records (COUP DE SOLEIL 'sunburn' entails the probable existence of the three lexemes COUP, DE and SOLEIL). We have also distinguished lexical entities that are actual lexical units from their signifiers (linguistic forms). Signifiers, which do not have to be associated with one specific meaning, play an important role when it comes to wading through an LS (for instance, when we want to separate word access through form and through meaning).</Paragraph>
    <Paragraph position="2"> We cannot give here all details of the compilation process. Suffice it to say that, at the present stage, some important information contained in the DiCo is not processed yet. For instance, we have not implemented the compilation of government patterns and lexicographic examples. On the other hand, all lexical function applications and the semantic labeling of lexical units are properly handled. Recall that we import together with the DiCo a hierarchy of semantic labels used by the DiCo lexicographers, which allows us to establish hypernymic links between lexical units, as shown in Figure 3 above.</Paragraph>
    <Paragraph position="3"> Codewise, the DiCo LS is just a flat Prolog database with clauses for only two predicates. Link statistics are as follows:
775 &quot;sem_label,&quot; between sem. labels and lexical units;
1,690 &quot;sense,&quot; between vocables and lexical units corresponding to specific senses;
2,991 &quot;basic_form,&quot; between mono- or multilexical signifiers and vocables or lexical units;
6,464 &quot;signifier,&quot; between wordforms and monolexical signifiers;
4,135 &quot;used_in,&quot; between monolexical signifiers and multilexical signifiers;
9,417 &quot;lf,&quot; between lexical functions and their applications;
6,064 &quot;gloss,&quot; between lex. func. appl. and their gloss;
9,417 &quot;arg,&quot; between lex. func. appl. and their argument;
19,890 &quot;value,&quot; between lex. func. appl. and each of the value elements they return.
Let us make a few comments on these numbers in order to illustrate how the generation of the LS from the original DiCo database works.</Paragraph>
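The flat, two-predicate organization can be sketched as a list of typed link facts; counting facts per type reproduces statistics of the kind listed above. The fact shapes below are assumptions (the excerpt does not give the actual Prolog predicate names and arities).

```python
from collections import Counter

# Sketch of the flat fact base: the whole LS as typed link facts,
# mirroring the flat Prolog clause design described in the text.
link_facts = [
    ("sense", "rancune", "RANCUNE"),           # illustrative facts
    ("arg", "Oper12(RANCUNE)", "RANCUNE"),
    ("value", "Oper12(RANCUNE)", "avoir"),
    ("value", "Oper12(RANCUNE)", "garder"),
]

# Statistics per link type, analogous to the counts reported for the DiCo LS.
stats = Counter(link_type for (link_type, source, target) in link_facts)
```

Because the base is flat, such global statistics, as well as any export (GraphML, HTML), reduce to simple scans over the fact list.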
    <Paragraph position="4"> The FileMaker (or SQL) DiCo database that has been used contained only 775 lexical unit records (word senses). This is reflected in the statistics by the number of sem_label links between semantic labels and lexical units: only lexical units that were headwords of DiCo records possess a semantic labeling. The statistics above show that the LS contains 1,690 lexical units. So where do the 915 (1,690 - 775) extra units come from? They have all been extrapolated from the so-called phraseology (ph) field of DiCo records, where lexicographers list idioms that are formally built from the record headword. For instance, the DiCo record for BARBE 'beard' contained (among others) a pointer to the idiom BARBE A PAPA 'cotton candy.' This idiom did not possess its own record in the original DiCo and has been &quot;reified&quot; while generating the LS, among 914 other idioms. (The DiCouebe interface is available at http://www.olst.umontreal.ca/dicouebe.)</Paragraph>
    <Paragraph position="5"> The code for compiling the DiCo into an LS, generating GraphML exports and generating an HTML version of the DiCo has been written in SWI-Prolog.</Paragraph>
    <Paragraph position="6"> The hierarchy of semantic labels is developed with the Protege ontology editor. We use XML exports from Protege to inject this hierarchy inside the LS. This is another illustration of the cannibalistic (and not too choosy) nature of LSs. The &quot;wordlist&quot; of our LS is therefore much more developed than the wordlist of the DiCo it is derived from. This is particularly true if we include in it the 6,464 wordform entities. As explained earlier, it is possible to regenerate from the LS lexical descriptions for any lexical entity that is either a lexical unit or a wordform targeted by a lexical function application, filling wordform descriptions with inverse lexical function links. To test this, we have regenerated an entire DiCo in HTML format from the LS, with a total of 8,154 (1,690 + 6,464) lexical entries, stored as individual HTML pages. Pages for original DiCo headwords contain the hypertext specification of the original lexical function links, together with all inverse lexical links that have been found in the LS; pages for wordforms contain only inverse links. For instance, the page for METTRE 'to put' (which is not a headword in the original DiCo) contains 71 inverse links, such as:</Paragraph>
    <Paragraph position="8"> Of course, most of the entries that were not in the original DiCo are fairly poor and will require significant editing to be turned into bona fide DiCo descriptions. They are, however, a useful point of departure for lexicographers; additionally, the richer the DiCo becomes, the more productive the LS will be in terms of automatic generation of draft descriptions.</Paragraph>
  </Section>
  <Section position="11" start_page="56" end_page="58" type="metho">
    <SectionTitle>
5 Lexical systems and multilinguality
</SectionTitle>
    <Paragraph position="0"> The approach to multilingual implementation of lexical resources that LSs allow is compatible with strategies used in known multilingual databases, such as Papillon (Serasset and Mangeot-Lerebours, 2001): it sees multilingual resources as connections of basically monolingual models.</Paragraph>
    <Paragraph position="1"> In this final section, we first argue for a monolingual perspective on the problem of multilinguality. We then make proposals for implementing interlingual connections by means of LSs.</Paragraph>
    <Section position="1" start_page="56" end_page="56" type="sub_section">
      <SectionTitle>
5.1 Theoretical and methodological primacy of monolingual structures
</SectionTitle>
      <Paragraph position="0"> We see two logical reasons why the issue of designing multilingual lexical databases should be tackled from a monolingual perspective.</Paragraph>
      <Paragraph position="1"> First, all natural languages can perfectly well be conceived of in complete isolation. In fact, monolingual speakers are no less &amp;quot;true&amp;quot; speakers of a language than multilingual speakers.</Paragraph>
      <Paragraph position="2"> Second, acquisition of multiple languages commonly takes place in situations where second languages are acquired as additions to an already mastered first language. Multiplicity in linguistic competence is naturally implemented by grafting a language on top of preexisting linguistic knowledge. How multiple lexica are acquired and stored is a much debated issue (Schreuder and Weltens, 1993), which is outside the scope of our research. However, it is now commonly accepted that even children who are bilingual &quot;from birth&quot; develop two linguistic systems, each of which is quite similar in essence to the linguistic systems of monolingual speakers (de Houwer, 1990). The main issue is thus one of systems' connectivity.</Paragraph>
      <Paragraph position="3"> From a theoretical and practical point of view, it is thus perfectly legitimate to see the problem of structuring multilingual resources as one of, first, finding the most adequate and interoperable structuring for monolingual resources. This being said, we do not believe that the issue of structuring monolingual databases has already been dealt with once and for all in a satisfactory manner. We hope the concept of LS we introduce here will stimulate reflection on that topic.</Paragraph>
    </Section>
    <Section position="2" start_page="56" end_page="58" type="sub_section">
      <SectionTitle>
5.2 Multilingual connections between LSs
</SectionTitle>
      <Paragraph position="0"> A multilingual lexical resource based on the LS architecture should be made up of several fully autonomous LSs, i.e., LSs that are not specially tailored for multilingual connections. They should function as independent modules that can be connected while preserving their integrity.</Paragraph>
      <Paragraph position="1"> Connections between LSs should be implemented as specialized interlingual links between equivalent lexical entities. There is one exception however: standard lexical functions (A1, Magn, AntiMagn, Oper1, etc.). Because they are universal lexical entities, they should be stored in a specialized interlingual module; as universals, they play a central role in interlingual connectivity (Fontenelle, 1997). However, these are only &quot;pure&quot; lexical functions. Lexical function applications, such as Oper12(RANCUNE) above, are by no means universals and have to be connected to their counterparts in other languages. Let us examine briefly this aspect of the question. (We underline hypertext links. The lexical function applications listed here correspond to French collocations that mean, respectively, to put in the background, to indict someone (literally in French 'to put someone in accusation'), to anchor a vessel (literally in French 'to put a vessel at the anchor'), to put someone in anguish, to keep something in a cupboard.)</Paragraph>
      <Paragraph position="2"> One has to distinguish at least two main cases of interlingual lexical connections in LSs: direct lexical connections and connections through lexical function applications.</Paragraph>
      <Paragraph position="3"> Direct connections, such as Fr. RANCUNE vs. Eng. RESENTMENT, should be implemented--manually or using existing bilingual resources--as simple interlingual (i.e. intermodule) links between two lexical entities. Things are not always that simple though, due to the existence of partial or multiple interlingual connections. For instance, what interlingual link should originate from Eng. SIBLING if we want to point to a French counterpart? As there is no lexicalized French equivalent, we may be tempted to include in the French LS entities such as frere ou soeur ('brother or sister'). We have two strong objections to this. First, this complex entity will not be a proper translation in most contexts: one cannot translate He killed all his siblings by Il a tue tous ses freres ou soeurs--the conjunction et 'and' is required in this specific context, as well as in many others. Second, and this is more problematic, this approach would force us to enter into the French LS entities created for translation purposes only, which would transgress the original monolingual integrity of the system.</Paragraph>
      <Paragraph position="5"> We must admit that we do not have a ready-to-use solution to this problem, especially if we insist on ruling out the introduction of ad hoc periphrastic translations as lexical entities in target LSs. It may very well be the case that a cluster of interrelated LSs cannot be completely connected for translation purposes without the addition of &quot;buffer&quot; LSs that ensure full interlingual connectivity. For instance, the buffer French LS for English to French LS connection could contain phrasal lexical entities such as freres et soeurs ('siblings'), etre de memes parents and etre frere(s) et soeur(s) ('to be siblings'). This strategy can actually be very productive and can lead us to realize that what first appeared as an ad hoc solution may be fully justified from a linguistic perspective. Dealing with the sibling case, for instance, forced us to realize that while frere(s) et soeur(s) sounds very normal in French, soeur(s) et frere(s) will seem odd or, at least, intentionally built that way. This is a very strong argument for considering that a lexical entity (we do not say lexical unit!) frere(s) et soeur(s) does exist in French, independently from the translation problem that sibling poses to us.</Paragraph>
      <Paragraph position="6"> This phrasal entity should probably be present in any complete French LS.</Paragraph>
      <Paragraph position="7"> The case of connections through lexical function applications is even trickier. A simplistic approach would be to consider that it is sufficient to connect interlinguistically lexical function applications to get all resulting lexical connections for value elements. For standard lexical functions, this can be done automatically using the following strategy for two languages A and B.</Paragraph>
      <Paragraph position="8"> If the lexical entity L A is connected to L B by a &quot;translation&quot; link, then each lexical entity linked to f(L A ) by the &quot;value&quot; link should be connected by a &quot;value translation&quot; link, with a trust value of &quot;0.5,&quot; to all lexical entities linked to f(L B ) by a &quot;value&quot; link.</Paragraph>
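This strategy can be sketched as follows: when two lexical entities are connected by a "translation" link, every value of the lexical function applied to one is linked, at trust 0.5, to every value of the same function applied to the other, pending manual filtering. The English values are those cited below in the text; the French values are illustrative placeholders.

```python
# Sketch of the automatic "value translation" linking strategy.
values = {
    ("Oper12", "RANCUNE"): ["avoir", "garder"],                  # placeholders
    ("Oper12", "RESENTMENT"): ["have", "feel", "harbor", "nourish"],
}
translations = [("RANCUNE", "RESENTMENT")]   # interlingual "translation" links

def value_translation_links():
    out = []
    for l_a, l_b in translations:
        for (f, arg), vals_a in values.items():
            if arg != l_a:
                continue
            for va in vals_a:
                for vb in values.get((f, l_b), []):
                    # candidate link at trust 0.5, awaiting manual filtering
                    out.append((va, "value_translation", vb, 0.5))
    return out
```

The all-pairs output makes explicit why a subsequent filtering step is indispensable: only a few of the generated candidates are proper contextual equivalents.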
      <Paragraph position="9"> The distinction between &quot;translation&quot; and &quot;value translation&quot; links allows for contextual interlingual connections: a lexical entity L' B could happen to be a proper translation of L' A only if it occurs as collocate in a specific collocation. But this is not enough. It is also necessary to filter the &quot;value translation&quot; connections that are systematically generated using the above strategy. For instance, each of the specific values given in (1) section 3 should be associated with its closest equivalent among the values of Oper12(RESENTMENT): HAVE, FEEL, HARBOR, NOURISH, etc. At the present time, we do not see how this can be achieved automatically, unless we can make use of already available multilingual databases of collocations. For English and French, for instance, we plan to experiment in the near future with T. Fontenelle's database of English-French collocation pairs (Fontenelle, 1997). These collocations have been extracted from the Collins-Robert dictionary and manually indexed by means of lexical functions. We are convinced it is possible to use this database first to build a first version of a new English LS and, second, to implement the type of fine-grained multilingual connections between lexical function values illustrated with our RANCUNE vs. RESENTMENT example.</Paragraph>
      <Paragraph position="10"> We are well aware that we have probably surfaced as many problems as we have offered solutions in this section. However, the above considerations show at least two things: * LSs have the merit of making explicit the scale of the problem of interlingual lexical correspondence, if one wants to tackle this problem in a fine-grained manner; * the implementation of multilingual connections over LSs should be approached using semi-automatic strategies. (It is worth noticing that good English-French dictionaries, such as the Collins-Robert, offer several different translations in this particular case. Additionally, their translations do not apply to sibling as such, but rather to siblings or to expressions such as someone's siblings, to be siblings, etc.)</Paragraph>
    </Section>
  </Section>
</Paper>