<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1802">
  <Title>Conceptual Structuring through Term Variations</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Conceptual systems
</SectionTitle>
    <Paragraph position="0"> Terms are generally classified using partitive and generic relationships so that they can be presented in a thesaural structure. Other relationships exist, however: the so-called complex relationships (Sager, 1990, pages 34-35), which are domain and application dependent. Examples of such complex relationships are: FALLOUT is caused by NUCLEAR EXPLOSION; COAL-MINE is a place for COAL-MINING. In a study of term formation, (Kageura, 2002) introduces intra-term relationships, which deal with complex terms and define the role of the determinant with respect to the head noun. In computer program, program is the head noun and computer the determinant. As the program is intended for the computer, a &quot;destination&quot; intra-term relationship holds between program and computer. (L'Homme, 2002) used lexical functions to represent various types of relationships via a unique formalism for a computerized version of a dictionary. These relationships are of several types: paradigmatic, such as the generic (Gener) or antonymy (Anti) relationships: Gener(retail sale) = sale, Anti(retail sale) = wholesaling; derivational, such as nominalizations (S</Paragraph>
    <Paragraph position="2"> These conceptual relationships are assigned manually to terms or sets of terms. We propose to automatically assign conceptual relationships to complex terms through their variations.</Paragraph>
    <Paragraph position="3">  ion) protéine de poissons (fish protein), chimioprophylaxie au rifampine (rifampicin chemoprophylaxis); Noun1 à Vinf: viandes à griller (grill meat). These base structures are not frozen and accept variations. Terminological variation in texts is now a well-known phenomenon, estimated at 15% to 35% depending on the domain reflected by the texts and on the kinds of variants handled. For acquisition, it is essential to identify extensively all the concepts represented by terms in textual data. Thus, in a first step, only term variants that preserve the base-term semantics, and thus refer to the same concept, are taken into account. Two sequences such as histamine présente dans le vin (histamine which is present in wine) and histamine du vin (histamine of the wine) refer to the same term histamine du vin (wine histamine), whereas the sequences produit à surgeler (product to be frozen) and produit surgelé (frozen product) refer to two different terms linked by an aspectual relationship.</Paragraph>
    <Paragraph position="4"> We now present a linguistic typology of base-term variations for French. Graphical: case differences and presence of an optional hyphen inside the Noun1 Noun2 structure.</Paragraph>
    <Paragraph position="5"> Inflexional: orthographic variants gathering together inflexional variants that are predictable, such as conservations de produit (product preservations), or unpredictable, such as conservation de produits (product preservation).</Paragraph>
    <Paragraph position="6"> Shallow syntactic: The shallow syntactic variations modify the function words of the base terms. There are three kinds of internal syntactic variations: is-1 variations of the preposition: chromatographie en colonne (column chromatography) / chromatographie sur colonne (chromatography on column); is-2 optional character of the preposition and of the article: fixation azote (nitrogen fixation) / fixation d'azote (fixation of nitrogen) / fixation de l'azote (fixation of the nitrogen); is-3 predicative variants: the predicative role of the adjective: pectine méthylée (methylated pectin) / ces pectines sont méthylées (these pectins are methylated).</Paragraph>
    <Paragraph position="7"> Syntactic: The syntactic variations modify the internal structure of the base terms: S-1 Internal modification variants: insertion inside the base-term structure of a) a modifier, such as an adjective inside the Noun1 Prep Noun2 structure: lait de brebis (ewe's milk), lait cru de brebis (raw ewe's milk); b) a nominal specifier inside the Noun Adj structure.</Paragraph>
    <Paragraph position="8"> These specifiers belong to a closed list of nouns such as type, origine, couleur (colour): protéine végétale &quot;vegetable protein&quot; / protéine d'origine végétale &quot;protein of vegetable origin&quot;.</Paragraph>
    <Paragraph position="9"> S-2 Coordinational variants: head or expansion coordination of base-term structures and enumeration: analyse de particules &quot;particle analysis&quot; / analyse et le tri de particules &quot;particle sorting and analysis&quot;; alimentation humaine &quot;human feeding&quot; / alimentation animale et humaine &quot;human and animal feeding&quot;.</Paragraph>
    <Paragraph position="10"> Morphosyntactic: The morphosyntactic variations modify the internal structure of the base terms, and their components are liable to morphological modification (including derivation).</Paragraph>
    <Paragraph position="11"> M-1 Morphology: the preposition inside a candidate term of Noun1 Prep Noun2 structure is equivalent to a prefix applying to Noun2: pourrissement après récolte (rot after harvest) / pourrissement post-récolte (post-harvesting rot); M-2 Derivational morphology: a derivational variation that keeps the synonymy of the base term implies a relational adjective: acidité du sang (acidity of the blood) / acidité sanguine (blood acidity). This morphosyntactic variation can be combined with a syntactic variation: the sequence alimentation destinée à l'homme et à l'animal &quot;food destined to man and to animal&quot; is a variation of the base term alimentation animale &quot;animal food&quot;.</Paragraph>
    <Paragraph position="12"> Two other types of variation could have been included in this typology: paradigmatic and anaphorical variations. The first one relies on the substitution principle of distributional linguistics (Harris, 1968). One or two words of the base-term could be substituted by one of their synonyms without modifying the syntactic structure (Hamon and Nazarenko, 2001). The second one gathers elliptical anaphora and acronyms.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Variations reflecting conceptual relationships
</SectionTitle>
      <Paragraph position="0"> All the variations above are those that can preserve synonymy with the base term. They can, of course, also introduce semantic discrepancies and refer either to two base terms or to a base term and a conceptually linked term. Thus, two different prepositions lead to two base terms: transmission par satellite (satellite transmission) ≠ transmission entre satellites (transmission between satellites), and internal modification (see variation S-1a) refers to an overcomposed term: huile essentielle de sapin (fir essence) is a hyponym of huile essentielle (essence) and not a variation of huile de sapin (fir oil).</Paragraph>
      <Paragraph position="1"> We propose to identify the conceptual relationships between base terms through syntactic or morphological clues. We use standard lexical functions to express the conceptual relationships. When no lexical function exists to label a conceptual relationship, we introduce a new lexical function (i.e. a non-standard one). Standard lexical functions are written in lower-case, non-standard ones in upper-case.</Paragraph>
      <Paragraph position="2"> Syntactic: The internal modification of the base structures mainly implies two types of semantic relationships. Hyperonymy: if a relational adjective modifies a base term of N1 Adj or N1 Prep (Det) N2 structure, a hyperonymic relationship holds between the base term and the modified one. The lexical function that captures hyperonymic relationships is the function Spec introduced by (Grimes, 1990): Spec (contraction isométrique &quot;isometric contraction&quot;) = contraction musculaire isométrique &quot;isometric muscular contraction&quot;; Spec (agent bactérien &quot;bacterial agent&quot;) = agent infectieux bactérien &quot;bacterial infectious agent&quot;. Antonymy: if an adverb of negation modifies a base term of N1 Adj structure, an antonymic relationship holds between the base term and the modified one. This relationship of opposition is described with the function Anti: Anti(levure floculante &quot;flocculating yeast&quot;) = levure non floculante &quot;non-flocculating yeast&quot;. Morphosyntactic: Semantic distinctions appear with base terms that are morphologically related to other base terms. Two base terms […] and […]</Paragraph>
      <Paragraph position="4"> are considered morphologically related if one of the following three constraints is satisfied:</Paragraph>
      <Paragraph position="6"> are head nouns and are identical.</Paragraph>
      <Paragraph position="8"> are expansions and are semantically related by the use of an affix; ii. […] and […]</Paragraph>
      <Paragraph position="10"> are head nouns and are semantically related by the use of an affix. […] and […]</Paragraph>
      <Paragraph position="12"> are semantically related by the use of a suffix, such as preserved food/food preservation. Some affixes that have been studied for French by (Corbin, 1987) provide clues to characterize the semantic link between two morphologically related candidate terms.</Paragraph>
      <Paragraph position="13"> Antonymy: the prefixes ir, dé, non(-), applying to either the head or the expansion element of a base term, whatever its structure, characterize an antonymic relationship. Examples are: […] head noun of base term attests a &quot;set of&quot; relationship expressed with the function Mult: Mult (plume de canard &quot;duck feather&quot;) = plumage de canards &quot;duck feather&quot;. The two base terms share the same pattern.</Paragraph>
      <Paragraph position="14"> Result: A &quot;result&quot; relationship is expressed with the function N[…] applying to nouns. This relation is induced either by: - the suffixes age, ade, erie applying to the head noun of base terms: N[…] (plumage de canards &quot;duck feather&quot;) = plume de canard &quot;duck feather&quot;; N[…] (filetage du saumon &quot;salmon filleting&quot;) = filet de saumon &quot;salmon fillet&quot;. The two base terms share the same pattern. - or by the suffixes age, ade, erie, ment, tion, ure associated with an inversion. We distinguish two cases: if this morphological link involves an N Adj structure, the function N[…] applies:</Paragraph>
      <Paragraph position="16"> standard function where the term of N à Vinf structure expresses the state before the process. Thus, we introduce the new function N[…]:</Paragraph>
      <Paragraph position="18"> &quot;food to preserve&quot;; Actor: the suffix eur applying to the head noun of a base term builds its actant, expressed with the function S1: S1 (transport routier &quot;road transport&quot;) = transporteur routier &quot;road haulier&quot;. The two base terms share the same pattern.</Paragraph>
      <Paragraph position="19"> Other semantic relationships involving two base terms with the same pattern are induced by prefixes. For those, we have to introduce new functions such as:</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Linguistic structuring
</SectionTitle>
      <Paragraph position="0"> The term extractor program takes as input a tagged and lemmatized corpus. The program implements shallow parsing and morphological conflation. First, it scans the corpus, then counts and extracts the strings whose syntax characterizes base terms or one of their variants. This collecting step uses local grammars based on regular expressions (Abney, 1997). These grammars use the morphosyntactic information that the tagger associates with the words of the corpus. The different occurrences referring to a base term or one of its variants are grouped as a pair formed by the lemmas of the candidate base term. Second, morphological analysis is performed to conflate synonymic derivational variants of base terms such as acidité du sang (acidity of the blood) / acidité sanguine (blood acidity). Stripping-recoding morphological rules adopt the following rule schema: [ -S +M ]</Paragraph>
      <Paragraph position="2"> where: S is the relational suffix to be deleted from the end of an adjective. The result of this deletion is the stem R; M is the mutative segment to be concatenated to R in order to form a noun.</Paragraph>
      <Paragraph position="3"> For example, the rule [ -é +e ] says that if an adjective ends with é, we strip this ending from it and append the string e to the stem. The algorithm below summarizes the successive steps for identifying relational adjectives: 1. Examine each candidate of Noun Adj structure; 2. Apply a transformational rule in order to generate all the possible corresponding base nouns; 3. Search the set of candidate terms for a pair formed with Noun1 (identical between the Noun1 (Prep (Det)) Noun2 and Noun1 Adj structures) and a Noun2 generated in step 2; 4. If step 3 succeeds, group the two base structures under a single candidate term.</Paragraph>
      <Paragraph position="4"> In Step 2, morphological rules generate one or several nouns for a given adjective. We generate a noun for each relational suffix class. A class of suffixes includes the allomorphic variants. This overgeneration method used in information retrieval by (Jacquemin, 2001) gives low noise because the base noun must not only be an attested form in the corpus, but must also appear as an extension of a head noun.</Paragraph>
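      <Paragraph> The four steps above, together with the overgeneration and filtering of step 2, can be sketched as follows. This is a minimal illustration and not the extractor's actual code: the rule list RULES, the function names and the sample lemma pairs are assumptions made for the example; only the [ -S +M ] rule shape and the [ -é +e ] rule come from the text.</Paragraph>

```python
from typing import Iterable

# Hypothetical stripping-recoding rules [-S +M]: strip the adjective suffix S,
# then append the mutative segment M to the stem. Illustrative list only.
RULES = [
    ("uin", ""),        # sanguin -> sang
    ("aire", "e"),      # moléculaire -> molécule
    ("aire", "ation"),  # alimentaire -> alimentation
    ("é", "e"),         # the [-é +e] rule quoted in the text
]


def generate_nouns(adjective: str) -> set[str]:
    """Overgenerate: apply every rule whose suffix S matches the adjective."""
    return {
        adjective[: len(adjective) - len(strip)] + add
        for strip, add in RULES
        if adjective.endswith(strip)
    }


def conflate(noun_adj: Iterable[tuple[str, str]],
             noun_noun: set[tuple[str, str]]) -> list[tuple[tuple[str, str], tuple[str, str]]]:
    """Pair each Noun1 Adj candidate with an attested Noun1 (Prep (Det)) Noun2 candidate."""
    groups = []
    for head, adj in noun_adj:                     # step 1: each Noun Adj candidate
        for noun in sorted(generate_nouns(adj)):   # step 2: candidate base nouns
            if (head, noun) in noun_noun:          # step 3: attested with the same head
                groups.append(((head, adj), (head, noun)))  # step 4: group both
    return groups


# Invented sample data: lemma pairs (head, expansion).
noun_adj_terms = [("acidité", "sanguin"), ("industrie", "alimentaire"), ("produit", "surgelé")]
noun_noun_terms = {("acidité", "sang"), ("industrie", "alimentation"), ("fixation", "azote")}
groups = conflate(noun_adj_terms, noun_noun_terms)
```

      <Paragraph> In this sketch, alimentaire yields both alimente and alimentation, but only the form attested after the same head noun survives the filter, which is how the corpus check keeps the noise of overgeneration low.</Paragraph>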
      <Paragraph position="5"> At the end of the linguistic processing, the term extractor proposes as output: 1. a list of pilot terms ranked from the most to the least representative of the corpus, using the log-likelihood coefficient introduced by (Dunning, 1993).</Paragraph>
      <Paragraph position="6"> 2. for each pilot term, an XML structure that gathers all the base structures and the variations encountered.</Paragraph>
      <Paragraph position="7"> An example of such data is given in Table 1.</Paragraph>
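      <Paragraph> The ranking of pilot terms relies on a log-likelihood score. The sketch below computes the usual log-likelihood ratio statistic over a 2x2 contingency table for a pair of lemmas; the text does not say how the extractor builds its counts, so the cell definitions (a: both lemmas together, b: first lemma without the second, c: second without the first, d: neither) and the function names are assumptions for illustration.</Paragraph>

```python
import math


def xlogx(x: float) -> float:
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood ratio G2 = 2 * sum(O * ln(O / E)) for a 2x2 table.

    a: pairs (w1, w2); b: pairs (w1, other); c: pairs (other, w2); d: all other pairs.
    """
    n = a + b + c + d
    return 2.0 * (
        xlogx(a) + xlogx(b) + xlogx(c) + xlogx(d)
        - xlogx(a + b) - xlogx(a + c)   # row and column totals involving a
        - xlogx(b + d) - xlogx(c + d)   # remaining row and column totals
        + xlogx(n)
    )


# Invented counts: an independent pair scores 0, a strongly associated pair scores high.
independent = log_likelihood(10, 10, 10, 10)
associated = log_likelihood(10, 0, 0, 10)
```

      <Paragraph> Sorting candidates by decreasing score gives the kind of ranking described above; a constant factor such as the leading 2 does not change the order.</Paragraph>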
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Conceptual structuring
</SectionTitle>
      <Paragraph position="0"> The conceptual structuring takes as input the data provided by the first step. First, we present the methodology employed to exploit the variations of base terms. We then demonstrate the labelling of conceptual links through morphological analysis.</Paragraph>
      <Paragraph position="1">  In the previous step, a first list of relational adjectives was established thanks to their paraphrastic property. (Daille, 2001) demonstrated that candidate terms of N Adj structure where Adj is relational have a greater naming potential than their synonymous N1 Prep N2 form. The absence of paraphrases, non-paraphrasability, complex paraphrasability, or a large derivational distance between the adjective and the noun prevents exhaustive identification. We extend this list by exploiting the coordination variations of N Adj base terms. Indeed, a relational adjective has the property of coordinating only with other relational adjectives. To summarise: 1. From industrie de l'alimentation &quot;food industry&quot; and industrie alimentaire &quot;food industry&quot;, we deduce that alimentaire is a relational adjective; 2. From the coordinational variant produit agricole et alimentaire &quot;farm and food product&quot;, we deduce that agricole is a relational adjective. This classic learning algorithm, whose number of iterations is bounded by the number of adjectives in the corpus, converges in five steps. It extends the set of relational adjectives from 143 to 239.
The following are some examples of acquired relational adjectives:  hydrodynamique 3 ( thermique ). Using this extended list of relational adjectives, we automatically check all the modification variants of the collected base terms: if a relational adjective is present, we infer a hyperonymy link between the variant and the base term, as for contraction isométrique &quot;isometric contraction&quot; and contraction musculaire isométrique &quot;isometric muscular contraction&quot;, but not for organisation ordonnée des molécules &quot;ordered molecule organization&quot;, which remains a syntactic variation of organisation moléculaire &quot;molecule organization&quot;; if an adverb of negation is present, we infer an antonymy link between the variant and the base term, as for brunissement non enzymatique &quot;non enzymatic browning&quot; and brunissement enzymatique &quot;enzymatic browning&quot;.</Paragraph>
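      <Paragraph> The coordination-based extension of the relational adjective list can be read as a fixed-point iteration: whenever a known relational adjective is coordinated with an unknown one, the unknown one is added, and the process repeats until nothing changes. The sketch below is one possible reading under that assumption; the function name, the pass counter and the sample coordinations are invented for illustration.</Paragraph>

```python
def extend_relational(seed: set[str],
                      coordinations: list[tuple[str, str]]) -> tuple[set[str], int]:
    """Grow the set of relational adjectives from coordinated adjective pairs.

    Returns the extended set and the number of passes that added something.
    """
    known = set(seed)
    passes = 0
    changed = True
    while changed:
        changed = False
        for left, right in coordinations:
            # Exactly one member is known: infer that the other is relational too.
            if (left in known) != (right in known):
                known.update((left, right))
                changed = True
        if changed:
            passes += 1
    return known, passes


# Invented sample: alimentaire is known from its paraphrase (industrie de l'alimentation).
seed_adjectives = {"alimentaire"}
coordinated_pairs = [("industriel", "agricole"), ("agricole", "alimentaire"), ("cru", "frais")]
extended, n_passes = extend_relational(seed_adjectives, coordinated_pairs)
```

      <Paragraph> Each productive pass adds at least one adjective, so the loop cannot run for more passes than there are adjectives in the corpus, which matches the bound mentioned above.</Paragraph>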
      <Paragraph position="2">  To identify the conceptual relationships denoted by derivational links, we perform a morphological analysis using the same method as in section 4.1: we wrote stripping-recoding morphological rules for each conceptual relationship, and we apply the overgeneration method and a filter based on whether the generated base-term candidates are attested. To browse the list of candidate terms, we successively apply all the possible derivations to each candidate term.</Paragraph>
      <Paragraph position="3"> The output of the conceptual structuring program is a ranked list of candidate terms, each of them representing a set of conceptually linked candidate terms. An example of such a structure is given in Table 2.</Paragraph>
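      <Paragraph> The derivation-and-filter loop just described can be sketched as follows. The rule table DERIVATIONS, the labels and the sample lemma pairs are assumptions for illustration (only the plume/plumage and transport/transporteur pairs echo examples from section 3.2); the actual rule set of the program is not given in the text.</Paragraph>

```python
# Hypothetical stripping-recoding rules per relationship, applied to the head noun:
# each rule is (suffix to strip, segment to append).
DERIVATIONS = {
    "Mult": [("e", "age")],   # plume -> plumage
    "Actor": [("", "eur")],   # transport -> transporteur
}


def derive(head: str, strip: str, add: str):
    """Apply one rule to a head noun, or return None when the suffix does not match."""
    if not head.endswith(strip):
        return None
    return head[: len(head) - len(strip)] + add


def link_terms(candidates: set[tuple[str, str]]) -> list[tuple[str, tuple[str, str], tuple[str, str]]]:
    """Label pairs of candidate terms whose heads are related by a derivation rule."""
    links = []
    for head, expansion in sorted(candidates):
        for label, rules in DERIVATIONS.items():
            for strip, add in rules:
                derived = derive(head, strip, add)  # overgeneration
                # Filter: keep the link only if the derived term is itself a candidate.
                if derived and (derived, expansion) in candidates:
                    links.append((label, (head, expansion), (derived, expansion)))
    return links


# Invented sample of lemmatized candidate terms (head, expansion).
candidate_terms = {
    ("plume", "canard"), ("plumage", "canard"),
    ("transport", "routier"), ("transporteur", "routier"),
    ("filet", "saumon"),
}
links = link_terms(candidate_terms)
```

      <Paragraph> As in section 4.1, most generated forms are not attested and are simply discarded, so the rules can stay permissive.</Paragraph>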
    </Section>
  </Section>
</Paper>