<?xml version="1.0" standalone="yes"?> <Paper uid="P99-1044"> <Title>Syntagmatic and Paradigmatic Representations of Term Variation</Title> <Section position="3" start_page="0" end_page="341" type="metho"> <SectionTitle> 2 Term Variation: Representation and Exploitation </SectionTitle> <Paragraph position="0"> Terms and variations are represented in two parallel frameworks illustrated by Figure 1. A term is described by a unique pair composed of a structure (at the syntagmatic level) and a set of lexical items (at the paradigmatic level), whereas a variation is represented by a pair of such pairs: one is the source term (or normalized term) and the other is the target term (or variant).</Paragraph> <Paragraph position="1"> The syntagmatic description of a term is a context-free rule; it is complemented with lexical information embedded in a feature structure denoted by constraints between paths and values. For instance, the term speed measurement is represented by:</Paragraph> <Paragraph position="3"> This term is a noun phrase composed of a head noun N1 and a modifier N2; the lemmas are given by the constraints at the paradigmatic level. This framework is similar to the unification-based representation of context-free grammars of Shieber (1992). At the syntagmatic level, variations are represented by a source and a target structure. At the paradigmatic level, the lexical elements of variations are not instantiated, in order to ensure higher generality. Instead, links between lexical elements are provided. They denote morphological and/or semantic relations between lexical items in the source and target structures of the variation. 
For example, the variation that associates a Noun-Noun term such as the preceding term speed_N2 measurement_N1 with a verbal form of the head word and a synonym of the argument such as measuring_V1 maximal_A shorten-</Paragraph> <Paragraph position="5"> If this variation is instantiated with the term given in (1), it recognizes the lexico-syntactic structure V1 (Prep? Det? (A|N|Part)*) N'2 (3), in which V1 and measurement are morphologically related, and N'2 and speed are semantically related.</Paragraph> <Paragraph position="6"> The target structure is under-specified in order to describe several possible instantiations with a single expression and is therefore called a candidate variation. In this example, a regular expression is used to under-specify the structure (A stands for adjective, N for noun, Prep for preposition, V for verb, Det for determiner, Part for participle, and Adv for adverb); another solution would be to use quasi-trees with extended dependencies (Vijay-Shanker, 1992).</Paragraph> </Section> <Section position="4" start_page="341" end_page="342" type="metho"> <SectionTitle> 3 Paradigmatic relations </SectionTitle> <Paragraph position="0"> As illustrated by Figure 2 and Formula (2), there are two types of paradigmatic relations between lemmas involved in the definition of term variations: morphological and semantic relations.</Paragraph> <Paragraph position="1"> The morphological family of a lemma $l$ is denoted by the set $F_M(l)$ and its semantic family by the set $F_{S_L}(l)$ or $F_{S_C}(l)$.</Paragraph> <Paragraph position="2"> Roughly speaking, two words are morphologically related if and only if they share the same root. In the preceding example, to measure and measurement are in the same morphological family because their common root is to measure. Let $\mathcal{L}$ be the set of lemmas; morphological roots define a binary relation $M$ from $\mathcal{L}$ to $\mathcal{L}$ that associates each lemma with its root(s): $M \in \mathcal{L} \leftrightarrow \mathcal{L}$. 
$M$ is not a function because compound lemmas have more than one root.</Paragraph> <Paragraph position="3"> The morphological family $F_M(l)$ of a lemma $l$ is the set of lemmas (including $l$) which share a common root with $l$:</Paragraph> <Paragraph position="5"> ($\mathbb{P}(\mathcal{L})$ is the power set of $\mathcal{L}$, the set of its subsets.) There are principally two types of semantic relations: direct links through a binary relation $S_L \in \mathcal{L} \leftrightarrow \mathcal{L}$, or classes $C \in \mathbb{P}(\mathbb{P}(\mathcal{L}))$.</Paragraph> <Paragraph position="6"> In the case of semantic links, the semantic family $F_{S_L}(l)$ of a lemma $l$ is the set of lemmas (including $l$) which are linked to $l$:</Paragraph> <Paragraph position="8"> In the case of semantic classes, the semantic family $F_{S_C}(l)$ of a lemma $l$ is the union of all the classes to which it belongs:</Paragraph> <Paragraph position="10"> Links and classes are equivalent; the choice of either model depends on the type of available semantic data. In the experiments reported here, direct links are used to represent data extracted from the word processor Microsoft Word97 because they are provided as lists of synonyms associated with each lemma. Conversely, the synsets extracted from WordNet 1.6 (Fellbaum, 1998) are classes of disambiguated lemmas and, therefore, correspond to the second technique.</Paragraph> <Paragraph position="11"> With respect to the definitions of semantic and morphological families given in this section, the candidate variant (3) is such that $V_1 \in F_M(\mathit{measurement})$ and $N'_2 \in F_{S_L}(\mathit{speed})$ or $N'_2 \in F_{S_C}(\mathit{speed})$.</Paragraph> <Section position="1" start_page="342" end_page="342" type="sub_section"> <SectionTitle> 4 Morphological and Semantic Families </SectionTitle> <Paragraph position="0"> In the experiments on the English corpora, the CELEX database is used to calculate morphological families. 
As for semantic families, either WordNet 1.6 or the thesaurus of Microsoft Word97 is used.</Paragraph> <Paragraph position="1"> Morphological Links from CELEX. In the CELEX morphological database (CELEX, 1998), each lemma is associated with a morphological structure that contains one or more root lemmas. These roots are used to calculate morphological families according to Formula (4). For example, the morphological family $F_M(\mathit{measurement}_N)$ of the lemmas with measure_V as root word is { commensurable_A, commensurably_Adv, countermeasure_N, immeasurable_A, immeasurably_Adv, incommensurable_A, measurable_A, measurably_Adv, measure_N, measureless_A, measurement_N, mensurable_A, tape-measure_N, yard-measure_N, measure_V }.</Paragraph> <Paragraph position="2"> Semantic Classes from WordNet. Two sources of semantic knowledge are used for the English language: the WordNet 1.6 thesaurus and the thesaurus of the word processor Microsoft Word97. In the WordNet thesaurus, disambiguated words are grouped into sets of synonyms (called synsets) that can be used for a class-based approach to semantic relations. For example, each of the five disambiguated meanings of the polysemous noun speed belongs to a different synset. In our approach, words are not disambiguated and, therefore, the semantic family of speed is calculated as the union of the synsets in which one of its senses is included. Through Formula (6), the semantic family of speed based on WordNet is: $F_{S_C}(\mathit{speed}_N)$ = { speed_N, speeding_N, hurrying_N, hastening_N, swiftness_N, fastness_N, velocity_N, amphetamine_N }.</Paragraph> <Paragraph position="3"> Semantic Links from Microsoft Word97. To assist document editing, the word processor Microsoft Word97 has a command that returns the synonyms of a selected word. We have used this facility to build lists of synonyms. For example, $F_{S_L}(\mathit{speed}_N)$ = { speed_N, swiftness_N, velocity_N, quickness_N, rapidity_N, acceleration_N, alacrity_N, celerity_N } (Formula (5)). 
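The three family definitions referred to as Formulas (4) to (6) can be illustrated with a minimal Python sketch. It is only an assumed reading of the verbal definitions given above, not the authors' implementation. The function names, the toy root table, and the grouping of the WordNet lemmas into four classes are hypothetical and chosen for illustration; only the resulting families for speed are taken from the text.

```python
# Hypothetical toy data. ROOTS plays the role of the relation M:
# each lemma is mapped to the set of its root lemmas.
ROOTS = {
    "measurement_N": {"measure_V"},
    "measurable_A": {"measure_V"},
    "measure_V": {"measure_V"},
    "tape-measure_N": {"tape_N", "measure_V"},  # compound: two roots
    "tape_N": {"tape_N"},
    "speed_N": {"speed_N"},
}

# Direct synonym links (the Word97 list for speed quoted in the text).
LINKS = {
    "speed_N": [
        "swiftness_N", "velocity_N", "quickness_N", "rapidity_N",
        "acceleration_N", "alacrity_N", "celerity_N",
    ],
}

# Classes of lemmas. This grouping is illustrative only, not the actual
# WordNet 1.6 synsets; its union matches the family quoted in the text.
CLASSES = [
    {"speed_N", "velocity_N"},
    {"speed_N", "swiftness_N", "fastness_N"},
    {"speed_N", "speeding_N", "hurrying_N", "hastening_N"},
    {"speed_N", "amphetamine_N"},
]


def morphological_family(lemma, roots):
    """Lemmas (including `lemma`) that share at least one root with `lemma`."""
    own_roots = roots.get(lemma, set())
    family = {
        other
        for other, other_roots in roots.items()
        if not own_roots.isdisjoint(other_roots)
    }
    return family | {lemma}


def link_family(lemma, links):
    """Lemmas (including `lemma`) that are directly linked to `lemma`."""
    return {lemma} | set(links.get(lemma, ()))


def class_family(lemma, classes):
    """Union of all the classes to which `lemma` belongs."""
    family = set()
    for cls in classes:
        if lemma in cls:
            family |= set(cls)
    return family


if __name__ == "__main__":
    print(sorted(morphological_family("measurement_N", ROOTS)))
    print(sorted(link_family("speed_N", LINKS)))
    print(sorted(class_family("speed_N", CLASSES)))
```

Under these assumptions, a candidate variant such as (3) would be accepted when its verb belongs to the morphological family of the head and its final noun belongs to one of the two semantic families of the argument.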
Eight other synonyms of the word speed are provided by Word97, but they are not included in this semantic family because they are not categorized as nouns in CELEX.</Paragraph> </Section> </Section> <Section position="5" start_page="342" end_page="344" type="metho"> <SectionTitle> 5 Variations </SectionTitle> <Paragraph position="0"> The linguistic transformations for the English language presented in this section are somewhat simplified for the sake of conciseness. First, we focus on binary terms, which represent 91.3% of the occurrences of multi-word terms in the English corpus [MEDIC].</Paragraph> <Paragraph position="1"> Then, simplifications in the combinations of types of variations are motivated by corpus explorations in order to focus on the most productive families of variations.</Paragraph> <Paragraph position="2"> The 3 Dimensions of Linguistic Variations. There are as many types of morphological relations as pairs of syntactic categories of content words. Since the syntactic categories of content words are noun (N), verb (V), adjective (A), and adverb (Adv), there are potentially sixteen different pairs of morphological links. (Associations of identical categories must be taken into consideration. For example, Noun-Noun associations correspond to morphological links between substantive nouns such as agent/process: promoter/promotion.) Morphological relations are further divided into simple relations if they associate two words in the same position and crossed relations if they associate a head word and an argument. Combining categories and positions (sixteen category pairs times four combinations of source and target positions), there are, in all, 64 different types of morphological relations.</Paragraph> <Paragraph position="3"> Hamon et al. (1998) study three types of semantic relations: a link between the two head words, a link between the two arguments, or two parallel links between heads and arguments. These authors report that double links are rare and that their quality is low. 
They represent only 5% of the semantic variations on a French corpus and they are extracted with a precision of only 9%. We will therefore focus on single semantic links. Since we are only concerned with synonyms, only two types of semantic links are studied: synonymous heads or synonymous arguments.</Paragraph> <Paragraph position="4"> The last dimension of term variability is the structural transformation at the syntagmatic level. The source structure of the variation must match a term structure. There are basically two structures of binary terms: X1 N2 compounds, in which X1 is a noun, an adjective or a participle, and N1 Prep N2 terms. According to Jacquemin et al. (1997), there are three types of syntactic variations in French: coordinations (Coor), insertions of modifiers (Modif), and compounding/decompounding (Comp). Each of these syntactic variations is further subdivided into finer categories.</Paragraph> <Section position="1" start_page="343" end_page="343" type="sub_section"> <SectionTitle> Multi-dimensional Linguistic Variations </SectionTitle> <Paragraph position="0"> The overall picture of term variations is obtained by combining the 64 types of morphological relations, the two types of semantic relations and the three types of syntactic variations (and their sub-types).</Paragraph> <Paragraph position="1"> There are different constraints on these combinations that limit the number of possible variations: 1. Morphological and semantic links must operate on different words. For example, if the head word is transformed by a morphological link, the only word available for a semantic link is the argument word.</Paragraph> <Paragraph position="2"> 2. The target syntactic structure must be compatible with the morphological transformations. 
For example, if a noun is transformed into a verb, the target structure must be a verb phrase.</Paragraph> <Paragraph position="3"> These two constraints influence the way in which a variation can be defined by combining different types of elementary modifications. Firstly, lexical relations are defined at the paradigmatic level: morphological links, semantic links or identical words. Then a syntactic structure that is compatible with the categories of the target words is chosen.</Paragraph> <Paragraph position="4"> The list of variations used for binary compound terms in English is given in Table 1 (punctuation marks are noted Pu and coordinating conjunctions CC). It has been experimentally refined through progressive corpus-based tuning. The Synt column gives the target syntactic structure.</Paragraph> <Paragraph position="5"> The Morph column describes the morphological link: a source and a target syntactic category and the syntactic positions of the source and target lemmas. The Sem column indicates whether the variation involves a semantic link and the position of the lemmas concerned by the link (both lemmas must have an identical position). The Pattern column gives the target syntactic structure as a function of the source structure, which is either X1 N2, A1 N2, or N1 N2.</Paragraph> <Paragraph position="6"> For example, Variation #42 transforms an</Paragraph> <Paragraph position="8"> N1 is a noun in the morphological family of A1 (noted $F_M(A_1)_N$) and N'2 is semantically related with N2 (noted $F_S(N_2)$). This variation recognizes malignancy in orbital tumours as a variant of malignant tumor because malignancy and malignant are morphologically related, tumour and tumor are semantically related, and malignancy_N in_Prep orbital_A tumours_N matches the target pattern. Variation #56 is a more elaborate version of variation (2) given in Section 2.</Paragraph> <Paragraph position="9"> any morphological link. They are built as follows. 
Firstly, the different structures of noun phrases are used as target structures. Twelve structures are proposed: head coordination (#1), argument coordination (#4), enumeration with conjunction (#7), enumeration without conjunction (#10), etc.</Paragraph> <Paragraph position="10"> Then each transformation is enriched with additional semantic links between the head words or between the argument words. Semantic links between argument words are found in variations $\#(3n+2)$ for $0 \leq n \leq 11$ and between head words in variations $\#(3n)$ for $1 \leq n \leq 12$. (Due to the lack of space, only variations #2 and #3 constructed on top of variation #1 are shown in Table 1.) Sample variants from [MEDIC] for the first 36 variations are given in Table 2. Some variations have not matched any variant in the whole corpus.</Paragraph> </Section> <Section position="2" start_page="343" end_page="344" type="sub_section"> <SectionTitle> Sample Morpho-syntactico-semantic Variants </SectionTitle> <Paragraph position="0"> Morpho-syntactico-semantic variations are numbered #37 to #62 in Table 1. Only 10 of the 64 possible morphological associations are found in the list of morphological links: Noun to Adjective on arguments (#37), Adjective to Noun on arguments (#39), etc. Each of these variations is doubled by adding a semantic link between the words that are not morphologically related. For example, variation #40 is derived from variation #39 by adding a semantic link between the head words. Sample variants are given in Table 3.</Paragraph> <Paragraph position="2"/> </Section> </Section> </Paper>