<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1082"> <Title>A step towards the detection of semantic variants of terms in technical documents</Title> <Section position="3" start_page="500" end_page="502" type="intro"> <SectionTitle> 3 Results and study of the detected links </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="500" end_page="500" type="sub_section"> <SectionTitle> 3.1 Various detected links </SectionTitle> <Paragraph position="0"> Synonymy links. This method infers 396 links between complex candidate terms (i.e. noun phrases). An expert in the domain validated 37% of them (i.e. 146 links, cf. table 2) as real synonymy links: hauteur d'eau (water height) / niveau d'eau (level of water), détéri… Most of the synonymy links between candidate terms are detected at the first iteration (383 links out of 396). Most of the validated links come from the first two rules: 89 validated links out of 206 with the first rule (admission d'air (air intake) / entrée d'air (air entry)) and 49 out of 105 with the second (toit flottant (floating roof) / toit mobile (movable roof) and collecteur général (general collector) / collecteur commun (common collector)). As expected, the last rule has a much lower precision: 8 validated links out of 85 (fausse manoeuvre (wrong operation) / mauvaise manipulation (bad handling)). However, it infers important links which are difficult to detect by hand.</Paragraph> <Paragraph position="1"> Other useful links. On the whole, the expert judged that half of the detected links are useful for structuring the terminology, even though he did not accept some of them as real synonymy links (cf. figure 5). Our method detects different types of links: meronymy, antonymy, relations between close concepts, connected parts of a whole mechanism, etc.</Paragraph> <Paragraph position="2"> After synonymy, meronymy links are the most numerous (rapport de sûreté (safety report) / analyse de sûreté (safety analysis)). 
In the previous example, although rapport (report) and analyse (analysis) are given as synonyms by the general-language dictionary (which is context-free), their technical meanings in our document are more specific. Therefore, rapport de sûreté is a meronym rather than a synonym of analyse de sûreté in the studied document.</Paragraph> <Paragraph position="3"> Other detected links make it possible to group the candidate terms which refer to related concepts.</Paragraph> <Paragraph position="4"> For instance, we detected a link between the device ligne de vidange (draining line) and the place point de purge (blow-down point), which is relevant since a draining line ends at a blow-down point. Likewise, it is useful to link fin de vidange (draining end), which designates an operation, and destination des purges (blow-down destination), which designates the corresponding equipment. The expert considered that the link between the candidate terms commande mécanique (mechanical control) / ordre automatique (automatic order) expresses an antonymy relation, although it is inferred from the dictionary synonymy relation mécanique (mechanical) / automatique (automatic). It appears that these adjectives have a particular meaning in the present corpus. Therefore, every link detected from this &quot;synonymy&quot; link is an antonymy link.</Paragraph> <Paragraph position="5"> These links express various relations, which are sometimes difficult to name, even for the expert.</Paragraph> <Paragraph position="6"> Such links are important in a terminology.</Paragraph> </Section> <Section position="2" start_page="500" end_page="502" type="sub_section"> <SectionTitle> 3.2 Polysemy, elision and metaphor </SectionTitle> <Paragraph position="0"> Most real errors are due to the lack of context information for polysemous words and to noisy data in the dictionary. For instance, the French word temps means either time or weather. 
According to the dictionary, temps (weather) is a synonym of température (temperature)², but this meaning does not occur in the present corpus. Since we cannot distinguish the different meanings, the synonymy between temps and température is also applied to temps in the sense of time. Temps attendu (expected time) and température attendue (expected temperature) […] of wrong links is rather large in the list presented to the expert: between 10 and 20 links out of 396. ² It would be more precise to interpret these words as analogous.</Paragraph> <Paragraph position="1"> Term 1 / Term 2
essai en usine (test in plant) / expérience d'exploitation (experiment of exploitation)
ligne de vidange (draining line) / point de purge (blow-down point)
fonction d'un temps (function of a time) / effet d'une température (effect of a temperature)
froid normal (normal cold) / refroidissement correct (correct cooling)
rapport de sûreté (safety report) / analyse de sûreté (safety analysis)
solution d'acide borique (solution of boric acid) / dissolution de l'acide borique (dissolving of the boric acid)
température attendue (expected temperature) / temps attendu (expected time)
température normale (normal temperature) / temps normal (normal time)
organes de commande (control devices) / organes d'ordre (order devices)
gros débit (big flow) / plein débit (full flow)
activité importante (important activity) / activité élevée (high activity)
commande mécanique (mechanical control) / ordre automatique (automatic order)
risques de corrosion (risk of corrosion) / risques de destruction (risk of destruction)</Paragraph> <Paragraph position="2"> In addition, about ten wrong links are due to the elision of common terms in the domain. For instance, the term activité (activity), which actually stands for radioactivité (radioactivity) in the document, is given as a synonym of énergie (energy) in the dictionary. 
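The way a single noisy dictionary pair turns into wrong term links can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that each candidate term is a (head, expansion) pair and that a link is proposed when two terms share one component and their other components form a dictionary synonym pair. The dictionary excerpt, the function names and the normalization of "attendue" to "attendu" are hypothetical.

```python
from itertools import combinations

# Hypothetical excerpt of a general-language synonym dictionary (single words).
# The first two pairs are the noisy entries discussed in this section.
SYNONYMS = {
    frozenset({"temps", "température"}),  # valid only for temps meaning weather
    frozenset({"activité", "énergie"}),   # activité stands for radioactivité here
    frozenset({"admission", "entrée"}),   # a genuine synonymy pair
}

def synonyms(a: str, b: str) -> bool:
    """True if two distinct words form a listed synonym pair (no context used)."""
    return frozenset({a, b}) in SYNONYMS

def infer_links(terms):
    """Link (head, expansion) terms that share one component and whose other
    components are dictionary synonyms. Returns (term1, term2, reason) tuples."""
    links = []
    for (h1, e1), (h2, e2) in combinations(terms, 2):
        if e1 == e2 and synonyms(h1, h2):
            links.append(((h1, e1), (h2, e2), "head synonyms"))
        elif h1 == h2 and synonyms(e1, e2):
            links.append(((h1, e1), (h2, e2), "expansion synonyms"))
    return links

# Candidate terms as (head, expansion) pairs; the feminine "attendue" is
# normalized to "attendu" (an assumed lemmatization step) so the expansions match.
terms = [
    ("temps", "attendu"),
    ("température", "attendu"),
    ("activité", "haute"),
    ("énergie", "haute"),
    ("admission", "air"),
    ("entrée", "air"),
]

for t1, t2, reason in infer_links(terms):
    print(t1, "/", t2, f"({reason})")
```

All three pairs are linked in the same way: nothing in the context-free lookup can tell temps (time) from temps (weather), or activité from radioactivité, which is why such links only surface during validation.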
</Paragraph> <Paragraph position="3"> We have detected links such as activité haute (high activity) / haute énergie (high energy). As regards metaphor, we have observed that it preserves semantic relations. For instance, in graph theory, the link arbre (tree) / feuille (leaf) can be inferred from the meronymy information of the general dictionary.</Paragraph> <Paragraph position="4"> These types of wrong links are easily identified during validation. Exception rules could be designed to first group these links and then eliminate them. To this end, we plan to use dictionary definitions.</Paragraph> </Section> <Section position="3" start_page="502" end_page="502" type="sub_section"> <SectionTitle> 3.3 Evaluation </SectionTitle> <Paragraph position="0"> The inferred links express not only synonymy, but also other relations which may be difficult to name. Apart from real errors, these fuzzy see-also relations are useful in the context of a consultation system.</Paragraph> <Paragraph position="1"> The results of this first experiment are encouraging. Although the precision rate and the number of links are low (37%, 396 links), the use of complementary methods (e.g. detection of syntactic variants) would make it possible to propagate these links and increase their number. Moreover, other knowledge sources or different methods (Habert et al., 1996) are necessary to increase the precision rate and to find links between more technical candidate terms.</Paragraph> <Paragraph position="2"> As regards the benefit of such a method, terminology acquisition by an expert would take tens of hours, whereas the automatic extraction takes one hour and the validation of the links took two hours.</Paragraph> <Paragraph position="3"> The main difficulty is evaluating recall, because there is no standard reference listing all the relevant relations in the document. 
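Precision, by contrast, only needs the expert's judgement on the links actually proposed. A minimal sketch using the per-rule counts reported in section 3.1; the labels "rule 1" to "rule 3" are ours and simply follow the order in which the rules are cited.

```python
# Per-rule counts reported in section 3.1: (validated links, inferred links).
COUNTS = {
    "rule 1": (89, 206),
    "rule 2": (49, 105),
    "rule 3": (8, 85),
}

def precision(validated: int, proposed: int) -> float:
    """Share of proposed links that the expert validated."""
    return validated / proposed

for rule, (ok, total) in COUNTS.items():
    print(f"{rule}: {ok}/{total} = {precision(ok, total):.1%}")
# rule 1: 89/206 = 43.2%
# rule 2: 49/105 = 46.7%
# rule 3: 8/85 = 9.4%

validated = sum(ok for ok, _ in COUNTS.values())
proposed = sum(total for _, total in COUNTS.values())
print(f"overall: {validated}/{proposed} = {precision(validated, proposed):.1%}")
# overall: 146/396 = 36.9%

# Recall would be validated / (all relevant links in the document); that
# denominator is exactly what no standard reference provides.
```

The overall figure reproduces the 37% precision and the 146 validated links out of 396 reported above.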
One may think that comparison with links detected manually by an expert is the best evaluation, but such manual detection is subjective. As for validation by several experts, it is well known that it gives different results depending on the background of each expert (Szpakowicz et al., 1996). We are therefore reduced to comparing our results with those obtained by other methods, even though these are not perfect either. We plan to compare the clusters found by our method with those produced by the clustering method of Assadi (1997) to study how far the results overlap and complement each other.</Paragraph> </Section> </Section> </Paper>