<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1082"> <Title>A step towards the detection of semantic variants of terms in technical documents</Title> <Section position="4" start_page="502" end_page="503" type="relat"> <SectionTitle> 4 Related works </SectionTitle> <Paragraph position="0"> Variant detection in specialized corpora must be taken into account for information retrieval. This complex operation involves the semantic level as well as the morphological and syntactic levels. (Jacquemin, 1996) designs a unification-based partial parser, FASTER, which analyses raw technical text while meta-rules detect morpho-syntactic variants of controlled terms (blood cell, blood mononuclear cell). By using morphological and part-of-speech modules, the system is extended to verbal phrases (tree cutting, tree have been cut down) (Klavans et al., 1997). Dealing with syntactic paraphrase in general language, (Dras, 1997) proposes a similar representation, using the STAG formalism to detect syntactically related sentences. Because we deal with the semantic level, our work is complementary to these approaches.</Paragraph> <Paragraph position="1"> Semantic variation is rarely studied in specialized domains. Work on word similarity and word sense disambiguation is generally based on statistical methods designed for large or even very large corpora (Hindle, 1990; Agirre and Rigau, 1996). Such methods therefore cannot be applied to technical documents, which are usually medium-sized corpora. However, working with data that has already been linguistically filtered, (Assadi, 1997) aims to build rough clusters statistically, on the assumption that similar candidate terms have similar expansions. He then relies on human expertise for the semantic interpretation. This differs from our work, which tries to make the semantic relations explicit automatically. 
In order to disambiguate noun objects in a short text (30 000 words), (Li et al., 1995) design heuristic rules using semantic similarity information in WordNet and verbs as context. Their system disambiguates an encouraging number of noun-verb pairs if one considers both single and multiple senses assigned to a word.</Paragraph> <Paragraph position="2"> In (Basili et al., 1997), the lexical knowledge base WordNet (Miller et al., 1993) is used as a bootstrap for verb disambiguation. They tune it to the domain of the studied document by taking into account the contexts in which the verbs are used. This tuning both eliminates certain semantic categories and adds new ones. For instance, the category contact is created for the verb to record. The resulting sense classification is thus a better description of the verbs' specialized meanings.</Paragraph> <Paragraph position="3"> Our symbolic and dictionary-based approach is close to that of (Basili et al., 1997). Both use general language information (traditional dictionary vs. WordNet) for specialized corpora. However, the goals differ: disambiguation vs. semantic relation identification.</Paragraph> </Section> </Paper>