<?xml version="1.0" standalone="yes"?> <Paper uid="W06-2205"> <Title>Recognition of synonyms by a lexical graph</Title> <Section position="4" start_page="0" end_page="33" type="intro"> <SectionTitle> 2 Related work </SectionTitle> <Paragraph position="0"> The identification of semantic relations has been approached by different communities, either as a component of a knowledge management system or as an application of an existing NLP framework.</Paragraph> <Paragraph position="1"> Many approaches are guided by the assumption that similar terms occur in similar contexts and build a context representation of terms as attribute vectors or relation tuples (Curran and Moens, 2002), (Ruge, 1997), (Lin, 1998). A similarity metric defined on the context representations is used to cluster similar terms (e.g. by the nearest neighbor method).</Paragraph> <Paragraph position="2"> The actual definitions of context (whole document (Chen and Lynch, 1992), textual window, or customized syntactic contexts, cf. (Senellart and Blondel, 2003)) and of the similarity metric (cf. (Manning and Schütze, 1999), (Curran and Moens, 2002)) are the essential distinguishing features of these approaches.</Paragraph> <Paragraph position="4"> A pattern-based method is proposed by Hearst (1998). Existing relations in the WordNet database are used to discover regular linguistic patterns that are characteristic for these relations. The patterns contain lexical and syntactic elements and are acquired from a text corpus by identifying the common context of word pairs for which a semantic relation holds. The identified patterns are then applied to a large text corpus to detect new relations. The method can be enhanced by applying filtering steps and iterating over newly found instances (Phillips and Riloff, 2002).</Paragraph> <Paragraph position="5"> Lafourcade and Prince base their approach on the reduction of word semantics to conceptual vectors (the vector space is spanned by a hierarchy of concepts provided by a thesaurus, (Lafourcade, 2001)). Every term is projected into the vector space and can be expressed as a linear combination of conceptual vectors. The angle between the vectorial representations of two terms is used in the calculation of thematic closeness (Lafourcade and Prince, 2001). This approach is more closely related to ours, since it offers a quantitative metric to measure the degree of synonymy between two lexical items.</Paragraph> <Paragraph position="6"> In contrast, Turney (2001) addresses the much simpler "TOEFL-like" task of selecting a synonym of a given word from a small set of candidate words. Mutual information related to the co-occurrence of two words, combined with information retrieval, is used to assess the degree of their statistical independence. The least independent word is regarded as synonymous.</Paragraph> <Paragraph position="7"> Blondel et al. (2004) encode a monolingual dictionary as a graph and identify synonyms by finding subgraphs that are similar to the subgraph corresponding to the queried term.</Paragraph> <Paragraph position="8"> The common evaluation method for similarity metrics is to compare their performance on the same test set with the same context representations, using some manually created semantic resource as the gold standard (Curran and Moens, 2002). Abstracting from results on concrete test sets, Weeds et al. (2004) try to identify the statistical and linguistic properties on which the performance of similarity metrics generally depends. A different bias towards words with high or low frequency is recognized as one reason for the significant variance between the k-nearest-neighbor sets produced by different similarity metrics.</Paragraph>
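<Paragraph> To make the context-vector family of approaches summarized above concrete, the following minimal Python sketch (not taken from any of the cited systems; the fixed textual window, raw co-occurrence counts and cosine similarity are illustrative assumptions) builds a context vector for every word of a tokenized corpus and ranks candidate synonyms of a query term by context similarity:

    from collections import Counter, defaultdict
    from math import sqrt

    def context_vectors(sentences, window=2):
        """Build co-occurrence context vectors for every word.

        sentences is a list of token lists; the window size and the
        raw-count weighting are illustrative choices only.
        """
        vectors = defaultdict(Counter)
        for tokens in sentences:
            for i, word in enumerate(tokens):
                for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                    if j != i:
                        vectors[word][tokens[j]] += 1
        return vectors

    def cosine(u, v):
        """Cosine similarity of two sparse count vectors."""
        dot = sum(u[k] * v[k] for k in u if k in v)
        norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
        return dot / norm if norm else 0.0

    def nearest_terms(term, vectors, k=5):
        """Rank all other words by context similarity to the query term."""
        target = vectors[term]
        scored = [(cosine(target, vec), w) for w, vec in vectors.items() if w != term]
        return sorted(scored, reverse=True)[:k]

The choice of context (here a fixed textual window), the weighting of the counts and the similarity metric are exactly the parameters that distinguish the individual approaches cited above. </Paragraph>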
<Paragraph position="9"> 3 Construction of the lexical graph
The assumption that similar terms occur in similar contexts leads most researchers to establish explicit context models (e.g. in the form of vectors or relation tuples). We build an implicit context representation, connecting lexical items in a way that corresponds to the sentence structure (as opposed to (Blondel et al., 2004), where a term is linked to every word in its definition). The advantage of the graph model is its transitivity: not only terms in the immediate context but also semantically related terms that have a short path to the examined term (but perhaps have never occurred in its immediate context) can contribute to the identification of related terms. The similarity metric can be intuitively derived from the distance between the lexical vertices in the graph. To construct the lexical graph, articles from five volumes of two German computer journals were chunk-parsed and POS-tagged using TreeTagger (2004). To preserve the semantic structure of the sentences during graph construction, i.e. to connect the words that build the actual statement of the sentence, parsed sentences are preprocessed before being inserted into the graph (fig. 1). Punctuation signs and parts of speech that do not carry self-contained semantics (such as conjunctions, pronouns and articles) are removed in a POS filtering step. Tokenization errors are heuristically removed and the words are replaced by their normal forms (e.g. the infinitive form for verbs, nominative singular for nouns).</Paragraph> <Paragraph position="10"> German grammar is characterized by a very frequent use of auxiliary and modal verbs, which in most cases immediately precede or follow the semantically related sentence parts (such as a direct object or a prepositional phrase), while the main verb is often not adjacent to the related parts of the sentence. Since a direct edge between the main verb and non-adjacent related sentence parts cannot be drawn, the sentence is syntactically reorganized by replacing the modal or auxiliary verbs with the corresponding main verb. Another syntactic rearrangement takes place when detachable prefixes are re-attached to the corresponding main verb. In German, some verb prefixes are detached and located at the end of the main clause. Since a verb without its prefix has a different meaning, prefixes have to be attached to the verb stem. The reorganized sentence can then be added to the lexical graph by inserting the normalized words of the sentence as vertices and connecting adjacent words by directed edges. However, some adjacent words are not semantically related to each other; therefore the lexical graph features two types of edges (see an example in fig. 2). A property edge is bidirectional and links the head word of a syntactic chunk (verb or noun phrase) with the modifiers (adverbs or adjectives, respectively) that characterize it. A sequential edge connects the head words (e.g. main verbs, head nouns) of syntactic chunks, reflecting the "semantic backbone" of the sentence.</Paragraph> <Paragraph position="11"> The length of an edge represents how strongly two lexical items are related to each other and therefore depends on the frequency of their co-occurrence. It is initialized with a maximum length M. Every time an existing edge is found in the currently processed sentence, its current length CurLen is modified according to CurLen = M / (M/CurLen + 1); hence the length of an edge is inversely proportional to the frequency of co-occurrence of its endpoints.</Paragraph>
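<Paragraph> As an illustration of the edge-length bookkeeping just described (a sketch of the stated update rule, not the authors' code), the following Python fragment applies CurLen = M / (M/CurLen + 1) and shows that an edge observed n times ends up with length M/n:

    M = 220  # maximum edge length; 220 is the value used for the graph described below

    def update_edge_length(cur_len, M=M):
        """Shorten an edge after one more observed co-occurrence.

        A newly created edge starts with the maximum length M (pass cur_len=None);
        every further observation applies CurLen = M / (M/CurLen + 1), so an edge
        seen n times has length M/n, i.e. the length is inversely proportional
        to the co-occurrence frequency of its endpoints.
        """
        if cur_len is None:          # first co-occurrence: new edge
            return float(M)
        return M / (M / cur_len + 1)

    # An edge observed four times: 220.0, 110.0, 73.33..., 55.0 (= M/n)
    length = None
    for _ in range(4):
        length = update_edge_length(length)
</Paragraph>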
<Paragraph position="12"> After all sentences from the text corpus have been added to the lexical graph, vertices (words) with a low frequency (≤ th) are removed from the graph, primarily to accelerate the distance calculation. Such rarely occurring words are usually proper nouns, abbreviations, typos etc. Because of their low frequency, semantic relations for these words cannot be identified with confidence. Therefore, removing such vertices reduces the size of the graph significantly without a performance penalty (the graph generated from five journal volumes contained ca. 300,000 vertices, and 52,191 vertices after frequency filtering with th = 8).</Paragraph> <Paragraph position="13"> Experimental results even show slightly better performance on filtered graphs. To preserve the semantic consistency of the graph and to compensate for the removal of existing paths, the connections between the predecessors and successors of removed vertices have to be taken into account: the edge length e(p,s) between a predecessor p and a successor s of the removed vertex r can incorporate the length of the path length(p,r,s) from p to s through r by calculating the halved harmonic mean: e(p,s) = (e(p,s) · length(p,r,s)) / (e(p,s) + length(p,r,s)). The smaller length(p,r,s) is, the more e(p,s) is reduced; if the two are equal, e(p,s) is half as long after merging. Besides direct edges, an important indication of semantic closeness is the distance, i.e. the length of the shortest path between two vertices. Distances are calculated by the Dijkstra algorithm with an upper threshold Th.</Paragraph> <Paragraph position="14"> Once the distances from a certain vertex reach the threshold, the calculation for this vertex is aborted and the remaining, uncalculated distances are considered infinite. Using the threshold reduces runtime and space requirements considerably, while the semantic relation between vertices with distances > Th is negligible.</Paragraph> <Paragraph position="15"> The values of M, th and Th depend on the particular text corpus and are chosen to keep the size of the graph feasible. th can be determined experimentally by incrementing it as long as the results on the test set improve. The resulting graph generated from five computer journal volumes with M = 220, th = 8 and Th = 60,000 contained 52,191 vertices, 4,927,365 edges and 376,000,000 distances.</Paragraph>
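<Paragraph> A minimal sketch of the thresholded shortest-path computation described above, assuming a plain adjacency-dictionary representation of the graph (this illustrates the procedure, not the authors' implementation; the toy vertices and edge lengths below are invented):

    import heapq

    def bounded_dijkstra(graph, source, Th):
        """Shortest-path lengths from source, abandoning paths longer than Th.

        graph maps each vertex to a dict {neighbour: edge_length}.  Vertices
        whose distance would exceed Th are simply not reported; the caller
        treats missing entries as infinite distances.
        """
        dist = {source: 0.0}
        heap = [(0.0, source)]
        while heap:
            d, v = heapq.heappop(heap)
            if d > dist.get(v, float("inf")):
                continue              # stale queue entry
            for w, edge_len in graph.get(v, {}).items():
                nd = d + edge_len
                if nd > Th or nd >= dist.get(w, float("inf")):
                    continue          # over the threshold or no improvement
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
        return dist

    # Toy graph with invented edge lengths; Th as in the setup reported above.
    toy = {"rechner": {"computer": 27.5, "drucker": 110.0},
           "computer": {"pc": 55.0},
           "drucker": {}, "pc": {}}
    print(bounded_dijkstra(toy, "rechner", Th=60000))
</Paragraph> </Section> </Paper>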