<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-3002">
  <Title>Accessing GermaNet Data and Computing Semantic Relatedness</Title>
  <Section position="4" start_page="5" end_page="6" type="metho">
    <SectionTitle>
3 Semantic Relatedness Software
</SectionTitle>
    <Paragraph position="0"> In GermaNet, nouns, verbs and adjectives are structured within hierarchies of is-a relations.4 GermaNet also contains information on additional lexical and semantic relations, e.g. hypernymy, meronymy and antonymy (Kunze &amp; Lemnitzer, 2002). A semantic relatedness metric specifies to what degree the meanings of two words are related to each other.</Paragraph>
    <Paragraph position="1"> For example, the meanings of Glas (Engl. glass) and Becher (Engl. cup) will typically be classified as closely related to each other, while the relation between Glas and Juwel (Engl. gem) is more distant. RelatednessComparator is a class which takes two words as input and returns a numeric value indicating the semantic relatedness of the two words. The semantic relatedness metrics are implemented as descendants of this class.</Paragraph>
    <Paragraph position="2"> Three of the metrics for computing semantic relatedness are information content based (Resnik, 1995; Jiang &amp; Conrath, 1997; Lin, 1998) and are also implemented in the WordNet::Similarity package. However, some aspects of the normalization of their results and of the task definition according to which the evaluation is conducted have been changed (Gurevych &amp; Niederlich, 2005). The metrics are implemented as classes derived from InformationBasedComparator, which is in turn derived from the class PathBasedComparator. They make use of both the GermaNet hierarchy and statistical corpus evidence, i.e. information content.</Paragraph>
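The class hierarchy described above can be illustrated with a minimal Python sketch. Only the class names RelatednessComparator, PathBasedComparator and InformationBasedComparator come from the text; the method names, the two concrete subclasses, the toy hierarchy and the information content values are assumptions made for illustration. The sketch works on concepts rather than words and uses the textbook forms of the Resnik and Lin measures, not the modified normalization of Gurevych and Niederlich (2005). The Jiang and Conrath measure is omitted for brevity.

```python
from abc import ABC, abstractmethod


class RelatednessComparator(ABC):
    """Takes two concepts and returns a numeric relatedness value."""

    @abstractmethod
    def relatedness(self, c1: str, c2: str) -> float:
        """Return a score; higher means more closely related."""


class PathBasedComparator(RelatednessComparator):
    """Adds navigation of a toy is-a hierarchy (child -> parent mapping)."""

    def __init__(self, parent: dict):
        self.parent = parent

    def ancestors(self, concept: str) -> list:
        # The concept itself, followed by its hypernyms up to the root.
        chain = [concept]
        while chain[-1] in self.parent:
            chain.append(self.parent[chain[-1]])
        return chain

    def lowest_common_subsumer(self, c1: str, c2: str) -> str:
        seen = set(self.ancestors(c1))
        for candidate in self.ancestors(c2):
            if candidate in seen:
                return candidate
        raise ValueError("concepts share no subsumer")


class InformationBasedComparator(PathBasedComparator):
    """Adds an information content map (concept -> IC value)."""

    def __init__(self, parent: dict, ic: dict):
        super().__init__(parent)
        self.ic = ic


class ResnikComparator(InformationBasedComparator):
    def relatedness(self, c1, c2):
        # Resnik (1995): IC of the lowest common subsumer.
        return self.ic[self.lowest_common_subsumer(c1, c2)]


class LinComparator(InformationBasedComparator):
    def relatedness(self, c1, c2):
        # Lin (1998): 2 * IC(lcs) / (IC(c1) + IC(c2)).
        denominator = self.ic[c1] + self.ic[c2]
        if denominator == 0:
            return 0.0
        return 2 * self.ic[self.lowest_common_subsumer(c1, c2)] / denominator


# Hypothetical toy data, not taken from GermaNet.
PARENT = {"Glas": "Gefaess", "Becher": "Gefaess",
          "Gefaess": "Objekt", "Juwel": "Objekt"}
IC = {"Objekt": 0.0, "Gefaess": 2.0, "Glas": 4.0, "Becher": 4.0, "Juwel": 3.0}
```

With this toy data, both comparators rank Glas and Becher as more closely related than Glas and Juwel, because the latter pair only shares the root concept.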
    <Paragraph position="3"> 4 As mentioned before, GermaNet abandoned the cluster approach taken in WordNet to group adjectives. Instead, a hierarchical structuring based on the work by Hundsnurscher &amp; Splett (1982) applies, as is the case with nouns and verbs.
 We implemented a set of utilities for computing the information content of German word senses from German corpora according to the method by Resnik (1995). The TreeTagger (Schmid, 1997) is employed to compile a part-of-speech tagged word frequency list. The information content values of GermaNet synsets are saved in a text file called an information content map. We experimented with two configurations of the system: one involved stemming of the corpora, the other did not involve any morphological processing. Contrary to our intuition, there was almost no difference between the information content maps produced by the two configurations. Therefore, the use of stemming in computing the information content of German synsets seems to be unjustified.</Paragraph>
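A simplified sketch of how an information content map might be derived from a frequency list, following the general idea of Resnik (1995): each observed count is propagated to all hypernyms, and the information content of a concept is the negative logarithm of its relative frequency. The function name, the single-parent hierarchy and the assumption that every word maps to exactly one concept are illustrative simplifications. The sketch does not reproduce the actual utilities, the TreeTagger step or the file format of the information content map.

```python
import math
from collections import Counter


def information_content_map(word_counts: dict, parent: dict) -> dict:
    """Return concept -> IC, assuming a single-rooted child -> parent hierarchy."""
    freq = Counter()
    for concept, count in word_counts.items():
        # Credit the concept and every hypernym above it with this count.
        node = concept
        while True:
            freq[node] += count
            if node not in parent:
                break
            node = parent[node]
    total = sum(word_counts.values())
    # IC(c) = -log p(c), with p(c) estimated as freq(c) / total.
    return {concept: -math.log(f / total) for concept, f in freq.items()}


# Hypothetical counts and hierarchy, not taken from any corpus.
counts = {"Glas": 2, "Becher": 2, "Juwel": 4}
parent = {"Glas": "Gefaess", "Becher": "Gefaess",
          "Gefaess": "Objekt", "Juwel": "Objekt"}
ic_map = information_content_map(counts, parent)
```

Under these assumptions the root concept receives an information content of zero, and more specific concepts receive higher values.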
    <Paragraph position="4"> The remaining two metrics of semantic relatedness are based on the Lesk algorithm (Lesk, 1986). The Lesk algorithm computes the number of overlaps in the definitions of words, which are sometimes extended with the definitions of words related to the given word senses (Patwardhan et al., 2003).</Paragraph>
    <Paragraph position="5"> This algorithm for computing semantic relatedness is very attractive. It is conceptually simple and, unlike the information content based metrics, does not require the additional effort of corpus analysis.</Paragraph>
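The core of the Lesk idea, counting word overlaps between two definitions, can be sketched as follows. The function name, the whitespace tokenization and the example glosses are assumptions for illustration; the extended variants of Patwardhan et al. (2003) and the metrics implemented in the software are not reproduced here.

```python
from collections import Counter


def lesk_overlap(gloss1: str, gloss2: str) -> int:
    """Count the words shared by two definitions, respecting multiplicity."""
    tokens1 = Counter(gloss1.lower().split())
    tokens2 = Counter(gloss2.lower().split())
    # A word contributes as often as it occurs in both definitions.
    return sum(min(count, tokens2[word]) for word, count in tokens1.items())


# Hypothetical glosses, not taken from GermaNet.
overlap = lesk_overlap("Gefaess zum Trinken", "Gefaess aus Glas zum Trinken")
```

A practical version would probably also remove stop words and normalize word forms, which this sketch leaves out.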
    <Paragraph position="6"> However, a straightforward adaptation of the Lesk metric to GermaNet turned out to be impossible.</Paragraph>
    <Paragraph position="7"> Textual definitions of word senses in GermaNet are fairly short and small in number. In contrast to WordNet, GermaNet cannot be employed as a machine-readable dictionary, but is primarily a conceptual network. In order to deal with this, we developed a novel methodology which generates definitions of word senses automatically from GermaNet using the GermaNet API. Examples of such automatically generated definitions can be found in Gurevych &amp; Niederlich (2005). The method is implemented in the class PseudoGlossGenerator of our software, which automatically generates glosses on the basis of the conceptual hierarchy.</Paragraph>
    <Paragraph position="8"> Two metrics of semantic relatedness are, then, based on the application of the Lesk algorithm to  have to be included in the final definition. Experiments carried out to determine the most effective parameters for generating the definitions and employing those to compute semantic relatedness are described in Gurevych (2005). Gurevych &amp; Niederlich (2005) present a description of the evaluation procedure for the five implemented semantic relatedness metrics against a human Gold Standard and the evaluation results.</Paragraph>
  </Section>
  <Section position="5" start_page="6" end_page="7" type="metho">
    <SectionTitle>
4 Graphical User Interface
</SectionTitle>
    <Paragraph position="0"> We developed a graphical user interface to interactively experiment with the software for computing semantic relatedness. The system runs on a standard Linux or Windows machine. Upon initialization, we configured the system to load an information content map computed from the German taz corpus.5 The information content values encoded therein are employed by the information content based metrics.</Paragraph>
    <Paragraph position="1"> For the Lesk based metrics, the two best configurations for generating definitions of word senses are offered via the GUI: one including three hypernyms of a word sense, and the other including all related synsets (two iterations) except hyponyms. Each synset is represented in a generated definition by one (the first) of its word senses.</Paragraph>
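The first configuration (three hypernyms, each synset represented by its first word sense) might be sketched as below. This is not the PseudoGlossGenerator class and does not use the GermaNet API: the dictionary layout, the function name, the single-hypernym simplification and the choice to leave the word sense itself out of the definition are all assumptions made for illustration.

```python
def pseudo_gloss(synset_id: str, synsets: dict, max_hypernyms: int = 3) -> str:
    """Build a definition from the first word sense of up to max_hypernyms hypernyms."""
    words = []
    current = synsets[synset_id].get("hypernym")
    while current is not None and max_hypernyms > len(words):
        # Each synset is represented by the first of its word senses.
        words.append(synsets[current]["senses"][0])
        current = synsets[current].get("hypernym")
    return " ".join(words)


# Hypothetical toy network, not taken from GermaNet.
SYNSETS = {
    "s_glas": {"senses": ["Glas", "Trinkglas"], "hypernym": "s_gefaess"},
    "s_gefaess": {"senses": ["Gefaess", "Behaelter"], "hypernym": "s_artefakt"},
    "s_artefakt": {"senses": ["Artefakt"], "hypernym": "s_objekt"},
    "s_objekt": {"senses": ["Objekt"], "hypernym": "s_entitaet"},
    "s_entitaet": {"senses": ["Entitaet"], "hypernym": None},
}
gloss = pseudo_gloss("s_glas", SYNSETS)
```

The second configuration would instead collect all related synsets over two iterations while skipping hyponyms, which requires relation types that this toy structure does not model.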
    <Paragraph position="2"> The user of the GUI can enter two words together with their part-of-speech and specify one of the five metrics. Then, the system displays the corresponding word stems, possible word senses according to GermaNet, definitions generated for these word senses and their information content values.</Paragraph>
    <Paragraph position="3"> Furthermore, possible combinations of word senses for the two words are created and returned together with various diagnostic information specific to each of the metrics. This may be e.g. word overlaps in definitions for the Lesk based metrics, or lowest common subsumers and their respective information content values, depending on what is appropriate.</Paragraph>
    <Paragraph position="4"> Finally, the best word sense combination for the two words is determined and this is compactly displayed together with a semantic relatedness score. The interface allows the user to add notes to the results by directly editing the data shown in the GUI and save the detailed analysis in a text file for off-line inspection. The process of user-system interaction is summarized in Figure 1.</Paragraph>
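Determining the best word sense combination can be sketched as a search over all sense pairs of the two words. The text does not state the selection criterion; the sketch assumes that "best" means the pair with the highest relatedness score. The function name, the sense labels and the score table are hypothetical.

```python
from itertools import product


def best_sense_combination(senses1: list, senses2: list, score) -> tuple:
    """Return (score, sense1, sense2) for the highest-scoring sense pair."""
    scored = ((score(s1, s2), s1, s2) for s1, s2 in product(senses1, senses2))
    # max raises ValueError if either word has no senses.
    return max(scored, key=lambda item: item[0])


# Hypothetical sense labels and scores, not produced by any real metric.
TOY_SCORES = {("Glas#1", "Becher#1"): 0.2, ("Glas#2", "Becher#1"): 0.7}
best = best_sense_combination(
    ["Glas#1", "Glas#2"],
    ["Becher#1"],
    lambda s1, s2: TOY_SCORES[(s1, s2)],
)
```

Any of the five metrics could be passed in as the score argument, provided it accepts a pair of word senses and returns a number.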
  </Section>
</Paper>