<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0802"> <Title>GermaNet - a Lexical-Semantic Net for German</Title> <Section position="3" start_page="0" end_page="9" type="intro"> <SectionTitle> 2 Resources and Modeling </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="9" type="sub_section"> <SectionTitle> Methods </SectionTitle> <Paragraph position="0"> For English, a variety of large-scale online linguistic resources are available. Applying these resources is essential for various NLP tasks: it reduces time, effort, and error rate, and it guarantees broader and more domain-independent coverage. The resources are typically used to create consistent and large lexical databases for parsing and machine translation, as well as to treat lexical, syntactic, and semantic ambiguity. Furthermore, linguistic resources are becoming increasingly important as training and evaluation material for statistical methods.</Paragraph> <Paragraph position="1"> For German, however, few large-scale monolingual resources that could aid the building of a semantic net are publicly available. This resource situation makes it necessary to rely to a large extent on manual labour when creating a wordnet, drawing on monolingual general and specialist dictionaries and literature, as well as on comparisons with the English WordNet. Nevertheless, we take a strongly corpus-based approach: the base vocabulary modeled in GermaNet is determined from lemmatized frequency lists drawn from text corpora. This list is further tuned using other available sources such as the CELEX German database. Clustering methods, which in principle can be applied to large corpora without any further information to yield similar words as output, proved interesting but not helpful for constructing the core net. 
Selectional restrictions of verbs for nouns will, however, be automatically extracted by clustering methods. We use the Princeton WordNet technology for the database format and database compilation, as well as the Princeton WordNet interface, applying extensions only where necessary. This results in maximal compatibility.</Paragraph> </Section> </Section> </Paper>