<?xml version="1.0" standalone="yes"?> <Paper uid="C00-2121"> <Title>McMahon J.G., Smith F.J.: Improving Statistical Language Model Performance with Automatically Generated Word Hierarchies.</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1. State-of-the-art Language Modeling techniques </SectionTitle> <Paragraph position="0"> (McMahon and Smith, 1996) require lexical information about word classes.</Paragraph> <Paragraph position="1"> 2. Creating thesauri in a (semi-)automatic manner, in any domain and language and with minimal dependence on specialized tools and resources, is very important. Most thematic domains in most languages currently lack semantic resources. A knowledge-poor, corpus-based method not only requires much less labor to construct conceptual structures but also yields domain-dependent semantic relations. New resources can be readily created in new domains, and existing thesauri can be enlarged or refined by retraining on larger corpora as soon as these become available.</Paragraph> <Paragraph position="2"> 3. Many currently implemented NLP systems, both spoken and written, operate in a specific domain and usually use a constrained vocabulary directly related to their task domain. Therefore, domain-dependent semantic knowledge can be acquired directly from relevant corpora.</Paragraph> <Paragraph position="3"> 4. Autonomous computational intelligence should rely mainly on processing free-flowing electronic text to acquire new semantic and world knowledge.</Paragraph> <Paragraph position="4"> The present approach aims at the corpus-based automatic extraction of domain-dependent semantic similarity relations between lexical items and at the formation of corresponding semantic clusters. For this purpose, the use of readily available domain-specific text corpora is imperative. 
Our approach was guided by adaptation to the special characteristics of this type of corpus (specialization, restricted size), without requiring any other domain-dependent resources and while remaining portable across languages.</Paragraph> </Section></Paper>