<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0605"> <Title>AUTOMATIC LEXICON ENHANCEMENT BY MEANS OF CORPUS TAGGING</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> The aim of this study is to use a specialised text corpus to automatically enhance a general lexicon. Lexicons that offer maximal coverage of a specific topic are an important benefit in many applications of Automatic Speech and Natural Language Processing. The enhancement of these lexicons can be automated because large corpora of specialised texts are available.</Paragraph> <Paragraph position="1"> A syntactic tagging process, based on 3-class and 3-gram language models, allows us to automatically assign possible syntactic categories to the Out-Of-Vocabulary (OOV) words found in the processed corpus. These OOV words generally occur several times in the corpus, and the number of these occurrences can be large. By taking into account all the occurrences of an OOV word in a given text as a whole, we propose here a method for automatically extracting a specialised lexicon from a text corpus that is representative of a specific topic.</Paragraph> </Section> </Paper>