<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0605">
<Title>AUTOMATIC LEXICON ENHANCEMENT BY MEANS OF CORPUS TAGGING</Title>
<Section position="8" start_page="31" end_page="31" type="evalu">
<SectionTitle> 5.3 Results </SectionTitle>
<Paragraph position="0"> We manually verified the 1000 most frequent OOV words of each filtered lexicon. The results are presented as follows: Table 1 shows, in the column &quot;Correct&quot;, the percentage of OOV words for which all labels were correct; the column &quot;Wrong&quot; gives the percentage of words labelled with at least one incorrect tag.</Paragraph>
<Paragraph position="1"> Table 2 details, for the common-words lexicon, the results obtained on the correct words. The column &quot;All classes&quot; shows the percentage of correct words for which the lexicon contained all of their possible syntactic categories. The column &quot;Missing classes&quot; gives the percentage of correct words that could have received more syntactic categories than those stored in the lexicon.</Paragraph>
<Paragraph position="2"> Table 2
             | All classes | Missing classes
Common-words | 79%         | 21%
These results show that the criteria used to filter the OOV lexicons allow us to produce reliable lexicons (only 4% of the OOV common-words contained label errors). By keeping the 1000 most frequent words of each lexicon, we reduced the lack of coverage of our general lexicon by 20% over the whole test corpus.</Paragraph>
</Section>
</Paper>