<?xml version="1.0" standalone="yes"?> <Paper uid="W00-1202"> <Title>Sense-Tagging Chinese Corpus</Title> <Section position="6" start_page="137" end_page="137" type="concl"> <SectionTitle> 5. Conclusion </SectionTitle> <Paragraph position="0"> This paper analyzes the degree of polysemy in Mandarin Chinese. We examine the distribution of word senses with respect to POS and frequency. Under the Cilin small categories, 23.67% of the word types in the ASBC corpus are middle- or high-frequency words, but they account for 94.06% of the word tokens. We adopt contextual information and a mapping from WordNet synsets to Cilin sense tags to deal with this challenging problem. The performances for tagging low-, middle-, and high-ambiguity words are 63.98%, 60.92%, and 67.95% when small categories are used and 1-3 candidates are proposed. Comparatively, the performances are 71.02%, 73.88%, and 75.94% when middle categories are used. The performance of tagging unknown words is 34.35%. This is worse than the performance on ambiguous words, but much better than that of the baseline model. The overall performance of the sense tagger is 76.04%. Although sense tagging does not achieve the performance of POS tagging, the sense tagger proposed in this paper is still a useful computer-aided tool for reducing the human cost of tagging a large-scale corpus.</Paragraph> </Section> </Paper>