File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/97/w97-0127_abstr.xml
Size: 1,339 bytes
Last Modified: 2025-10-06 13:49:03
<?xml version="1.0" standalone="yes"?> <Paper uid="W97-0127"> <Title>I : I I I I I</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> Corpus-based statistical-oriented Chinese word classification can be regarded as a fundamental step for automatic or non-automatic, mono-lingual natural processing system. Word classification can solve the problems of data sparseness and have far fewer parameters. So far, much r:,la~v~ work about word classification has been done. All the work is based on some similarity metrics. We use average mutual information as global similarity metric to do classification. The clustering process is top-down splitting and the binary tree is growin~ with splitting. In natural lan~lage, the effect of left neighbors and right neighbors of a word are asymmetric. To utilize this directional information, we induce the left-right binary and right-left binary tree to represent this property. The probability is also introduced in our algorithm to merge the resulting classes from left-right and right-left binary tree. Also, we use the resulting classes to do experiments on word class-based language model. Some classes results and perplexity of word class-based language model are presented.</Paragraph> </Section> class="xml-element"></Paper>