<?xml version="1.0" standalone="yes"?> <Paper uid="W00-1219"> <Title>Extraction of Chinese Compound Words - An Experimental Study on a Very Large Corpus</Title> <Section position="6" start_page="138" end_page="138" type="concl"> <SectionTitle> 5 Conclusion </SectionTitle> <Paragraph position="0"> In this paper, we investigated a statistical approach to extracting Chinese compounds from very large corpora using mutual information and context dependency.</Paragraph> <Paragraph position="1"> We explained how performance is influenced by parameter settings, corpus size, and corpus heterogeneity. We also refined the lexicon of an information retrieval (IR) system by adding compounds obtained with our method, and achieved a 1.2% improvement in IR precision.</Paragraph> <Paragraph position="2"> From our experiments, we conclude that a statistical method based on mutual information and context dependency is efficient and robust for Chinese compound extraction. Mutual information mainly affects precision, while context dependency mainly affects the number of extracted items.</Paragraph> </Section> </Paper>