File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/02/w02-1815_intro.xml

Size: 1,869 bytes

Last Modified: 2025-10-06 14:01:37

<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1815">
  <Title>Combining Classifiers for Chinese Word Segmentation</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> It is generally agreed among researchers that word segmentation is a necessary first step in Chinese language processing. Most of the previous work in this area views a good dictionary as the cornerstone of this task. Several word segmentation algorithms have been developed using a dictionary as an essential tool.</Paragraph>
    <Paragraph position="1"> Most notably, variants of the maximum matching algorithm have been applied to word segmentation with considerable success. The results that have been reported are generally in the upper 90 percentile range. However, the success of such algorithms is premised on a large, exhaustive dictionary. The accuracy of word segmentation degrades sharply as new words appear. Since Chinese word formation is a highly productive process, new words are bound to appear in substantial numbers in realistic scenarios (Wu and Jiang 1998, Xue 2001), and it is virtually impossible to list all the words in a dictionary. In recent years, as annotated Chinese corpora have become available, various machine-learning approaches have been applied to Chinese word segmentation, with different levels of success. Compared with dictionary-based approaches, machine-learning approaches have the advantage of not needing a dictionary and thus are more suitable for use on naturally occurring Chinese text. In this paper we report results of a supervised machine-learning approach towards Chinese word segmentation that combines two fairly standard machine learning models. We show that this approach is very promising compared with dictionary-based approaches as well as other machine-learning approaches that have been reported in the literature.</Paragraph>
  </Section>
</Paper>