<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1723">
  <Title>A two-stage statistical word segmentation system for Chinese</Title>
  <Section position="6" start_page="21" end_page="21" type="concl">
    <SectionTitle>
5 Conclusions
</SectionTitle>
    <Paragraph position="0"> This paper presents a two-stage statistical word segmentation system for Chinese. In the first stage, a word bigram model and the Viterbi algorithm are applied to segment known words in plain input text. In the second stage, a hybrid approach combines word bigram probabilities, a word juncture model and word-based word-formation patterns to detect out-of-vocabulary (OOV) words. Experiments on the Peking University corpora show that the system, which relies on fairly simple word bigram and word-formation models, achieves an F-score of 93.7% or above. In future work, we hope to improve our strategies for estimating the word juncture model and word-formation patterns, and to develop an integrated segmentation technique that performs known word segmentation and unknown word identification in a single pass.</Paragraph>
  </Section>
</Paper>