<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1117">
  <Title>A Character-net Based Chinese Text Segmentation Method</Title>
  <Section position="8" start_page="0" end_page="0" type="concl">
    <SectionTitle>
6 Conclusion
</SectionTitle>
    <Paragraph position="0"> This paper has presented an algorithm for finding all possible candidate words when segmenting a Chinese text. The algorithm is based on a Chinese character net, which is built from the connection information between each pair of Chinese characters. The algorithm has the following characteristics: (1) The character net is a basic data structure that makes the use of all information in segmentation consistent and easy.</Paragraph>
    <Paragraph position="1"> (2) The text needs to be scanned only once.</Paragraph>
    <Paragraph position="2"> (3) The algorithm is easily combined with other existing algorithms.</Paragraph>
    <Paragraph position="3"> (4) The algorithm is effective.</Paragraph>
    <Paragraph position="4"> (5) The algorithm is easily extensible.</Paragraph>
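The candidate search summarized above can be illustrated with a minimal sketch. It assumes the character net can be approximated by a trie-like nested dictionary built from a word list; the toy lexicon, the names build_char_net and find_candidates, and the END marker are all hypothetical and are not taken from the paper.

```python
# Hypothetical sketch: the lexicon, names, and trie-like net layout are
# assumptions for illustration, not the data structure defined in the paper.
END = "$end"  # marker key: the path from the root to this node spells a word


def build_char_net(lexicon):
    """Link each character to the characters that may follow it in a word."""
    net = {}
    for word in lexicon:
        node = net
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return net


def find_candidates(text, net):
    """Return every (start, end, word) whose characters form a lexicon word.

    The start position moves left to right exactly once; from each start the
    net is followed only as far as its connections allow.
    """
    candidates = []
    for start in range(len(text)):
        node = net
        for end in range(start, len(text)):
            node = node.get(text[end])
            if node is None:
                break  # no connection to this character, stop extending
            if END in node:
                candidates.append((start, end + 1, text[start:end + 1]))
    return candidates


lexicon = ["研究", "研究生", "生命", "命", "起源"]
net = build_char_net(lexicon)
candidates = find_candidates("研究生命起源", net)
```

In this sketch, overlapping candidates such as 研究 and 研究生 are all kept, so the choice between them is left to a later disambiguation step.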
    <Paragraph position="5"> After all possible candidate words have been obtained, the forward maximum matching (FMM) result can be derived by applying the FMM strategy to the candidates, the backward maximum matching (BMM) result can be derived by applying the BMM strategy, and ambiguities and unknown words can be handled with a probabilistic grammar or an HMM-based method.</Paragraph>
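As a rough illustration of how FMM and BMM results might be read off a set of candidate words, the sketch below takes a hand-written list of (start, end, word) triples. The triples, the function names, and the single-character fallback are assumptions made for this example only, not details given in the paper.

```python
# Hypothetical sketch: the candidate triples and function names are
# illustrative assumptions, not taken from the paper.
def fmm_from_candidates(text, candidates):
    """Forward maximum matching: at each position take the longest candidate."""
    longest_end = {}
    for start, end, _ in candidates:
        longest_end[start] = max(end, longest_end.get(start, start + 1))
    result, pos = [], 0
    while pos != len(text):
        end = longest_end.get(pos, pos + 1)  # fall back to a single character
        result.append(text[pos:end])
        pos = end
    return result


def bmm_from_candidates(text, candidates):
    """Backward maximum matching: from the end, take the longest candidate."""
    earliest_start = {}
    for start, end, _ in candidates:
        earliest_start[end] = min(start, earliest_start.get(end, end - 1))
    result, pos = [], len(text)
    while pos != 0:
        start = earliest_start.get(pos, pos - 1)  # fall back to one character
        result.append(text[start:pos])
        pos = start
    return result[::-1]


text = "研究生命起源"
candidates = [(0, 2, "研究"), (0, 3, "研究生"), (2, 4, "生命"),
              (3, 4, "命"), (4, 6, "起源")]
fmm_result = fmm_from_candidates(text, candidates)
bmm_result = bmm_from_candidates(text, candidates)
```

On this toy input the two strategies disagree (研究生 / 命 / 起源 versus 研究 / 生命 / 起源), which is the kind of ambiguity a probabilistic grammar or HMM step would then have to resolve.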
    <Paragraph position="6"> Based on the candidates produced by the algorithm, different strategies for processing them can be adopted according to the needs of different applications, such as search engines [Zhou 2001], text classification, machine translation, information extraction, and information retrieval or filtering.</Paragraph>
  </Section>
</Paper>