<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0136">
  <Title>Sydney, July 2006. (c) 2006 Association for Computational Linguistics. N-gram Based Two-Step Algorithm for Word Segmentation</Title>
  <Section position="7" start_page="199" end_page="199" type="concl">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> We described a two-step word segmentation algorithm developed for our participation in the closed track of the 2006 bakeoff. The algorithm is based on cross-validating the word spacing probability using n-gram features of &lt;character, space-tag&gt; pairs.</Paragraph>
    <Paragraph position="1"> One advantage of our system is that it can report a self-confidence score for ambiguous or feature-conflict cases. We have not used any language-dependent resources or functionality, such as lexicons, numeric expressions, or proper name recognition. We expect our approach to help detect error-prone tags and build error correction dictionaries when we develop a practical system. Furthermore, the proposed algorithm has been applied to Korean, where it achieved a good improvement on proper names, although overall performance was similar to that of the previous method.</Paragraph>
  </Section>
</Paper>