<?xml version="1.0" standalone="yes"?>
<Paper uid="A00-2032">
  <Title>Mostly-Unsupervised Statistical Segmentation of Japanese: Applications to Kanji</Title>
  <Section position="7" start_page="246" end_page="246" type="concl">
    <SectionTitle>
6 Conclusion
</SectionTitle>
    <Paragraph position="0"> In this paper, we have presented a simple, mostly-unsupervised algorithm that segments Japanese character sequences into words using statistics drawn from a large unsegmented corpus. We evaluated its performance on kanji sequences with respect to several metrics, including the novel compatible brackets and all-compatible brackets rates, and found that our algorithm can achieve performance rivaling that of lexicon-based morphological analyzers.</Paragraph>
    <Paragraph position="1"> In future work, we plan to experiment on Japanese sentences containing a mixture of character types, possibly in combination with morphological analyzers so as to balance the strengths and weaknesses of the two types of methods. Since our method does not rely on any Japanese-specific heuristics, we also hope to test it on Chinese and other languages.</Paragraph>
  </Section>
</Paper>