<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1085">
  <Title>Contextual Dependencies in Unsupervised Word Segmentation</Title>
  <Section position="7" start_page="678" end_page="679" type="evalu">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> In this paper, we have introduced a new model-based approach to word segmentation that draws on techniques from Bayesian statistics, and we have developed models incorporating unigram and bigram dependencies. The use of the Dirichlet process as the basis of our approach yields sparse solutions and allows us the flexibility to modify individual components of the models. We have presented a method of inference using Gibbs sampling, which is guaranteed to converge to the posterior distribution over possible segmentations of a corpus.</Paragraph>
    <Paragraph position="1"> Our approach to word segmentation allows us to investigate questions that could not be addressed satisfactorily in earlier work. We have shown that the search algorithms used with previous models of word segmentation do not achieve their ob-</Paragraph>
    <Paragraph position="3"> jectives, which has led to misleading results. In particular, previous work suggested that the use of word-to-word dependencies has little effect on word segmentation. Our experiments indicate instead that bigram dependencies can be crucial for avoiding under-segmentation of frequent collocations. Incorporating these dependencies into our model greatly improved segmentation accuracy, and led to better performance than previous approaches on all measures.</Paragraph>
  </Section>
</Paper>