<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-3027">
  <Title>A Conditional Random Field Word Segmenter for Sighan Bakeoff 2005</Title>
  <Section position="3" start_page="0" end_page="168" type="intro">
    <SectionTitle>
2 Algorithm
</SectionTitle>
    <Paragraph position="0"> Our system builds on research into conditional random fields (CRFs), a statistical sequence modeling framework first introduced by Lafferty et al. (2001). Peng et al. (2004) first applied this framework to Chinese word segmentation by treating it as a binary decision task in which each character is labeled either as the beginning of a word or as the continuation of one.</Paragraph>
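The binary labeling described above can be sketched as follows. This is a minimal illustration, not the authors' code: the label names B (begins a word) and C (continues a word) and the helper names `words_to_labels` and `labels_to_words` are hypothetical. It converts a segmented sentence into one label per character and back again.

```python
# Hypothetical label names: "B" = character begins a word, "C" = character continues a word.
BEGIN, CONTINUE = "B", "C"


def words_to_labels(words):
    """Turn a segmented sentence (list of words) into (characters, labels)."""
    chars, labels = [], []
    for word in words:
        for i, ch in enumerate(word):
            chars.append(ch)
            # The first character of each word begins it; every later one continues it.
            labels.append(BEGIN if i == 0 else CONTINUE)
    return "".join(chars), labels


def labels_to_words(chars, labels):
    """Invert words_to_labels: start a new word at every BEGIN label."""
    words = []
    for ch, label in zip(chars, labels):
        # A CONTINUE label on the very first character is treated as a word start,
        # so every label sequence decodes to some valid segmentation.
        if label == BEGIN or not words:
            words.append(ch)
        else:
            words[-1] += ch
    return words


if __name__ == "__main__":
    chars, labels = words_to_labels(["我们", "在", "北京", "学习"])
    print(chars, labels)  # 我们在北京学习 ['B', 'C', 'B', 'B', 'C', 'B', 'C']
    print(labels_to_words(chars, labels))  # ['我们', '在', '北京', '学习']
```

Training data for such a segmenter would come from the first function; at test time the CRF predicts one label per character and the second function rebuilds the words.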
    <Paragraph position="1"> Gaussian priors were used to prevent overfitting, and a quasi-Newton method was used for parameter optimization.</Paragraph>
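As a worked illustration of what a Gaussian prior means for CRF training, the block below writes out the standard penalized log-likelihood over N training sentences (X^(i), Y^(i)) and its gradient, where λ_k is the weight on feature function f_k and P_λ(Y | X) is the CRF probability. It assumes a zero-mean prior with one variance σ² shared by all weights; the section does not state the prior's mean or variance. A quasi-Newton method such as L-BFGS (the exact variant is also an assumption) would maximize this objective using the gradient.

```latex
\mathcal{L}(\lambda) = \sum_{i=1}^{N} \log P_\lambda\bigl(Y^{(i)} \mid X^{(i)}\bigr) - \sum_{k} \frac{\lambda_k^{2}}{2\sigma^{2}}

\frac{\partial \mathcal{L}}{\partial \lambda_k} = \sum_{i=1}^{N} \Bigl( \sum_{c} f_k\bigl(Y^{(i)}, X^{(i)}, c\bigr) - \mathbb{E}_{Y \sim P_\lambda(\cdot \mid X^{(i)})}\Bigl[ \sum_{c} f_k\bigl(Y, X^{(i)}, c\bigr) \Bigr] \Bigr) - \frac{\lambda_k}{\sigma^{2}}
```

The penalty term pulls every weight toward zero, which is what limits overfitting; a smaller σ² means a stronger pull. The gradient is the observed feature count minus the expected feature count under the model, minus the prior's contribution λ_k/σ².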
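    <Paragraph position="2"> The probability that a CRF assigns to a label sequence, given a particular sequence of characters, is given by the equation below:</Paragraph>
    <Paragraph position="3"> $$P_\lambda(Y \mid X) = \frac{1}{Z(X)} \exp\left( \sum_{c} \sum_{k} \lambda_k f_k(Y, X, c) \right)$$</Paragraph>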
    <Paragraph position="4"> Here, Y is the label sequence for the sentence, X is the sequence of unsegmented characters, Z(X) is a normalization term that sums the exponentiated scores of all possible label sequences for X, f_k is the k-th feature function, λ_k is its weight, and c indexes the characters in the sequence being labeled.</Paragraph>
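To make these definitions concrete, here is a minimal Python sketch of a linear-chain CRF over the two labels B and C. It assumes each feature f_k inspects only the current and previous labels (the standard linear-chain restriction); the three feature types, the START symbol, and the hand-set weights are all hypothetical. It computes Z(X) twice, once by brute-force enumeration and once with the forward algorithm, to show that the two agree.

```python
import itertools
import math

LABELS = ("B", "C")  # begin-of-word / continuation-of-word
START = "START"  # hypothetical label standing before the first character


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def local_score(weights, chars, c, prev_label, label):
    """sum_k lambda_k * f_k at position c; each f_k is an indicator feature.

    Feature weights are looked up in `weights`; unseen features weigh 0.
    """
    features = [
        ("char", chars[c], label),  # current character with its label
        ("trans", prev_label, label),  # state-sequence feature: label transition
    ]
    if c > 0:
        features.append(("bigram", chars[c - 1] + chars[c], label))  # character bigram
    return sum(weights.get(f, 0.0) for f in features)


def sequence_score(weights, chars, labels):
    """Unnormalized log-score: sum over c and k of lambda_k * f_k(Y, X, c)."""
    total, prev = 0.0, START
    for c, label in enumerate(labels):
        total += local_score(weights, chars, c, prev, label)
        prev = label
    return total


def log_z_brute_force(weights, chars):
    """log Z(X) by enumerating every label sequence (exponential; tiny inputs only)."""
    scores = [
        sequence_score(weights, chars, ys)
        for ys in itertools.product(LABELS, repeat=len(chars))
    ]
    return log_sum_exp(scores)


def log_z_forward(weights, chars):
    """log Z(X) with the forward algorithm, linear in len(chars) (chars must be non-empty)."""
    alpha = {y: local_score(weights, chars, 0, START, y) for y in LABELS}
    for c in range(1, len(chars)):
        alpha = {
            y: log_sum_exp([alpha[p] + local_score(weights, chars, c, p, y) for p in LABELS])
            for y in LABELS
        }
    return log_sum_exp(list(alpha.values()))


def probability(weights, chars, labels):
    """P(Y | X) = exp(score(Y, X)) / Z(X)."""
    return math.exp(sequence_score(weights, chars, labels) - log_z_forward(weights, chars))


if __name__ == "__main__":
    # Hand-picked illustrative weights, not learned values.
    weights = {
        ("trans", START, "B"): 2.0,
        ("trans", "B", "C"): 0.5,
        ("bigram", "北京", "C"): 1.5,
    }
    chars = "在北京"
    print(log_z_brute_force(weights, chars), log_z_forward(weights, chars))
    print(probability(weights, chars, ["B", "B", "C"]))
```

Enumeration grows as 2 to the power of the sentence length, so it only serves as a check; the forward algorithm yields the same Z(X) in time linear in the number of characters, which is what makes training on real sentences feasible.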
    <Paragraph position="5"> A CRF lets us use a large number of n-gram features and various state-sequence-based features, and it also provides an intuitive framework for incorporating morphological features.</Paragraph>
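The kinds of features mentioned here can be illustrated with a small extractor. The templates below (character unigrams in a two-character window, adjacent bigrams, and the bigram that skips the current character) are a common choice for Chinese segmentation but are assumed for illustration, not taken from this section; the padding symbol and the function names are likewise hypothetical, and morphological features are not shown.

```python
PAD = "#"  # hypothetical padding symbol for positions outside the sentence


def ngram_features(chars, c, window=2):
    """Hypothetical character n-gram templates for the character at index c."""

    def char_at(offset):
        i = c + offset
        # Bounds check via range membership: positions off either end get PAD.
        return chars[i] if i in range(len(chars)) else PAD

    feats = [f"C{o}={char_at(o)}" for o in range(-window, window + 1)]  # unigrams
    feats += [
        f"C{o}C{o + 1}={char_at(o)}{char_at(o + 1)}"  # adjacent bigrams
        for o in range(-window, window)
    ]
    feats.append(f"C-1C1={char_at(-1)}{char_at(1)}")  # bigram skipping the current character
    return feats


def features_for(chars, c, prev_label, label):
    """Conjoin each n-gram feature with the current label and add a transition feature."""
    observation = [(f, label) for f in ngram_features(chars, c)]
    transition = [("trans", prev_label, label)]  # state-sequence feature
    return observation + transition


if __name__ == "__main__":
    for feature in features_for("在北京", 1, "B", "B"):
        print(feature)
```

Each observation feature is conjoined with the current label, so the CRF can learn separate weights for, say, the bigram "北京" occurring with label B versus label C; the transition tuple plays the role of a state-sequence feature.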
  </Section>
</Paper>