<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-3027">
<Title>A Conditional Random Field Word Segmenter for Sighan Bakeoff 2005</Title>
<Section position="3" start_page="0" end_page="168" type="intro">
<SectionTitle> 2 Algorithm </SectionTitle>
<Paragraph position="0"> Our system builds on research into conditional random fields (CRFs), a statistical sequence modeling framework first introduced by Lafferty et al. (2001). Peng et al. (2004) first applied this framework to Chinese word segmentation by treating it as a binary decision task in which each character is labeled as either the beginning of a word or the continuation of one.</Paragraph>
<Paragraph position="1"> Gaussian priors were used to prevent overfitting, and a quasi-Newton method was used for parameter optimization.</Paragraph>
<Paragraph position="2"> The probability that a CRF assigns to a label sequence for a particular sequence of characters is given by the equation below:</Paragraph>
<Paragraph position="3"> $$P(Y \mid X) = \frac{1}{Z(X)} \exp\left( \sum_{c} \sum_{k} \lambda_k f_k(Y_{c-1}, Y_c, X, c) \right)$$</Paragraph>
<Paragraph position="4"> Y is the label sequence for the sentence, X is the sequence of unsegmented characters, Z(X) is a normalization term, f_k is a feature function with weight \lambda_k, and c indexes into characters in the sequence being labeled.</Paragraph>
<Paragraph position="5"> A CRF allows us to use a large number of n-gram features and various state-sequence-based features, and it also provides an intuitive framework for incorporating morphological features.</Paragraph>
</Section>
</Paper>