File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/05/i05-3027_abstr.xml
Size: 1,053 bytes
Last Modified: 2025-10-06 13:44:19
<?xml version="1.0" standalone="yes"?> <Paper uid="I05-3027"> <Title>A Conditional Random Field Word Segmenter for Sighan Bakeoff 2005</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> We present a Chinese word segmentation system submitted to the closed track of the Sighan Bakeoff 2005.</Paragraph> <Paragraph position="1"> Our segmenter was built using a conditional random field sequence model, which provides a framework for using a large number of linguistic features, such as character identity, morphological features, and character reduplication features.</Paragraph> <Paragraph position="2"> Because our morphological features were extracted automatically from the training corpora, our system is not biased toward any particular variety of Mandarin; in particular, it does not overfit the variety most familiar to the system's designers. Our final system achieved F-scores of 0.947 (AS), 0.943 (HK), 0.950 (PK) and 0.964 (MSR).</Paragraph> </Section> </Paper>