<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-3020">
  <Title>Report to BMM-based Chinese Word Segmentor with Context-based Unknown Word Identifier for the Second International Chinese Word Segmentation Bakeoff</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> This paper describes a Chinese word segmentor (CWS) based on the backward maximum matching (BMM) technique, built for the Microsoft Research (MSR) closed testing track of the 2nd Chinese Word Segmentation Bakeoff. Our CWS includes a context-based Chinese unknown word identifier (UWI). All the context-based knowledge for the UWI is generated fully automatically from the MSR training corpus. The scored results of the MSR closed testing track and our analysis show that our BMM-based CWS with the context-based UWI is a simple and effective system that achieves high Chinese word segmentation performance, with an F-measure above 95.5%.</Paragraph>
  </Section>
</Paper>