File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/05/i05-3020_abstr.xml
Size: 1,031 bytes
Last Modified: 2025-10-06 13:44:20
<?xml version="1.0" standalone="yes"?> <Paper uid="I05-3020"> <Title>Report to BMM-based Chinese Word Segmentor with Context-based Unknown Word Identifier for the Second International Chinese Word Segmentation Bakeoff</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> This paper describes a Chinese word segmentor (CWS) based on backward maximum matching (BMM) technique for the 2nd Chinese Word Segmentation Bakeoff in the Microsoft Research (MSR) closed testing track. Our CWS comprises of a context-based Chinese unknown word identifier (UWI). All the context-based knowledge for the UWI is fully automatically generated by the MSR training corpus. According to the scored results of the MSR closed testing track and our analysis, it shows that our BMM-based CWS with the context-based UWI is a simple and effective system to achieve high Chinese word segmentation performance of more than 95.5% F-measure.</Paragraph> </Section> class="xml-element"></Paper>