<?xml version="1.0" standalone="yes"?> <Paper uid="I05-3020"> <Title>Report to BMM-based Chinese Word Segmentor with Context-based Unknown Word Identifier for the Second International Chinese Word Segmentation Bakeoff</Title> <Section position="5" start_page="144" end_page="144" type="concl"> <SectionTitle> 4 Conclusions and Future Directions </SectionTitle> <Paragraph position="0"> In this paper, we applied a BMM-based CWS that includes a context-based UWI to Chinese word segmentation and obtained an F-measure of 95.5% in the MSR closed track. To sum up the results of this study, we draw the following conclusions and future directions: (1) The F-measure of Step 1 of our CWS is 94.3%, which indicates that the BMM with BMM-ASM knowledge is a simple but probably effective base technique for developing a high-performance CWS. (2) Since 82% of the segmentation errors of our CWS are caused by the LUW problem, this result supports the view that a high-performance CWS relies on a high-performance Chinese UWI.</Paragraph> <Paragraph position="1"> (3) For a CWS, there are two critical and probably independent tasks: the optimization of the LUW-EIW tradeoff, and the detection and disambiguation of OAS and CAS segmentation errors. We believe the former task is more critical than the latter.</Paragraph> <Paragraph position="2"> (4) We will continue to extend our CWS with other linguistic knowledge (such as part-of-speech information and morphology) and the BTM model (Tsai 2005) to improve our BMM-based CWS for participation in the third International Chinese Word Segmentation Bakeoff in both the closed and open testing tracks.</Paragraph> </Section> </Paper>