<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1721"> <Title>Chinese Word Segmentation Using Minimal Linguistic Knowledge</Title> <Section position="7" start_page="0" end_page="0" type="concl"> <SectionTitle> 6 Conclusion </SectionTitle> <Paragraph position="0"> We have presented our word segmentation system and its results on the closed track using the a0a2a1 corpus and the a3a5a4 corpus. New word recognition, combining single characters, and consistency checking contributed most to the gains in precision and recall over the base segmentation algorithm, which itself performs better than maximum matching. For the closed track experiment using the a3a5a4 corpus, we found that 62% of the text fragments incorrectly segmented by our system are actually new words, which clearly shows that a better new word recognition algorithm is necessary to further improve the performance of our system. Our failure analysis also indicates that up to 21.7% of the mistakes made by our system on the PK closed track may have been caused by segmentation inconsistencies between the training and testing data.</Paragraph> </Section> </Paper>