<?xml version="1.0" standalone="yes"?> <Paper uid="W06-0115"> <Title>The Third International Chinese Language Processing Bakeoff: Word Segmentation and Named Entity Recognition</Title> <Section position="7" start_page="115" end_page="115" type="concl"> <SectionTitle> 5 Conclusions &amp; Future Directions </SectionTitle> <Paragraph position="0"> The Third SIGHAN Chinese Language Processing Bakeoff successfully brought together a collection of 29 strong research groups to assess the progress of research in two important tasks, word segmentation and named entity recognition, that in turn enable other important language processing technologies. The individual group presentations at the SIGHAN workshop detail the approaches that yielded strong performance for both tasks. Issues of out-of-vocabulary word handling, annotation consistency, character encoding, and code mixing of Chinese and non-Chinese text all continue to challenge system designers and bakeoff organizers alike.</Paragraph> <Paragraph position="1"> In future analyses, we hope to develop additional analysis tools to better assess progress in these fundamental tasks in a more corpus-independent fashion. Microsoft Research Asia has been pursuing work along these lines, focusing on improvements in F-score and OOV F-score relative to more intrinsic corpus measures, such as baselines and toplines.5 Such developments will guide the planning of future evaluations.</Paragraph> <Paragraph position="2"> Finally, while word segmentation and named entity recognition are important in themselves, it is also important to assess the impact of improvements in these enabling technologies on broader downstream applications. More tightly coupled experiments that involve joint word segmentation and named entity recognition could provide insight. 
Integration of WS and NER with a higher level task such as parsing, reference resolution, or machine translation could allow the development of more refined, task-oriented metrics to evaluate WS and NER and focus attention on improvements to the fundamental techniques which enhance performance on higher level tasks.</Paragraph> <Paragraph position="3"> GPE tags in the truth data mapped to LOC, since no GPE tags were present in the results.</Paragraph> <Paragraph position="4"> 5 Personal communication, Mu Li, Microsoft Research Asia.</Paragraph> </Section></Paper>