<?xml version="1.0" standalone="yes"?> <Paper uid="I05-3033"> <Title>Towards a Hybrid Model for Chinese Word Segmentation</Title> <Section position="5" start_page="190" end_page="191" type="concl"> <SectionTitle> 4 Conclusions </SectionTitle> <Paragraph position="0"> We described a hybrid Chinese word segmenter that combines the transformation-based learning algorithm for character-based tagging with linguistic heuristics for transforming tagged character sequences into word-segmented sentences.</Paragraph> <Paragraph position="1"> As the segmenter is in its first stage of development and is far from mature, the bakeoff provided an especially valuable opportunity for evaluating its performance. The results suggest that: 1. Despite the lack of a separate mechanism for unknown word recognition, the segmenter performed relatively well on out-of-vocabulary (OOV) words. This confirms our hypothesis that character-based tagging has good potential for improving Chinese unknown word identification.</Paragraph> <Paragraph position="2"> 2. Using linguistic heuristics at the merging stage can help improve segmentation results.</Paragraph> <Paragraph position="3"> 3. There is much room for improvement in both the tagging algorithm and the merging algorithm. This work is under way.</Paragraph> </Section> </Paper>