<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1175"> <Title>Combining Prediction by Partial Matching and Logistic Regression for Thai Word Segmentation</Title> <Section position="7" start_page="43" end_page="43" type="concl"> <SectionTitle> 6 Conclusion </SectionTitle> <Paragraph position="0"> This paper proposes a two-step approach to Thai word segmentation. Studying the characteristics of the Thai language, we find that word segmentation is ambiguous at both the character and syllable levels. The first step reduces character-level ambiguity by extracting syllables, whose structures are better defined. The second step then combines syllables into words using a binary logistic regression model. Experimental evaluations emphasize the importance of correctly pre-identifying syllables, show that PPM achieves 98% accuracy on syllable segmentation, and indicate the effectiveness of the proposed approach for combining syllables into words. The overall accuracy of Thai word segmentation is 97.17%.</Paragraph> </Section> </Paper>