<?xml version="1.0" standalone="yes"?> <Paper uid="C00-2116"> <Title>Automatic Corpus-Based Thai Word Extraction with the C4.5 Learning Algorithm</Title> <Section position="6" start_page="805" end_page="805" type="concl"> <SectionTitle> 5 Conclusion </SectionTitle> <Paragraph position="0"> In this paper, we have applied the C4.5 learning algorithm to the task of Thai word extraction.</Paragraph> <Paragraph position="1"> C4.5 can construct a good decision tree for word/non-word disambiguation. The learned attributes (mutual information, entropy, word frequency, word length, functional words, and the first two and last two characters) capture useful information for word extraction. Our approach achieves about 85% precision and 56% recall, which is comparable to employing an existing dictionary.</Paragraph> <Paragraph position="2"> We expect the accuracy to be higher with larger corpora.</Paragraph> <Paragraph position="3"> In future work, we plan to apply this algorithm to larger corpora to build a corpus-based Thai dictionary. We also hope that our approach will prove successful for other languages without explicit word boundaries.</Paragraph> </Section> </Paper>