<?xml version="1.0" standalone="yes"?> <Paper uid="H93-1045"> <Title>EXAMPLE-BASED CORRECTION OF WORD SEGMENTATION AND PART OF SPEECH LABELLING</Title> <Section position="9" start_page="230" end_page="230" type="concl"> <SectionTitle> 6. CONCLUSION </SectionTitle> <Paragraph position="0"> The most interesting aspect of this work is the implementation and testing of a simple algorithm to learn correction rules from examples. Except for the annotation of text as to the correct data, the process is fully automatic. Even with as little data as we had initially (under 15,000 words), the learned correction rules improved the performance of morphological processing compared to the baseline system. Furthermore, though the original error rate of JUMAN was more than double the rate typically reported for stochastic part-of-speech labellers in English, the result of the correction algorithm plus our hidden Markov model (POST) reduced the error rate to a level comparable with that experienced in English. On the other hand, increasing the training data by a factor of five did not reduce the error rate substantially.</Paragraph> <Paragraph position="1"> The architecture proposed is the morphological component of the Japanese version of the PLUM data extraction system, and has been tested on more than 300,000 words of text in both a financial domain and a technical domain.</Paragraph> <Paragraph position="2"> Hidden Markov Models, as implemented in POST, were applied to Japanese with relative ease. When additional data becomes available, we would like to test the performance of POST for both word segmentation and labelling part of speech in Japanese.</Paragraph> </Section> </Paper>