<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-1061">
  <Title>Satoshi Sekine ++</Title>
  <Section position="8" start_page="93" end_page="93" type="concl">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> This paper described two methods for detecting word segments and their POS categories in a Japanese spontaneous speech corpus, and described how to use them to tag a large spontaneous speech corpus accurately. The first method can detect word segments of any type. We found that it could semi-automatically detect about 80% of unknown words. The second method is used when word segments and their POS categories have several definitions, and when one type of word segment contains another. We found that using both methods achieved better accuracy than using the first method alone.</Paragraph>
    <Paragraph position="1"> Two types of word segments, short words and long words, are annotated in a large spontaneous speech corpus, the Corpus of Spontaneous Japanese (CSJ). We found that automatic morphological analysis achieved an F-measure of 95.79 for short words and 95.49 for long words.</Paragraph>
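For reference, the F-measure combines precision P and recall R into a single score. Assuming the balanced F-measure (β = 1) with scores reported on a 0 to 100 scale, which this section does not itself specify, the definition is:

```latex
F = \frac{2PR}{P + R}, \qquad
P = \frac{N_{\text{correct}}}{N_{\text{output}}}, \qquad
R = \frac{N_{\text{correct}}}{N_{\text{reference}}}
```

Here N_correct counts output morphemes judged correct against the reference annotation, N_output counts all output morphemes, and N_reference counts all morphemes in the reference annotation; these symbol names are introduced for illustration only.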
    <Paragraph position="2"> Although the out-of-vocabulary (OOV) rate for long words was much higher than that for short words, our proposed methods achieved almost the same accuracy for both types of words. We also found that, when 10% of the output morphemes were examined in ascending order of the probabilities estimated by the proposed models, a precision of more than 99% could be expected for short words and 97% for long words over the whole corpus.</Paragraph>
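The review procedure described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the `Morpheme` record, its fields, and the functions `select_for_review` and `precision_after_review` are hypothetical, and the sketch assumes that every error among the examined morphemes is corrected by hand while all other outputs are left unchanged, which this section does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Morpheme:
    """One analyzer output (hypothetical structure for illustration)."""
    surface: str
    pos: str
    prob: float    # probability the model assigned to this output
    correct: bool  # whether it matches the reference annotation


def select_for_review(morphemes: list[Morpheme], ratio: float = 0.1) -> list[int]:
    """Indices of the `ratio` fraction of outputs with the lowest probabilities."""
    k = round(len(morphemes) * ratio)
    # Ascending order of estimated probability: least confident outputs first.
    order = sorted(range(len(morphemes)), key=lambda i: morphemes[i].prob)
    return order[:k]


def precision_after_review(morphemes: list[Morpheme], ratio: float = 0.1) -> float:
    """Corpus-wide precision, assuming every error among the reviewed
    outputs is corrected by hand and no other output changes."""
    reviewed = set(select_for_review(morphemes, ratio))
    n_correct = sum(
        1 for i, m in enumerate(morphemes) if m.correct or i in reviewed
    )
    return n_correct / len(morphemes)


if __name__ == "__main__":
    # Toy data (invented for illustration, not from the paper):
    # the single error has the lowest probability.
    outputs = [Morpheme(f"w{i}", "noun", 0.9 + i / 1000, True) for i in range(9)]
    outputs.append(Morpheme("w9", "verb", 0.3, False))
    print(precision_after_review(outputs))  # 1.0
```

Ranking by estimated probability concentrates manual effort on the outputs the model is least sure of; if errors cluster among those outputs, correcting a small fraction of the corpus raises corpus-wide precision substantially, as in the toy data, where precision rises from 0.9 to 1.0.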
    <Paragraph position="3"> In our experiments, we used only the information contained in the corpus; however, more appropriate linguistic knowledge, such as morphemic and syntactic rules, could also be used. We would like to investigate whether such knowledge improves accuracy.</Paragraph>
  </Section>
</Paper>