<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-1084">
  <Title>Automatic Semantic Sequence Extraction from Unrestricted Non-Tagged Texts</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> Morphological processing, syntactic parsing and other useful tools have been proposed in the field of natural language processing (NLP). Many of those NLP tools take dictionary-based approaches. Thus these tools are often not very efficient with texts written in casual wordings or texts which contain many domain-specific terms, because of the lack of vocabulary.</Paragraph>
    <Paragraph position="1"> In this paper we propose a simple method to obtain domain-specific sequences from unrestricted texts using statistical information only. This method is language-independent.</Paragraph>
    <Paragraph position="2"> We conducted sequence extraction experiments on email texts in Japanese, and succeeded in extracting significant semantic sequences from the test corpus. We ran morphological parsing on the test corpus with ChaSen, a Japanese dictionary-based morphological parser, and examined our system's efficiency in extracting semantic sequences which were not recognized by ChaSen. Our system detected 69.06% of the unknown words correctly.</Paragraph>
  </Section>
</Paper>
