<?xml version="1.0" standalone="yes"?>
<Paper uid="M93-1010">
  <Title>Fl : &quot;BRIDGESTONE SPORTS CO . SAID FRIDAY IT HAS SET UP A JOINT VENTURE &quot; (S (NP (N (NAME &quot;BRIDGESTONE SPORTS CO .&quot;))) (VP (AUX )</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
PROCESSING STAGES
</SectionTitle>
    <Paragraph position="0"> The PLUM architecture is presented in Figure 1. Ovals represent declarative knowledge bases; rectangles represent processing modules. A more detailed description of the system components, their individual outputs, and their knowledge bases is presented in Ayuso et al. [1]. The processing modules are briefly described below.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Message Reader
</SectionTitle>
      <Paragraph position="0"> This module is analogous to the &quot;text zoner&quot; in Hobbs' description of generic data extraction systems. PLUM's specification of the input format is a declarative component of the message reader, allowing the system to be easily adapted to handle different formats. The input to the PLUM system is a file containing one or more messages. The message reader module determines message boundaries, identifies the message header information, and determines paragraph and sentence boundaries. To date, we have designed format specifications for about half a dozen domains.</Paragraph>
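      <Paragraph position="1"> The idea of a declarative input-format specification can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `FormatSpec` fields, the regular expressions, the `DOCNO:` header convention, and the function name `read_messages` are hypothetical and do not reproduce PLUM's actual specification language. The point is only that the reading code stays fixed while the specification changes from one message format to another.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatSpec:
    """Hypothetical declarative description of one input format."""
    message_start: str    # regex matching the first line of each message
    header_field: str     # regex with two groups: field name, field value
    paragraph_break: str  # regex separating paragraphs in the message body
    sentence: str         # regex matching a single sentence


@dataclass
class Message:
    header: dict
    paragraphs: list  # each paragraph is a list of sentence strings


def read_messages(text, spec):
    """Find message, header, paragraph, and sentence boundaries."""
    starts = [m.start() for m in re.finditer(spec.message_start, text, re.MULTILINE)]
    bounds = starts + [len(text)]
    messages = []
    for begin, end in zip(bounds, bounds[1:]):
        header, body_lines, in_header = {}, [], True
        for line in text[begin:end].splitlines():
            # Header lines are the leading run of lines matching header_field.
            match = re.match(spec.header_field, line) if in_header else None
            if match:
                header[match.group(1)] = match.group(2).strip()
            else:
                in_header = False
                body_lines.append(line)
        body = "\n".join(body_lines).strip()
        paragraphs = []
        for para in re.split(spec.paragraph_break, body):
            flat = " ".join(para.split())  # normalize whitespace and line wraps
            sentences = [s.strip() for s in re.findall(spec.sentence, flat)]
            if sentences:
                paragraphs.append(sentences)
        messages.append(Message(header, paragraphs))
    return messages


# One invented format; another domain would supply a different FormatSpec.
SPEC = FormatSpec(
    message_start=r"^DOCNO:",
    header_field=r"^([A-Z]+):\s*(.*)$",
    paragraph_break=r"\n\s*\n",
    sentence=r"[^.!?]+(?:[.!?]+|$)",
)

SAMPLE = """DOCNO: 0001
DATE: 1993-01-01

First sentence here. Second sentence here.

A new paragraph starts.
DOCNO: 0002

Only one sentence.
"""

messages = read_messages(SAMPLE, SPEC)
```

 The naive sentence pattern splits at every period, so abbreviations such as "CO." would be cut incorrectly; a realistic specification would need a more careful sentence rule.</Paragraph>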
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Morphological Analyzer
</SectionTitle>
      <Paragraph position="0"> The first phase of processing is assignment of part-of-speech information, e.g., proper noun, verb, or adjective.</Paragraph>
      <Paragraph position="1"> In BBN's part-of-speech tagger POST [5], a bi-gram probability model, frequency models for known words (derived from large corpora), and probabilities based on word endings for unknown words are employed to assign part of speech to the highly ambiguous words and unknown words of the corpus. POST tags each word with one of 47 possible tags with 97% accuracy for known words. For the Japanese domains, JUMAN is used to propose word segmentation and part-of-speech assignments, which are then corrected by AMED [3] before being handed to POST for final disambiguation. Below are the part-of-speech tags produced by POST for the first sentence of the EJV walkthrough article 0592:</Paragraph>
    </Section>
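    <Paragraph position="1"> As a separate illustration of the bi-gram tagging model described under Morphological Analyzer, the following is a minimal sketch of Viterbi decoding over a tag bi-gram model with word frequencies for known words and word-ending probabilities for unknown words. It is not POST's implementation and not POST output: the three-tag inventory, every probability, the suffix table, and the function names `emission` and `viterbi` are invented for illustration.

```python
import math

START = "START"

# P(tag | previous tag): the bi-gram tag model (toy numbers).
TRANSITION = {
    "START": {"DET": 0.6, "N": 0.3, "V": 0.1},
    "DET": {"DET": 0.05, "N": 0.9, "V": 0.05},
    "N": {"DET": 0.1, "N": 0.3, "V": 0.6},
    "V": {"DET": 0.5, "N": 0.4, "V": 0.1},
}

# P(word | tag) for known words (toy numbers standing in for corpus counts).
LEXICON = {
    "the": {"DET": 0.5},
    "company": {"N": 0.01},
    "plans": {"N": 0.002, "V": 0.004},
    "ventures": {"N": 0.003, "V": 0.001},
}

# P(ending | tag) for unknown words, keyed by word ending (toy numbers).
SUFFIX = {
    "s": {"DET": 0.001, "N": 0.3, "V": 0.2},
    "ed": {"DET": 0.001, "N": 0.01, "V": 0.3},
}
DEFAULT_UNKNOWN = {"DET": 0.001, "N": 0.01, "V": 0.005}


def emission(word):
    """Return the tag-to-probability table used to score this word."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word]
    # Unknown word: fall back to the longest matching word ending.
    for ending in sorted(SUFFIX, key=len, reverse=True):
        if word.endswith(ending):
            return SUFFIX[ending]
    return DEFAULT_UNKNOWN


def viterbi(words):
    """Return the most probable tag sequence under the toy model."""
    # best[tag] = (log probability of the best path ending in tag, the path)
    best = {START: (0.0, [])}
    for word in words:
        step = {}
        for tag, p_emit in emission(word).items():
            candidates = [
                (score + math.log(TRANSITION[prev][tag]) + math.log(p_emit), path + [tag])
                for prev, (score, path) in best.items()
            ]
            step[tag] = max(candidates, key=lambda c: c[0])
        best = step
    return max(best.values(), key=lambda c: c[0])[1]
```

 Under these toy numbers, `viterbi(["the", "company", "plans", "ventures"])` yields DET, N, V, N: the ambiguous word "plans" is resolved as a verb because the noun-to-verb transition outweighs the alternative. An unknown word such as "zorbs" is scored through the "s" ending table.</Paragraph>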
  </Section>
</Paper>