<?xml version="1.0" standalone="yes"?>
<Paper uid="P92-1003">
  <Title>A SIMPLE BUT USEFUL APPROACH TO CONJUNCT IDENTIFICATION 1</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> Identification of the appropriate conjuncts of the coordinate conjunctions in a sentence is fundamental to the understanding of the sentence.</Paragraph>
    <Paragraph position="1"> We use the phrase 'conjunct identification' to refer to the process of identifying the components (words, phrases, clauses) in a sentence that are conjoined by the coordinate conjunctions in it.</Paragraph>
    <Paragraph position="2"> Consider the following sentence: "The president sent a memo to the managers to inform them of the tragic incident and to request their cooperation." In this sentence, the coordinate conjunction 'and' conjoins the infinitive phrases "to inform them of the tragic incident" and "to request their cooperation". If a natural language understanding system fails to recognize the correct conjuncts, it is likely to misinterpret the sentence or to lose its meaning entirely. The above is an example of a simple sentence where such conjunct identification is easy. In a realistic domain, one encounters sentences which are longer and far more complex.</Paragraph>
    <Paragraph position="3"> (Footnote 1: This work is supported in part by the National Science Foundation under grant number IRI-9002135.) This paper presents an approach to conjunct identification which, while not perfect, gives reasonably good results with a relatively simple algorithm. It is deterministic and domain independent in nature, and is being tested on a large domain: the Merck Veterinary Manual, which consists of over 700,000 words of uncontrolled technical text. Consider this sentence from the manual: "The mites live on the surface of the skin of the ear and canal, and feed by piercing the skin and sucking lymph, with resultant irritation, inflammation, exudation, and crust formation".</Paragraph>
    <Paragraph position="4"> This sentence has four coordinate conjunctions; identification of their conjuncts is moderately difficult. It is not uncommon to encounter sentences in the manual which are more than twice as long and even more complex.</Paragraph>
    <Paragraph position="5"> The following section briefly describes the larger project of which this research is a part. The algorithm and its drawbacks are then discussed. The last section gives the results obtained when an implementation was run on a 10,000-word excerpt from the manual and discusses some areas for future research.</Paragraph>
  </Section>
</Paper>