<?xml version="1.0" standalone="yes"?> <Paper uid="P92-1003"> <Title>A SIMPLE BUT USEFUL APPROACH TO CONJUNCT IDENTIFICATION 1</Title> <Section position="7" start_page="18" end_page="19" type="evalu"> <SectionTitle> RESULTS AND FUTURE WORK </SectionTitle> <Paragraph position="0"> The algorithm was tested on a 10,000-word chapter of the Merck Veterinary Manual. The results of the tests are shown in Table 1. We are satisfied with these results for the following reasons: (a) The system is being tested on a large body of uncontrolled text from a real domain. (b) The conjunct identification algorithm is domain independent. While the semantic labels produced by the probabilistic labelling system are domain dependent, and the rules for generalizing them to case labels for the noun phrases contain some domain dependencies (there is some evidence, for example, that a noun phrase consisting of a generic noun preceded by a semantically labelled modifier should not always receive the semantic label of the modifier), the conjunct specialist pays attention only to whether the case labels match, not to the actual values of the case labels.</Paragraph> <Paragraph position="1"> (c) The true error rate for the simple conjunct identification algorithm alone is lower than the 18.4% suggested by the table, and making some fairly obvious modifications will make it lower still. The entire system is composed of several components, and the errors committed by some portions of the system affect the error rate of the others. A significant proportion of the errors committed by the conjunct identifier are due to incorrect tagging, absence of semantic tags for gerunds, improper parsing, and other matters beyond its control. For example, the fact that gerunds were not marked with the semantic labels attached to nouns has resulted in a situation where any gerund occurring as a post-conjunct is preferentially conjoined with any preceding generic noun. 
More often than not, the gerund should have received a semantic tag and would properly be conjoined to a preceding non-generic noun phrase of the same semantic type. (The conjunction specialist is not the only portion of the system which would benefit from semantic tags on the gerunds; the system is currently under revision to include them.) From an overall perspective, the conjunct identification algorithm presented above seems to be a very promising one. It does depend heavily on help received from other components of the system, but that is almost inevitable in a large system. The identification of conjuncts is vital to every NLP system. However, the authors were unable to find references to any current system where success rates were reported for conjunct identification. We believe that the reason for this could be that most systems handle the problem by breaking it up into smaller parts. They start with a more sophisticated parser that takes care of some of the conjuncts, and then employ some semantic tools to overcome the ambiguities that may still exist due to co-ordinate conjunctions. Since these systems do not have a &quot;specialist&quot; working solely for the purpose of conjunct identification, they do not have any statistics on its success rate. Therefore, we are unable to compare our success rates with those of other systems.</Paragraph> <Paragraph position="2"> However, for the reasons given above, we feel that an 81.6% success rate is satisfactory.</Paragraph> <Paragraph position="3"> We have noted several other modifications that would improve the performance of the conjunct specialist. For example, we have noticed that the coordinate conjunction 'but' behaves sufficiently differently from 'and' and 'or' to warrant a separate set of rules. 
The current algorithm also ignores lexical parallelism (direct repetition of words already employed in the sentence), which the writers of our text frequently use to override plausible alternative readings. The current algorithm errs in most such contexts. As mentioned above, the algorithm also needs to allow prepositional phrases to be conjoined with adjectives and adverbs in some contexts. Some attempt was made to implement such mixed coordination as a last level of rules, level-4, but it met with little success.</Paragraph> </Section> </Paper>