<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-2012">
  <Title>High-precision Identification of Discourse New and Unique Noun Phrases</Title>
  <Section position="6" start_page="0" end_page="0" type="concl">
    <SectionTitle>
5 Conclusion and Future Work
</SectionTitle>
    <Paragraph position="0"> We have implemented a system for automatic identification of discourse new and unique entities. To learn the classification we use a small training corpus (MUC-7); however, a much bigger corpus (the WWW, as indexed by AltaVista) is used to obtain values for some features. Combining heuristics and Internet counts, we achieve 88.9% precision and 84.6% recall for discourse new entities.</Paragraph>
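As a rough illustration of how web hit counts can supply feature values, the sketch below derives a definiteness ratio for an NP head from query counts. The `hit_count` callable and the ratio formula are illustrative assumptions, not the paper's exact feature set (AltaVista itself is no longer available, so a toy count table stands in for real queries).

```python
def definiteness_ratio(np_head, hit_count):
    """Fraction of a head noun's web occurrences that appear with 'the'.
    A high ratio loosely suggests the NP tends to denote a unique entity.
    `hit_count` is a stand-in for a search-engine hit-count API."""
    definite = hit_count(f"the {np_head}")
    total = hit_count(np_head)
    return definite / total if total else 0.0

# Toy counts standing in for real web queries (hypothetical numbers).
fake_counts = {"the sun": 90, "sun": 100, "the apples": 5, "apples": 100}
lookup = lambda query: fake_counts.get(query, 0)
```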
    <Paragraph position="1"> Our system can also reliably classify NPs as unique or non-unique. The accuracy of this classification is about 89-92% with various precision/recall combinations. The classifier provides useful information for coreference resolution in general, as unique and non-unique descriptions exhibit different behaviour with respect to anaphoricity. This fact is partially reflected by the performance of our sequential classifier (Table 3): context information is not sufficient to determine whether a unique NP is a first mention or not; one has to develop sophisticated name-matching techniques instead.</Paragraph>
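A sequential classifier of the kind mentioned above can be sketched as a simple two-stage cascade, where the first decision is fed to the second classifier as an extra feature. The staging below, and the rule-based stand-ins for the learned classifiers, are assumptions for illustration, not the paper's exact pipeline.

```python
def cascade(np_features, unique_clf, disc_new_clf):
    """Hypothetical two-stage cascade: first decide unique vs. non-unique,
    then pass that decision to the discourse-new classifier as a feature."""
    is_unique = unique_clf(np_features)
    is_disc_new = disc_new_clf({**np_features, "unique": is_unique})
    return is_unique, is_disc_new

# Toy rule-based stand-ins for the two learned classifiers.
unique_clf = lambda f: f.get("proper_name", False)
disc_new_clf = lambda f: f["unique"] or not f.get("definite", False)
```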
    <Paragraph position="2"> We expect our algorithms to improve both the speed and the performance of the main coreference resolution module: once many NPs are discarded, the system can proceed more quickly and make fewer mistakes (for example, almost all the parsing errors were classified by our algorithm as discourse new). Some issues are still open. First, we need sophisticated rules to compare unique expressions. At the present stage our system looks only for full matches and for same-head expressions. Thus, "China and Taiwan" and "Taiwan" (or "China", depending on the rules one uses for coordinates' heads) have a much better chance of being considered coreferent than "World Trade Organisation" and "WTO".</Paragraph>
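The full-match and same-head comparison described above can be sketched as follows; the naive last-token head rule and the function names are illustrative assumptions. The sketch shows why "China and Taiwan"/"Taiwan" can match while "World Trade Organisation"/"WTO" cannot without abbreviation handling.

```python
def head(np):
    """Naive head rule: take the last token. For a coordination like
    'China and Taiwan' this picks 'Taiwan'; as the text notes, other
    head rules for coordinates would pick differently."""
    return np.lower().split()[-1]

def may_corefer(np1, np2):
    """Matching as described in the text: full string match or same-head
    match only; no acronym or abbreviation rules."""
    a, b = np1.lower(), np2.lower()
    return a == b or head(np1) == head(np2)
```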
    <Paragraph position="3"> We also plan to conduct more experiments on the interaction between the discourse new and unique classifications, treating, for example, time expressions as non-unique, or exploring the influence of various optimisation strategies for the unique classifier on the overall performance of the sequential classifier.</Paragraph>
    <Paragraph position="4"> Finally, we still have to estimate the impact of our pre-filtering algorithm on the overall coreference resolution performance. Although we expect the coreference resolution system to benefit from the discourse new and unique classifiers, this hypothesis has to be verified.</Paragraph>
  </Section>
</Paper>