<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-1027">
  <Title>Machine Learning Tools and Techniques with Java</Title>
  <Section position="6" start_page="0" end_page="0" type="concl">
    <SectionTitle>
5 Discussion
</SectionTitle>
    <Paragraph position="0"> The most significant challenge facing developers of large-scale lexical-semantic resources is coming to some agreement on the way that natural language can be mapped onto specific concepts. This challenge is particularly evident in consideration of our survey data and subsequent filtering. The abilities that people have in producing and recognizing sentences containing related words or phrases differed significantly across concept areas.</Paragraph>
    <Paragraph position="1"> While raters could agree on what constitutes a sentence containing an expression about memory (Kappa=.8069), the agreement on expressions of managing knowledge is much lower than we would hope for (Kappa=.5636). We would expect much greater inter-rater agreement if we had trained our six raters for the filtering task, that is, described exactly which concepts we were looking for and given them examples of how these concepts can be realized in English text. However, this approach would have invalidated our performance results on the filtered data set, because it would have biased the raters toward identifying examples that our system would likely perform well on rather than identifying references to concepts of commonsense psychology.</Paragraph>
    <Paragraph position="2"> Our inter-rater agreement concern is indicative of a larger problem in the construction of large-scale lexical-semantic resources. The deeper we delve into the meaning of natural language, the less we are likely to find strong agreement among untrained people concerning the particular concepts that are expressed in any given text. Even with lexical-semantic resources about commonsense knowledge (e.g. commonsense psychology), finer distinctions in meaning will require the efforts of trained knowledge engineers to successfully map between language and concepts. While this will certainly create a problem for future precision/recall performance evaluations, the concern is even more serious for other methodologies that rely on large amounts of hand-tagged text data to create the recognition rules in the first place. We expect that this problem will become more evident as projects using algorithms to induce local grammars from manually-tagged corpora, such as the Berkeley FrameNet efforts (Baker et al., 1998), broaden and deepen their encodings in conceptual areas that are more abstract (e.g. commonsense psychology).</Paragraph>
    <Paragraph position="3"> The approach that we have taken in our research does not offer a solution to the growing problem of evaluating lexical-semantic resources.</Paragraph>
    <Paragraph position="4"> However, by hand-authoring local grammars for specific concepts rather than inducing them from tagged text, we have demonstrated a successful methodology for creating lexical-semantic resources with a high degree of conceptual breadth and depth. By employing linguistic and knowledge engineering skills in a combined manner, we have been able to make strong ontological commitments about the meaning of an important portion of the English language. We have demonstrated that the precision and recall performance of this approach is high, achieving classification performance greater than that of standard machine-learning techniques. Furthermore, we have shown that hand-authored local grammars can be used to identify concepts that can be easily combined with word-level features (e.g. unigrams, bigrams) for integration into statistical natural language processing systems. Our early exploration of the application of this work to corpus analysis (U.S. State of the Union Addresses) has produced interesting results, and we expect that the continued development of this resource will be important to the success of future corpus analysis and human-computer interaction projects.</Paragraph>
  </Section>
</Paper>