File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/94/h94-1072_concl.xml
Size: 2,078 bytes
Last Modified: 2025-10-06 13:57:22
<?xml version="1.0" standalone="yes"?> <Paper uid="H94-1072"> <Title>DOCUMENT REPRESENTATION IN NATURAL LANGUAGE TEXT RETRIEVAL</Title> <Section position="8" start_page="366" end_page="368" type="concl"> <SectionTitle> 6. CONCLUSIONS </SectionTitle> <Paragraph position="0"> We presented some details of our natural language information retrieval system, which consists of an advanced NLP module and a 'pure' statistical core engine. While many problems remain to be resolved, including the question of the adequacy of term-based representation of document content, we attempted to demonstrate that the architecture described here is nonetheless viable. We demonstrated that natural language processing can now be done on a fairly large scale and that its speed and robustness can match those of traditional statistical programs such as key-word indexing or statistical phrase extraction. We suggest, moreover, that when properly used, natural language processing can be very effective in improving retrieval precision. In particular, we show that in term-based document representation, term weighting is at least as important as term selection. In order to achieve optimal performance, terms obtained primarily through linguistic analysis must be weighted differently from those obtained through traditional frequency-based methods.</Paragraph> <Paragraph position="1"> On the other hand, we must be aware of the limits of the NLP technologies at our disposal. While part-of-speech tagging, lexicon-based stemming, and parsing can be done on large amounts of text (hundreds of millions of words and more), other, more advanced processing involving conceptual structuring, logical forms, etc., is still computationally beyond reach. 
It may be assumed that these super-advanced techniques will prove even more effective, since they address the problem of representation-level limits; however, the experimental evidence is sparse and necessarily limited to rather small-scale tests (e.g., [13]).</Paragraph> </Section> </Paper>