<?xml version="1.0" standalone="yes"?> <Paper uid="C69-0301"> <Title>NEXUS A LINGUISTIC TECHNIQUE FOR PRECOORDINATION</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> A method for automatically precoordinating index terms was devised to form combinations of terms which are stored as subject headings. A computer program accepts lists of auto-indexed terms and, by applying linguistic and sequence rules, combines appropriate terms, thereby improving the searchability of an information storage and retrieval system.</Paragraph> <Paragraph position="1"> A serious failing exists in many indexing systems in that the index terms authorized for use are too general for technically knowledgeable searchers. A search conducted using these terms frequently produces too many documents not specifically related to the users' requirements. An indexing method using the language in which the document was written corrects this failing, but eliminates the generality of the previous approach. A compromise between indexing generality and specificity is offered by NEXUS precoordination, which combines specific terms into subject headings, eliminating improper coordination of terms when matching search requirements with document term sets.</Paragraph> <Paragraph position="2"> NEXUS examines the suffix morpheme of each input term and determines whether or not the term should be a member of an index term combination, or precoordination. If insufficient evidence is present to make such a determination, a sequence rule goes into effect which combines terms based on their syntax.</Paragraph> <Paragraph position="3"> A variety of corpora was used to test and develop the NEXUS precoordinator. Data bases consisting of legal information, computer program descriptions and NASA linear tape system documentation were used.
More variety was present in the NASA documents, which made the results of applying NEXUS to this collection more significant than the others. Also, a fuller battery of rules had been developed by this time, increasing the power of the program. Summary: NEXUS is a research project which is concerned with input processing of natural language for information retrieval.</Paragraph> <Paragraph position="4"> The computer program used to do this task consists of linguistic rules that operate on the suffix portions of printed words, and on the order of these words as they appear in a sentence.</Paragraph> <Paragraph position="5"> NEXUS accepts lists of index terms that have resulted from the application of an auto-indexer program to titles and abstracts. These term lists are processed by NEXUS in order to form combinations of terms which are stored as subject headings. Such subject headings, or precoordinations, aid the searcher in finding information when they are used in a bibliographic printout. Unlike coordinate-indexed printouts, which consist of lists of individual terms and the accession numbers of the source documents, printouts of NEXUS-processed terms contain word combinations that have been precoordinated, saving time and increasing accuracy for the searcher.</Paragraph> <Paragraph position="6"> It must be stressed that NEXUS operates on general rules. There are occurrences in language that are not coverable by this method. Storage by individual terms is effected in conjunction with NEXUS so that nothing is missed because of rule exceptions.</Paragraph> <Paragraph position="7"> Comparison tests have been run using the full NEXUS program, a partial application of the program using sequence rules (SEQS), and human analysis of the same data.
Although falling short of human analysis in some respects (except for consistency), the NEXUS approach is more effective than SEQS in producing effective combinations.</Paragraph> <Paragraph position="8"> Although some suggestions are made for applying this technique, along with a possible output format for a bibliographic application, the chief value of this effort has been to further study those aspects of language that are amenable to computerized analysis for the purpose of improving input and output functions in information retrieval.</Paragraph> </Section></Paper>