<?xml version="1.0" standalone="yes"?> <Paper uid="A88-1013"> <Title>CN YUR CMPUTR RAED THS?</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 INTRODUCTION </SectionTitle> <Paragraph position="0"> This paper describes a technique for automatic recognition of unknown variants of known words in a natural language processing system. &quot;Known word&quot; refers here to a word which is in the lexicon. The types of lexical variants which are detectable include inflexional aberrations, ad hoc abbreviations and spelling/typographical errors.</Paragraph> <Paragraph position="1"> The strategies presented here have been implemented fully in an English database query system and play a crucial role in a text-understanding system which is in the early stages of design. This technique, however, is independent of any particular grammar or parsing formalism, and can be implemented as a lexical lookup routine which heuristically prunes and orders the list of possible fixes found in the lexicon. First, a context-free plausibility assessment is based on a comparison of the structure of each candidate fix with that of the unknown word, and determines the order in which fixes will be considered by the parser. Then, the parsing process can choose among the candidate fixes in the same way that it tests multiple meanings of polysemous words for a good syntactic and semantic fit. The use of heuristics to identify the most plausible fixes for a hypothesized ad hoc abbreviation or spelling error will be the focus of this paper.</Paragraph> <Paragraph position="2"> Unknown words have traditionally been handled by natural language processing systems in the following ways: 1. Query the user for a replacement, possibly offering a menu of spelling corrections.
This strategy will allow correction of misspelled words as well as correctly spelled words which are not in the lexicon, and generally ensures an accurate interpretation by the computer.</Paragraph> <Paragraph position="3"> However, continued interaction of this sort may prove frustrating to a poor typist, and is, of course, unsuitable for a non-interactive natural language processor.</Paragraph> <Paragraph position="4"> 2. Enter into a dialogue with the user to provide a definition for a new word. This strategy requires a lexicon interface based on a metalanguage which would specify grammatical properties for a word without necessitating an inordinate degree of linguistic sophistication or knowledge of the database on the part of the end user. Although various attempts have been made to design such interfaces,1 many outstanding research issues remain, and this approach too requires an interactive environment. 3. Try to infer syntactic and/or semantic features of the unknown word from the linguistic context, with no user interaction. This strategy can be used to choose a plausible correction for a misspelled word as well as to parse an expression containing an unknown word. Early research in this area attempted to model human reasoning about unknown words in a script-based parser \[5\], and has since come to encompass a variety of multistrategy, expectation-based techniques as exemplified in the DYPAR \[2\] and NOMAD \[4\] systems. This technique shifts the burden of linguistic expertise from the end user to the computer system, but has met so far with only limited success, and accuracy can only be assured by interaction with the user to 1Two outstanding examples are the TELI \[1\] and TEAM</Paragraph> </Section></Paper>