File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/92/h92-1059_abstr.xml
Size: 9,584 bytes
Last Modified: 2025-10-06 13:47:33
<?xml version="1.0" standalone="yes"?> <Paper uid="H92-1059"> <Title>SESSION 9: NATURAL LANGUAGE PROCESSINGS</Title> <Section position="1" start_page="0" end_page="297" type="abstr"> <SectionTitle> SESSION 9: NATURAL LANGUAGE PROCESSINGS </SectionTitle> <Paragraph position="0"> Traditional approaches to interpretation in natural language processing typically fall into one of three classes: syntaxdriven, semantics-driven, or frame/task based. Syntax-driven approaches use a domain-independent grammar to drive the interpretation process and produce a global parse of the input, accounting for each word of the sentence.</Paragraph> <Paragraph position="1"> Semantics-driven approaches use knowledge about the case frames of the verbs to drive the interpretation process.</Paragraph> <Paragraph position="2"> Early semantic parsers often ignored syntax altogether \[1, 2\] although more recent systems tend to integrate the two components whether primarily syntax or semantics driven (e.g., \[3\]). Frame or task based parsers use informarion in the underlying domain to guide the parse. Script based parsers are one example of this class \[4\]. A more recent example was presented at last year's DARPA Workshop \[5\]. These systems use the underlying ATIS domain frame that must be built to form a database query to guide the parse, relying on key words and templates to identify information in the sentence that can fill slots of the frame.</Paragraph> <Paragraph position="3"> Any one of these approaches, however, has drawbacks for the spoken language systems and large text understanding systems being developed today. These systems must be robust. Spoken input is often ungrammatical and speakers use words that are unknown to the system. Text understanding systems must be able to process large quantifies of novel text which are likely to contain syntactically complex sentences, ungrammatical sentences, and unknown words. Given the large number of novel sentences that both types of systems encounter, extragranunaticality (i.e., sentences that are grammatical but fall outside the scope of the system grammar) is also an issue. While syntax-driven systems have the advantage of domain independence and provide useful information for further analysis, they are unable to handle ungrammatical sentences since they must produce a complete parse of the sentence. Both semantics-driven and frame-based systems have the advantage of being able to handle ungrammatical and extragranunatical sentences, but they break down on more complex sentences and are not easily transferrable to new domains. All three approaches fail when unknown words are encountered. null The first four papers in this session present three language understanding systems that address these problems. These are the MIT ATIS system (&quot;A Relaxation Method for Understanding Spontaneous Speech Utterances&quot; by SeneffL the BBN DELPHI system (&quot;Fragment Processing in the DELPHI System&quot; by Stallard and Bobrow and &quot;Syntactic/Semantic Coupling in the BBN DELPHI System&quot; by Bobrow, Ingria, and Stallard. These two papers were combined into one presentation), and BBN PLUM (&quot;A New Approach to Text Understanding&quot; by Weischedel, Ayuso, Boisen, Fox, and Ingria). The first two of the systems are spoken language systems, while BBN PLUM is designed to extract dam from text. All four papers include an evaluation of their methods. The final paper in the session presents a new approach to evaluation that does not involve testing through task application.</Paragraph> <Paragraph position="4"> The three systems take a remarkably similar approach to developing robust techniques involving integration of the traditional approaches in a single framework. All three systems are primarily syntax-driven, but have modified their parsers to allow for the production of partial parses, or fragments. BBN DELPHI extracts most likely partial parses from its chart, MIT ATIS allows for relaxation of constraints when a full parse cmmot be produce, and BBN PLUM uses a modified version of a Marcus determhfistie parser where constituents do not need to be attached to a parent node. All three systems use frame or event based knowledge to combine the fragments into a single interpretation. BBN DELPHI and MIT ATIS both use the ATIS frames or templates to guide this task. BBN PLUM uses knowledge of common events in the domain. Integration of semantics (i.e., knowledge of verb case frames) and syntax also plays a role in BBN DELPHI. Semantics is used to reduce the application of plausible syntactic rules by selecting only those rules that produce semantically acceptable interpretations. Case frames are also used to rule out implausible fragments in both BBN DELPHI and BBN PLUM. Finally, both BBN systems also integrate probabilistic language models. For example, statistical models of the likelihood of each syntactic rule are used to select the partial parses that are most likely.</Paragraph> <Paragraph position="5"> MIT ATIS and BBN DELPHI showed through analysis of the DARPA ATIS evaluation that robusqfallback parsing substantially improved their results. BBN PLUM was evaluated through two additional experiments in addition to the MUC-3 evaluation. Their additional experiments showed that recall grows linearly with lexicon size, while precision remains flat. These experiments support their claim that porting to a new domain can be achieved relatively easily.</Paragraph> <Paragraph position="6"> The final paper in this session (&quot;Neal-Montgomery NLP System Evaluation Methodology&quot; by Walter) presents a very controversial new approach to evaluation, as subsequent discussion showed. Waiter's claim is that task-based evaluation methodologies are too man-power intensive, requiring excess expense mad thne when porting to a new domain. Furthermore, due to inadequacies in both the port and in the evaluation metrics, current evaluation methodologies do not reveal an accurate picture of system potential. She reports on an evaluation methodology developed by Neal and Montgomery that provides a descriptive profile of a system's linguistic capabilities.</Paragraph> <Paragraph position="7"> This methodology involves the development of a scorecard, in which each linguistic feature is defined. Systems are then scored against this list of features by a human evaluator who checks whether the system could succesfully produce output when provided with a sentence containing a specific linguistic feature. While Walter indicates that linguistic features can be syntactic, semantic, pragmatic, or lexical, it should be noted that most of the features listed in the example scorecard in the paper are syntactic (e.g., what-questions, what as determiner, what as pronoun, who-questions both with verb and with DO, etc.).</Paragraph> <Paragraph position="8"> The final session discussion focused entirely on the proposed Neal-Montgomery Evaluation techniques, with many pointing out flaws and inconsistencies in the approach. Several points seemed to emerge repeatedly.</Paragraph> <Paragraph position="9"> Many felt that the evaluation could be not be used for system comparison. Systems work on different tasks and different domains. Whether a particular linguistic phenomena can even be tested depends on whether it is used within that domain. For example, Hobbs pointed out that the SRI parser tested using this approach failed on imperatives, despite the fact that its parser had extensive coverage of imperatives. The problem was that imperative sentences were not used in the terrorist domain on which the system now works and therefore the evaluator could not think of an imperative sentence for the test. The wide margin of disagreement among evaluators (20-100%) over whether a given system could or could not handle a given feature was noted and this raised questions about the value of the methodology.</Paragraph> <Paragraph position="10"> Many felt that evaluation of free-grained linguistic phenomena simply could not be done using a black box evaluation. When a sentence fails, it is not possible to tell what caused it. People pointed out that failure could be due to interaction between the linguistic feature being tested and other linguistic features, to other linguistic features in the sentence, or to interaction between linguistic processing and task based processing (e.g., for some tasks it is not necessary to record possessives and thus from the output one cannot tell whether the system handles them).</Paragraph> <Paragraph position="11"> Moreover, success could be due to quirks and ad hoc procedures, thus raising the question of whether black box methodology tests anything at all about syntactic processing. In contrast, people felt that the methodology could be useful for glass box evaluation. Developers could use the check list internally while constructing a parser for intermediate benchmarks. The extensive nature of the list of phenomena identified by Neal and Montgomery was cited as a positive aspect. However, even so, people felt the list does not account for interactions between the linguistic features listed. Many noted that interaction between linguistic phenomena is probably the most difficult part of parser development. There were some who saw the need for a more descriptive approach to evaluation and an appeal was made to involve those who know about evaluation to get involved in order that a good evaluation system could result.</Paragraph> </Section> class="xml-element"></Paper>