<?xml version="1.0" standalone="yes"?> <Paper uid="H89-1051"> <Title>PORTING PUNDIT TO THE RESOURCE MANAGEMENT DOMAIN</Title> <Section position="2" start_page="0" end_page="277" type="intro"> <SectionTitle> INTRODUCTION </SectionTitle> <Paragraph position="0"> This paper describes our experiences porting the PUNDIT natural language processing system to the Resource Management domain. PUNDIT has previously been applied to a range of messages (see the paper Analyzing Explicitly Structured Discourse in a Limited Domain: Trouble and Failure Reports by C. Ball, appearing in this volume, and also [Hirschman1989]). However, it had not been tested on any significant corpus of queries, such as that represented by the Resource Management corpus. Our goal was to assess PUNDIT's portability and to determine its syntactic coverage of this domain. Time constraints precluded testing of the semantic component, but we plan to report on this at subsequent meetings. We performed this port with the intention of coupling PUNDIT to the MIT SUMMIT speech recognition system. This work is described in another paper in this volume, Reducing Search by Partitioning the Word Network, by J. Dowding.</Paragraph> <Paragraph position="1"> Our philosophy in porting has been to tune the system to a new domain, rather than rewriting the grammar or building it from scratch. The rationale for this approach is to continue to develop the coverage of PUNDIT's grammar; each new application should motivate principled extensions to the system that can also apply to other domains. Thus, over time, PUNDIT's grammar has grown to cover a very large portion of English, and each succeeding port requires less effort. 
The disadvantage of this approach is that as the coverage grows, the grammar becomes &quot;looser&quot;: the number of parses for any given word sequence tends to increase, and the grammar tends to overgenerate, letting through constructions that are not grammatical.</Paragraph> <Paragraph position="2"> This philosophy is quite different from the &quot;language modeling&quot; approach taken by some groups working in speech recognition. The language modeling approach has as its goal the development of a minimal covering grammar needed to describe the phenomena observed in the particular corpus. The benefit of the language modeling approach is that it produces a very tight, highly constrained grammar. The disadvantages are the porting cost and a very fragile system, whose syntactic boundaries are very easy to exceed.</Paragraph> <Paragraph position="3"> Our approach to lexicon development has the same focus as our approach to syntactic coverage: to try to capture the general English definitions, rather than to limit ourselves to the particular domain-specific usages encountered in the training data. The rationale is also similar to that used in the syntactic component: generation of lexical entries is a time-consuming process; our goal is to develop a broad coverage system, so when entering a word in the lexicon, we enter the general English categories for the word. In many cases, this provides a much more general definition than what is specifically required by an application. * This work has been supported in part by DARPA under contract N00014-85-C-0012, administered by the Office of Naval Research; and in part by internal Unisys R&amp;D funding.</Paragraph> <Paragraph position="4"> For example, the word alert occurs exclusively as a noun in the Resource Management domain. 
However, it must also be classified as an adjective and a verb if the entry is made general to English.</Paragraph> <Paragraph position="5"> The challenge for the broad-coverage grammar/lexicon approach is to develop methods of tuning the grammar and the lexicon to the particular corpus. It is clear that integration of PUNDIT with a speech recognition system will require that we bring to bear as many constraints as possible, in an attempt to prune the explosive search space that results from indeterminacy in analyzing the acoustic signal. We discuss several possible approaches to tuning both the grammar and the lexicon in the final section of the paper. The results reported in this paper provide a solid indication that our porting strategy is successful: only a very modest effort was required to obtain reasonable results in the Resource Management domain (85% of the training sentences and 76% of the test sentences received a correct parse, given a porting effort of 10 person-weeks). The next steps will be to add semantics and pragmatics, and to develop techniques for (semi-)automatically tuning the grammar to a new domain.</Paragraph> </Section> </Paper>