<?xml version="1.0" standalone="yes"?> <Paper uid="H91-1065"> <Title>STUDIES IN PART OF SPEECH LABELLING</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1. INTRODUCTION 1 </SectionTitle> <Paragraph position="0"> Natural language processing, and AI in general, have focused mainly on building rule-based systems with carefully handcrafted rules and domain knowledge. Our own natural language database query systems, JANUS 2, Parlance 3, and Delphi 4, use these techniques quite successfully. However, as we move from the problem of understanding queries in fixed domains to processing open text for applications such as data extraction, we have found rule-based techniques too brittle, and the amount of work necessary to build them intractable, especially when attempting to use the same system on multiple domains.</Paragraph> <Paragraph position="1"> We report in this paper on one application of probabilistic models to language processing, the assignment of part of speech to words in open text. The effectiveness of such models is well known (Church 1988) and they are currently in use in parsers (e.g. de Marcken 1990). Our work is an incremental improvement on these models in two ways: (1) We have run experiments regarding the amount of training data needed in moving to a new domain; (2) we have added probabilistic models of word features to handle unknown words effectively. We describe POST and its algorithms and then we describe our extensions, showing the results of our experiments.</Paragraph> <Paragraph position="2"> conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, whether expressed or implied, of the Defense Advanced Research Projects Agency or the United States Government. 2 Weischedel, et al. 1989.</Paragraph> <Paragraph position="3"> 3 Parlance is a trademark of BBN Systems and Technologies. 
4 Stallard, 1989.</Paragraph> <Paragraph position="4"> 2. POST: USING PROBABILITIES TO</Paragraph> </Section> </Paper>