<?xml version="1.0" standalone="yes"?> <Paper uid="C00-1046"> <Title>Automatic Refinement of a POS Tagger Using a Reliable Parser and Plain Text Corpora</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Much research has been done on knowledge acquisition from large-scale annotated corpora as a rich source of linguistic knowledge. Major work on English POS taggers (henceforth, &quot;taggers&quot;), for example, includes (Church 1988), (Kupiec 1992), (Brill 1992) and (Voutilainen et al. 1992). The problem with this framework, however, is that such reliable corpora are hardly available, due to the huge amount of labor-intensive work required to build them. For the acquisition of non-core knowledge, such as specific, lexically dependent or domain-dependent knowledge, preparing annotated corpora becomes an even more serious problem.</Paragraph> <Paragraph position="1"> One viable approach, then, is to use plain text corpora instead, as in (Mikheev 1996). However, the method proposed by (Mikheev 1996) has its own weakness: it is restricted in scope. That is, it aims to acquire rules for unknown words in corpora from their ending characters, without looking at the context. Meanwhile, (Brill 1995a) and (Brill 1995b) proposed a method to acquire context-dependent POS disambiguation rules, and created an accurate tagger even from a very small annotated text by combining supervised and unsupervised learning. The weakness of his method is that the effect of unsupervised learning decreases as the training corpus size increases.</Paragraph> <Paragraph position="2"> The problem in using plain text corpora for knowledge acquisition is that we need a human supervisor who can evaluate and sift the obtained knowledge. An alternative would be to use a number of modules of a well-developed NLP system that stores most of the highly reliable general rules. 
Here, one module functions as a supervisor for the other modules, since all these modules are designed to work cooperatively and the knowledge stored in each module is correlated.</Paragraph> <Paragraph position="3"> Keeping this idea in mind, we propose a new unsupervised learning method for obtaining linguistic rules from plain text corpora using existing linguistic knowledge. This method has been implemented in the rule extraction system</Paragraph> </Section> </Paper>