<?xml version="1.0" standalone="yes"?> <Paper uid="C00-1046"> <Title>Automatic Refinement of a POS Tagger Using a Reliable Parser and Plain Text Corpora</Title> <Section position="3" start_page="0" end_page="313" type="metho"> <SectionTitle> APRAS (Automatic POS Rule Acquisition </SectionTitle> <Paragraph position="0"> System), which automatically acquires rnles for refining the morphological analyzer (taggcr) in our English-Japanese MT system ASTRANSAC (Hirakawa ct al. 1991) through the interaction between the system's tagger and parser on the assumption that they are considerably accurate.</Paragraph> <Paragraph position="1"> This paper is organized as follows: Section 2 illustrates the basic idea of our method; Section 3 gives the outline of APRAS; Sections 4 and 5 describe our experiments.</Paragraph> </Section> <Section position="4" start_page="313" end_page="313" type="metho"> <SectionTitle> 2 Basic Idea </SectionTitle> <Paragraph position="0"> Our MT system has a tagger which can generate ranked POS sequences of input sentences according to their plausibility and also a parser which judges the parsability of the derived POS sequences one by one until a parsable one is found ~ . in our framework, this tagger can be viewed as a POS candidate generator, and the parser as a sifter.</Paragraph> <Paragraph position="1"> Now sentences can be categorized into the following three: (P) a balanced sentence, whose top ranked sequence, or initial POS sequence, is parsable, (Q) a conflicting sentence, in which the top ranked scquencc is unparsable, but there are parsable ones in the rest of the sequences; and (R) an unparsable sentence, in which all the POS sequences are unparsable.</Paragraph> <Paragraph position="2"> Before going on to our main discussion, we will briefly explain the terminology used in this paper. Here we call a highest-ranking parsable POS sequence as the &quot;Most Preferable _PParsable POS sequence,&quot; or simply &quot;MPP POS sequence.&quot; For our purposes, we will make use of balanced sentences and conflicting sentences. We call the POS of a word in the initial POS sequence as its &quot;initially tagged POS&quot; and that in the MPP POS sequence as its &quot;parsable POS.&quot; We call the word whose initially tagged POS and parsable POS differ as a &quot;focus word.&quot; Since the tagger is accurate, we can expect only a few POS differences between the initial and MPP POS sequences for a sentence. Finally, let us call Here only top-N POS scquenccs are tried, where N is a pre-defined constant to limit parsing time. the POS's of the preceding and succeeding words as the &quot;POS context of the focus word.&quot; Conflicting sentences, and their initial POS sequences, parsable POS sequences, and focus words can be automatically cxtracted. Through extraction out of a large amount of plain text corpora combined with statistical filtering, it would be possible to automatically select the proper POS conditions that could determine POS's of focus words. Then, we extract &quot;POS over the initially tagged POS in a particular context shown as 'C'.&quot; PA rules do not determine POS's of words from their context, but change the judgement made by the tagger in a particular context. Extracted PA rules arc independent rules to the tagger and the parser used in the extraction. At the samc time, these rules are optimized for the tagger and the parser, since they are derived only from conflicting sentences, not from balanced sentences. 
<Paragraph position="3"> In the following section, we give the outline of APRAS, focusing on its two modules.</Paragraph> </Section> <Section position="5" start_page="313" end_page="315" type="metho"> <SectionTitle> 3 Outline of APRAS </SectionTitle> <Paragraph position="0"> Fig. 1 shows the application of APRAS to an MT system. APRAS works in two phases: a rule extraction phase and a rule application phase.</Paragraph> <Paragraph position="1"> Note that the same tagger and parser of the MT system are used throughout.</Paragraph> <Paragraph position="2"> In the rule extraction phase, the tagger analyzes each sentence in a training corpus and produces plausible POS sequences. The parser then judges the parsability of each POS sequence. Whenever a conflicting sentence appears, the rule generation module outputs candidate PA rules.</Paragraph> <Paragraph position="3"> After all PA rule candidates for the training corpus have been generated, the rule filtering module statistically weighs the validity of the obtained PA rule candidates and filters out unreliable rules. Sentences in the training corpus are not translated in this phase.</Paragraph> <Paragraph position="4"> In the rule application phase, both the already installed POS rules and the acquired PA rules are used for tagging. A sentence is parsed and then translated into the target language. PA rules essentially act to avoid the tagger's wasteful generation of POS sequences. This improves the ranking of the POS sequences the tagger outputs and also increases the chances that the parser will find a parsable, or better, POS sequence in the improved ranking.</Paragraph> <Section position="1" start_page="314" end_page="315" type="sub_section"> <SectionTitle> 3.1 Rule Generation Module </SectionTitle> <Paragraph position="0"> PA rule candidates are generated from conflicting sentences; balanced and unparsable sentences generate no PA rule candidates. The words in balanced sentences are recorded along with their POS's and POS contexts, to be used in the rule filtering module. Whenever the system encounters a conflicting sentence in a training corpus, it compares the initial POS sequence with the MPP POS sequence of the sentence and picks out the focus words. Then, for every focus word, the system generates a PA rule candidate, which consists of the focus word, its initially tagged POS, its parsable POS, and the POS context, i.e., the preceding POS's and the succeeding POS's.</Paragraph> <Paragraph position="1"> Fig. 2 illustrates how a PA rule candidate is generated. The focus word is 'rank', its initially tagged POS is '(verb)', its parsable POS is '(noun)', and the POS context is "(verb)-(determiner)-$-'in'-(determiner)", where '$' denotes the focus word. The POS context is composed of the two preceding POS's and the two succeeding POS's. Here surface words can be used instead of POS's, like 'in' in the example. The generated PA rule candidate can be read as: if the word 'rank' appears in the POS context "(verb)-(determiner)-$-'in'-(determiner)", then give priority to '(noun)' over '(verb)'.</Paragraph>
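As a sketch of this procedure, the fragment below derives rule candidates from an aligned pair of initial and MPP POS sequences, each represented as a list of (word, POS) pairs. The data layout, the boundary padding, and the use of the MPP sequence to read off the surrounding context are our assumptions for illustration; the substitution of surface forms for functional words (like 'in' above) is omitted for brevity.

```python
# A sketch of PA rule candidate generation for one conflicting sentence
# (illustrative data layout; surface-form substitution is omitted).

def pa_rule_candidates(initial, mpp, window=2):
    """Yield a PA rule candidate for every focus word.

    `initial` and `mpp` are word-aligned lists of (word, pos) pairs.
    """
    pad = [("#", "#")] * window            # sentence-boundary padding
    padded = pad + mpp + pad
    for i, ((word, init_pos), (_, mpp_pos)) in enumerate(zip(initial, mpp)):
        if init_pos == mpp_pos:
            continue                       # not a focus word
        j = i + window                     # index into the padded sequence
        left = tuple(p for _, p in padded[j - window:j])
        right = tuple(p for _, p in padded[j + 1:j + 1 + window])
        yield {
            "word": word,                  # focus word, e.g. 'rank'
            "from_pos": init_pos,          # initially tagged POS, e.g. verb
            "to_pos": mpp_pos,             # parsable POS, e.g. noun
            "context": (left, right),      # POS context read off the MPP sequence
        }
```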
<Paragraph position="2"> In this rule generation module, two important factors should be taken into account: namely, context size and levels of abstraction. If we expand the context of a focus word, the PA rule should gain accuracy, but its frequency in the training corpus would drop, thereby making it difficult to perform statistical filtering. To ensure statistical reliability, we need a large-sized training corpus. At present we set the context size to two words.</Paragraph> <Paragraph position="3"> In choosing adequate levels of abstraction or specification of the POS's in the context, we grouped together those POS tags which influence the choice of POS of a focus word in a similar manner into one super-POS tag, as in (Haruno & Matsumoto 1997). We also changed some POS tags for functional words like prepositions, and for words such as 'be' and 'have', to tags which denote their literal forms, because the choice of POS of a focus word is highly dependent on the word itself. As a result, we obtained 513 POS tags, including 16 POS tags for nouns, 17 for verbs, 410 for prepositions and phrasal prepositions, and 70 for adjectives and adverbs.</Paragraph> </Section> <Section position="2" start_page="315" end_page="315" type="sub_section"> <SectionTitle> 3.2 Rule Filtering Module </SectionTitle> <Paragraph position="0"> This section deals with how to statistically filter inappropriate rules out of the generated PA rule candidates. For this purpose, we introduce what we call "adjustment ratios." Table 1 shows the parsing process of a sentence in which word W appears in POS context C: P1-P2-$-P3-P4. In this context, the word W has two possible POS's, X and Y. Case A shows the case of balanced sentences, where the tagger first tagged W with X and the parser found it parsable. Case B shows the case of conflicting sentences, where the tagger first tagged W with X, which was unparsable, and then with Y, which proved to be parsable. (Here only two possibilities, namely X and Y, are considered; however, it is easy to generalize the transition process to cases where focus words have more than two POS candidates.)</Paragraph> <Paragraph position="1"> Let Na and Nb be the numbers of sentences in cases A and B, respectively. Assume the parser is accurate enough to judge a majority of sentences with correct POS contexts to be parsable, and those with incorrect POS's to be unparsable. (The accuracy of the POS sequences accepted by our parser is more than 99% (Yoshimura 1995).)</Paragraph> <Paragraph position="2"> Table 1: Transition of POS of W in Parsing Process for Context C</Paragraph> <Paragraph position="3"> Then, adjustment ratios can be formulated as follows:

adjustment ratio_C(X -> Y) = Nb / (Na + Nb)

When the value is high, the tagger should change the POS from X to Y, whereas when the value is low, the tagger should not change the POS in the given context. Thus, based on the statistics of an accurate parser's judgements, adjustment ratios can serve as a criterion for the validity of PA rules. The rules whose adjustment ratios are above a threshold are extracted and output as PA rules. The threshold is fixed by examining PA rule candidates, as will be mentioned in the next section.</Paragraph>
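A sketch of this criterion: given, for each rule candidate, the counts Na (case A occurrences of its context) and Nb (case B occurrences), the ratio and the cut-offs can be computed as below. The data layout is assumed, and we take a candidate's frequency to be Na + Nb, a point the paper does not pin down; the frequency floor of 6 and the 60% threshold anticipate the values fixed in Section 4.

```python
# A sketch of the rule filtering module. Counts are assumed to have been
# tallied per rule (word, from_pos, to_pos, context) during generation.

def adjustment_ratio(n_a, n_b):
    """adjustment_ratio_C(X -> Y) = Nb / (Na + Nb)."""
    return n_b / (n_a + n_b)

def filter_pa_rules(candidates, min_freq=6, threshold=0.60):
    """Keep rule candidates that are both frequent and reliable.

    `candidates` maps a hashable rule key to its (n_a, n_b) counts;
    min_freq and threshold mirror the cut-offs fixed in Section 4.
    """
    return [
        rule
        for rule, (n_a, n_b) in candidates.items()
        if n_a + n_b >= min_freq
        and adjustment_ratio(n_a, n_b) >= threshold
    ]
```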
<Paragraph position="4"> More importantly, PA rules are considered to be "optimized" for the parser. First, the selection and application of inappropriate PA rules does not immediately deteriorate the parser output, since PA rules only serve to eliminate the wasteful generation of POS sequences. Second, the existence of inappropriate PA rules eventually shortens the processing time for those sentences for which the parser produces an erroneous syntactic structure due to a lack of syntactic knowledge.</Paragraph> </Section> </Section> <Section position="7" start_page="316" end_page="318" type="metho"> <SectionTitle> 4 Rule Extraction Experiment </SectionTitle> <Paragraph position="0"> We applied the method described in Section 3.2 to English news articles (6,684,848 sentences, 530MB) as a training corpus and obtained 300,438 different PA rule candidates. Since rules with low frequencies do not have reliable adjustment ratios, we omitted rules with a frequency below 6 and thus obtained 17,731 rules.</Paragraph> <Paragraph position="1"> To verify the validity of the adjustment-ratio-based rule selection method described in Section 3.2, we examined some of the obtained PA rules whose frequencies are 10, 20, and 30, referring to the original sentences from which they were generated, and classified the rules into the following three categories.</Paragraph> <Paragraph position="2"> (1) Valid: applicable to any sentence.</Paragraph> <Paragraph position="3"> (2) Invalid: inapplicable to every sentence. This type of rule is derived when an incorrect POS sequence is judged to be parsable, due to a lack of coverage of the parsing rules in the parser.</Paragraph> <Paragraph position="4"> (3) Undecidable: the derived rule is neither valid nor invalid, either because the POS context or the POS specifications are insufficient to uniquely determine the POS of the focus word, or because both the initially tagged POS and the parsable POS are inadequate for the POS context.</Paragraph> <Paragraph position="5"> An example of (3) is: trading (present participle) -> trading (noun): (noun)-'of'-$-(determiner)-(noun). The word "trading" is a present participle in sentences like "... index futures represent a more convenient and liquid way of trading an index basket than ...," while it is a noun in sentences like "By the close of trading the deal was quoted at 99.82 bid."</Paragraph> <Paragraph position="6"> Table 2 shows the result of the classification. As is clear from the table, for adjustment ratios below 30%, there are more invalid rules than valid rules, while for adjustment ratios above 30%, the converse is true. The percentage of invalid rules is small above 60%. These results prove the validity of our adjustment-ratio-based rule selection method.</Paragraph> <Paragraph position="7"> Thus, we eliminated those of the extracted 17,731 rules whose adjustment ratios are below 60% and obtained 4,494 rules. (The example rules shown here are not recoverable from the source; the tag legend included, among others, V = verb (other than past form) and VP = verb (past form).)</Paragraph> </Section> <Section position="8" start_page="316" end_page="318" type="metho"> <SectionTitle> 5 Rule Application Experiment </SectionTitle> <Paragraph position="0"> By using PA rules, we can expect that: (1) the processing time will be reduced, since a parsable POS sequence is obtained at an earlier stage; and (2) both tagging precision and parsing accuracy will improve.</Paragraph> <Paragraph position="1"> To verify these expectations, we applied the 3,921 PA rules extracted in the previous experiment to the tagging of English news articles entirely different from the training corpus (146,229 sentences; 2.26M words). (Out of the 4,494 rules, 573 were eliminated in this experiment; these involved the distinction between compound words (e.g., "that is" (adverb)) and non-compound words (e.g., "that" (pronoun) + "is" (verb)), which entails changes in the context window and requires further research.) Among them, 2,421 sentences (1.7%), or 2,476 words (0.11%), satisfied the conditions of these PA rules; these sentences were then tagged and parsed with and without the PA rules. We measured the difference in the elapsed time (measured on a Sun Ultra 1E/200 workstation) and the number of successfully parsed sentences.</Paragraph>
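The following sketch shows how the acquired rules might intervene at tagging time, overriding the tagger's first choice for any word whose surrounding context matches a rule. Representations follow the earlier sketches; note that at application time the context can only be read off the tagger's own output, whereas the rules were extracted against MPP contexts, so this matching scheme is our simplification.

```python
# A sketch of the rule application phase (same illustrative data layout
# as the earlier sketches; matching on the tagger's own output).

def apply_pa_rules(tagged, rules, window=2):
    """Return a copy of `tagged` (a list of (word, pos) pairs) with PA
    rule adjustments applied to every matching word."""
    pad = [("#", "#")] * window
    padded = pad + list(tagged) + pad
    adjusted = list(tagged)
    for i, (word, pos) in enumerate(tagged):
        j = i + window
        left = tuple(p for _, p in padded[j - window:j])
        right = tuple(p for _, p in padded[j + 1:j + 1 + window])
        for rule in rules:
            if (rule["word"] == word and rule["from_pos"] == pos
                    and rule["context"] == (left, right)):
                adjusted[i] = (word, rule["to_pos"])  # adjust the POS
                break
    return adjusted
```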
<Paragraph position="2"> The result is shown in Table 3. The tagging time was extended by 11.5%, but the parsing time and the total processing time were reduced by 24% and 15.5%, respectively, while the ratio of successfully parsed sentences improved by 8.0%.</Paragraph> <Paragraph position="3"> We also examined 524 POS differences out of all the resulting differences in the tagger's outputs made by the PA rules, and obtained the following figures.</Paragraph> <Paragraph position="4"> - Improved: 411 (78.4%) - Worse: 84 (16.0%) - Neither improved nor worse: 29 (5.5%)</Paragraph> <Paragraph position="5"> Out of the 84 worsened cases, 43 were due to invalid rules acquired through wrong parses caused by a lack of sufficient parsing rules; there are highly frequent expressions characteristic of financial reports which our parser cannot parse. However, again, this kind of invalid rule would not make a significant difference in the final output of the parser. The remaining 32 cases were due to learning from wrongly segmented sets of words and from distinctive header expressions like "FT 14 MAY 91 / World News in Brief". These errors can easily be eliminated by not learning from such data. Adopting the rule accuracy obtained from the above examination, we can expect a 62.4% (78.4% - 16.0%) net improvement for the words to which PA rules are applied. Since PA rules are applied to 0.11% of the words in the corpus, a 0.11% x 62.4% = 0.07% improvement in POS tagging is expected. We measured the tagging precision with and without the acquired PA rules on a test corpus containing 5,630 words, and observed that the precision rose from the initial 98.60% to 98.65%, i.e., a 0.05% improvement. Since PA rules are lexically based rules, the ratio of sentences which satisfy the rule conditions is rather low, but the number of such sentences should increase in proportion to the number of PA rules acquired.</Paragraph> <Paragraph position="6"> If we expand the size of the training corpus, we can obtain many more PA rules. In fact, we observed many valid rules among the eliminated PA rule candidates whose frequencies were immediately below the threshold. Since the observed frequency distribution of PA rules was exponential, we can expect the number of PA rules to increase exponentially as the training corpus grows.</Paragraph> <Paragraph position="7"> This expansion also enables us to specify the POS context in more detail: widening the context window, subcategorizing the POS tags employed in the context, assigning one surface functional word to a lexical tag, etc. To make such detailed classification fully effective, we will need to generalize specific rules to the level that reflects the maximum distinction among individual examples.</Paragraph> </Section> </Paper>