File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/92/a92-1021_concl.xml
Size: 2,490 bytes
Last Modified: 2025-10-06 13:56:45
<?xml version="1.0" standalone="yes"?> <Paper uid="A92-1021"> <Title>A Simple Rule-Based Part of Speech Tagger</Title> <Section position="5" start_page="154" end_page="154" type="concl"> <SectionTitle> 4 Conclusions </SectionTitle> <Paragraph position="0"> We have presented a simple part of speech tagger which performs as well as existing stochastic taggers, but has significant advantages over these taggers.</Paragraph> <Paragraph position="1"> The tagger is extremely portable. Many of the higher-level procedures used to improve the performance of stochastic taggers would not readily transfer over to a different tag set or genre, and certainly would not transfer over to a different language. Everything except for the proper noun discovery procedure is automatically acquired by the rule-based tagger (footnote 7), making it much more portable than a stochastic tagger. If the tagger were trained on a different corpus, a different set of patches suitable for that corpus would be found automatically.</Paragraph> <Paragraph position="2"> Large tables of statistics are not needed for the rule-based tagger. In a stochastic tagger, tens of thousands of lines of statistical information are needed to capture contextual information. This information is usually a table of trigram statistics, indicating for all tags tag_a, tag_b and tag_c the probability that tag_c follows tag_a and tag_b.</Paragraph> <Paragraph position="3"> In the rule-based tagger, contextual information is captured in fewer than eighty rules. This makes for a much more perspicuous tagger, aiding in better understanding and simplifying further development of the tagger. Contextual information is expressed in a much more compact and understandable form.
As can be seen from comparing error rates, this compact representation of contextual information is just as effective as the information hidden in the large tables of contextual probabilities.</Paragraph> <Paragraph position="4"> Perhaps the biggest contribution of this work is in demonstrating that the stochastic method is not the only viable approach for part of speech tagging. The fact that the simple rule-based tagger can perform so well should offer encouragement for researchers to further explore rule-based tagging, searching for a better and more expressive set of patch templates and other variations on this simple but effective theme.</Paragraph> <Paragraph position="5"> Footnote 7: And even this could be learned by the tagger.</Paragraph> </Section> </Paper>