<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2251"> <Title>Predicting Part-of-Speech Information about Unknown Words using Statistical Methods</Title> <Section position="7" start_page="1506" end_page="1506" type="concl"> <SectionTitle> 6 Conclusions and Further Work </SectionTitle> <Paragraph position="0"> The experiments documented in this paper suggest that a tagger can be trained to handle unknown words effectively. By using the probabilistic lexicon, we can predict tags for unknown words from probabilities estimated on training data rather than from hand-crafted rules. The modular approach to unknown-word prediction also allows us to determine which kinds of information are most important.</Paragraph> <Paragraph position="1"> Further work will attempt to improve the accuracy of the predictor by using new knowledge sources. We will explore the use of a confidence measure. We will also explore training the predictor on only the infrequently occurring words in the lexicon, which would presumably give a better approximation of the distribution of unknown words. We also plan to integrate the predictor into a full HMM tagging system, in which the hidden Markov model disambiguates problem words, so that the predictor can be tested in real-world applications.</Paragraph> </Section> </Paper>