<?xml version="1.0" standalone="yes"?> <Paper uid="W99-0612"> <Title>Language Independent Named Entity Recognition Combining Morphological and Contextual Evidence</Title> <Section position="8" start_page="98" end_page="98" type="concl"> <SectionTitle> 6 Conclusion </SectionTitle> <Paragraph position="0"> This paper has presented an algorithm for the minimally supervised learning of named entity recognizers given short name lists as seed data (typically 40-100 example words per entity class). The algorithm uses hierarchically smoothed trie structures to model morphological and contextual probabilities effectively in a language-independent framework, overcoming the need for fixed token boundaries or history lengths. The combination of relatively independent morphological and contextual evidence sources in an iterative bootstrapping framework converges upon a successful named entity recognizer, achieving a competitive 70.5%-75.4% F-measure (measuring both named entity identification and classification) when applied to Romanian text. Fixed k-way classification accuracy on given entities ranges from 73% to 79% on 5 diverse languages for a difficult firstname/lastname/place partition, and approaches 92% for the simpler person/place discrimination. These results were achieved using only unannotated training texts, with absolutely no required language-specific information, tokenizers or other tools, and with no more than 15 minutes of total human effort in training (for short wordlist creation). The observed robust and consistent performance and the very rapid, low-cost ramp-up across 5 quite different languages show the potential for further successful and diverse applications of this work to new languages and domains.</Paragraph> </Section> </Paper>