File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/05/h05-1062_concl.xml
Size: 1,646 bytes
Last Modified: 2025-10-06 13:54:33
<?xml version="1.0" standalone="yes"?> <Paper uid="H05-1062"> <Title>Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pages 491-498, Vancouver, October 2005. (c) 2005 Association for Computational Linguistics. Robust Named Entity extraction from large spoken archives</Title> <Section position="8" start_page="496" end_page="497" type="concl"> <SectionTitle> 8 Conclusion </SectionTitle> <Paragraph position="0"> We have presented a robust Named Entity Recognition (NER) system dedicated to processing ASR transcripts. The FSM-based approach allows us to control the generalization capabilities of the system, while the statistical tagger provides good labeling decisions. The main feature of this system is its ability to extract n-best lists of NE hypotheses from word lattices, leaving the decision strategy free to emphasize either the recall or the precision of the extraction, according to the targeted task. A comparison between this approach and a standard approach based on the NLP toolkit LingPipe validates our hypotheses. This integration of the ASR and NER processes is particularly important in difficult conditions such as those found in large spoken archives, where the training corpus does not match all the documents to be processed. A study of the use of metadata information to adapt the ASR and NER models to a specific situation showed that, although the overall improvement is small, some salient information related to the added metadata can be better extracted by means of this adaptation.</Paragraph> </Section> </Paper>