<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1122"> <Title>An Integrated Method for Chinese Unknown Word Extraction</Title> <Section position="8" start_page="1" end_page="534" type="concl"> <SectionTitle> 7 Conclusion </SectionTitle> <Paragraph position="0"> Unknown word recognition is an important problem in Chinese information processing (CIP) systems. The suffix-array-based method is an efficient way to extract frequent terms of arbitrary length. Most substrings of significant terms almost always appear in fixed contexts, so they can be eliminated by their low context-entropy values. A large lexicon helps verify unknown-word boundaries and filter out n-grams with incomplete boundaries. Ranking by RFR values places the most significant and informative candidates at the top of the final list for subsequent manual confirmation; RFR values also reflect the internal characteristics of the extracted terms.</Paragraph> </Section> </Paper>
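<!--
A minimal sketch of the first two steps summarized in the conclusion: collecting frequent n-grams with a suffix array, then discarding substrings whose left or right context is fixed. It assumes that "context entropy" means the Shannon entropy of the characters immediately to the left and right of a candidate's occurrences; the paper's exact definition, length limits and thresholds may differ. The functions suffix_array, frequent_ngrams, context_entropy and filter_candidates and the min_entropy threshold are hypothetical illustrations, not the authors' implementation. The suffix array is built naively by sorting suffixes. Lexicon-based boundary checking and RFR ranking are not shown.

```python
import math
from collections import Counter

# Hypothetical sketch: all names and the min_entropy threshold are illustrative
# assumptions, not the method described in the paper.


def suffix_array(text: str) -> list[int]:
    # Naive construction: sort suffix start positions by the suffix they begin.
    # Adequate for a sketch; large corpora need an O(n log n) or O(n) builder.
    return sorted(range(len(text)), key=lambda i: text[i:])


def frequent_ngrams(text: str, min_len: int = 2, max_len: int = 4,
                    min_freq: int = 2) -> dict[str, list[int]]:
    # All occurrences of a substring occupy one contiguous block of the suffix
    # array, so scanning adjacent suffixes with an equal length-n prefix yields
    # every frequent n-gram together with its occurrence positions.
    sa = suffix_array(text)
    found: dict[str, list[int]] = {}
    for n in range(min_len, max_len + 1):
        i = 0
        while i < len(sa):
            prefix = text[sa[i]:sa[i] + n]
            j = i
            while j < len(sa) and text[sa[j]:sa[j] + n] == prefix:
                j += 1
            if len(prefix) == n and j - i >= min_freq:
                found[prefix] = sorted(sa[i:j])
            i = j
    return found


def entropy(counts: Counter) -> float:
    # Shannon entropy (in bits) of a frequency distribution.
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def context_entropy(text: str, term: str,
                    positions: list[int]) -> tuple[float, float]:
    # Assumption: context entropy is the entropy of the single characters
    # immediately left and right of each occurrence; sentinels mark text edges.
    left = Counter(text[p - 1] if p > 0 else "<s>" for p in positions)
    end = len(term)
    right = Counter(text[p + end] if p + end < len(text) else "</s>"
                    for p in positions)
    return entropy(left), entropy(right)


def filter_candidates(text: str,
                      min_entropy: float = 1.0) -> dict[str, tuple[float, float]]:
    # Keep an n-gram only if both of its contexts vary enough; a substring of a
    # longer term is usually pinned to one neighbor on at least one side.
    kept: dict[str, tuple[float, float]] = {}
    for term, positions in frequent_ngrams(text).items():
        h_left, h_right = context_entropy(text, term, positions)
        if min(h_left, h_right) >= min_entropy:
            kept[term] = (h_left, h_right)
    return kept


if __name__ == "__main__":
    sample = "xabcy zabcw qabcv"
    print(sorted(frequent_ngrams(sample)))
    print(filter_candidates(sample))
```

On the toy string "xabcy zabcw qabcv", the frequent n-grams are ab, abc and bc. However, ab is always followed by c and bc is always preceded by a, so each has a one-sided context entropy of 0, and only abc (entropy log2 3 on both sides) survives the filter.
-->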