File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/04/w04-1122_concl.xml

Size: 910 bytes

Last Modified: 2025-10-06 13:54:17

<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1122">
  <Title>An Integrated Method for Chinese Unknown Word Extraction</Title>
  <Section position="8" start_page="1" end_page="534" type="concl">
    <SectionTitle>
7 Conclusion
</SectionTitle>
    <Paragraph position="0"> Unknown word recognition is an important problem in CIP systems. The suffix array based method is an efficient way to extract frequent terms of arbitrary length. Most substrings of significant terms, which almost always appear in fixed contexts, can be eliminated using context-entropy values. A large lexicon helps verify the boundaries of unknown words and filter out n-grams with incomplete boundaries. Ranking candidates by RFR value places the most significant and informative ones at the top of the final list for subsequent manual confirmation; RFR also reflects the internal characteristics of the extracted terms.</Paragraph>
  </Section>
</Paper>
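The first two stages summarized in the conclusion can be illustrated with a minimal Python sketch, under stated assumptions. Frequent substrings are enumerated from a naively built suffix array, relying on the fact that all occurrences of a given prefix are contiguous in sorted suffix order. "Context entropy" is assumed here to be the Shannon entropy of the characters immediately to the left and right of each occurrence, and a candidate is dropped when the smaller of the two falls below a threshold. The function names, the `min_freq`, `max_len`, and `threshold` parameters, and this exact entropy formulation are illustrative assumptions, not the paper's definitions. The lexicon boundary check and RFR ranking are left out because their details are not given in this section.

```python
import math
from collections import Counter


def build_suffix_array(text: str) -> list[int]:
    # Naive construction: sort suffix start positions lexicographically.
    # Adequate for a sketch; large corpora need an O(n log n) or linear-time builder.
    return sorted(range(len(text)), key=lambda i: text[i:])


def frequent_substrings(text: str, min_freq: int = 2, max_len: int = 6) -> dict[str, list[int]]:
    """Map every substring of length 2..max_len seen at least min_freq times
    to its sorted start positions. Suffixes sharing a length-k prefix are
    adjacent in the suffix array, so one scan per length k suffices."""
    sa = build_suffix_array(text)
    result: dict[str, list[int]] = {}
    for k in range(2, max_len + 1):
        group: list[int] = []
        prev = None
        for pos in sa:
            prefix = text[pos:pos + k]
            if len(prefix) < k:
                # Suffix too short to start with any length-k substring.
                continue
            if prefix != prev:
                # A new prefix begins: flush the previous group if frequent enough.
                if prev is not None and len(group) >= min_freq:
                    result[prev] = sorted(group)
                group, prev = [], prefix
            group.append(pos)
        if prev is not None and len(group) >= min_freq:
            result[prev] = sorted(group)
    return result


def entropy(counts: Counter) -> float:
    # Shannon entropy in bits of a neighbour-character distribution.
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def context_entropies(text: str, positions: list[int], length: int) -> tuple[float, float]:
    # Assumed definition: entropy of left and right neighbour characters;
    # "^" and "$" stand in for the start and end of the text.
    left = Counter(text[p - 1] if p > 0 else "^" for p in positions)
    right = Counter(text[p + length] if p + length < len(text) else "$" for p in positions)
    return entropy(left), entropy(right)


def filter_by_context_entropy(text: str, candidates: dict[str, list[int]],
                              threshold: float = 1.0) -> dict[str, list[int]]:
    kept: dict[str, list[int]] = {}
    for term, positions in candidates.items():
        left_h, right_h = context_entropies(text, positions, len(term))
        # A fragment of a longer term tends to have a fixed neighbour on at
        # least one side, which drives that side's entropy toward zero.
        if min(left_h, right_h) >= threshold:
            kept[term] = positions
    return kept
```

For example, in `"xABCD yABCE zABCF"` the candidates `AB` (always followed by `C`) and `BC` (always preceded by `A`) have zero entropy on one side and are discarded, while `ABC`, which occurs in three distinct contexts, survives with `threshold=1.0`. Taking the minimum of the two sides is one plausible design choice; a weighted or summed score would rank candidates differently.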