File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/94/a94-1030_concl.xml

Size: 1,133 bytes

Last Modified: 2025-10-06 13:57:10

<?xml version="1.0" standalone="yes"?>
<Paper uid="A94-1030">
  <Title>IMPROVING CHINESE TOKENIZATION WITH LINGUISTIC FILTERS ON STATISTICAL LEXICAL ACQUISITION</Title>
  <Section position="7" start_page="180" end_page="180" type="concl">
    <SectionTitle>
CONCLUSION
</SectionTitle>
    <Paragraph position="0"> We have introduced a blind evaluation method that accommodates multiple standards and gives some indication of how well algorithms' outputs match human preferences.</Paragraph>
    <Paragraph position="1"> We have demonstrated that purely statistical lexical acquisition on the same corpus being tokenized can significantly reduce error rates due to unknown words. We have also demonstrated empirically that simple morphosyntactic filters improve the precision of a hybrid statistical/linguistic method for generating new lexical entries. Using linguistic knowledge to construct filters rather than generators has the advantage that applicability conditions do not need to be closely checked, since the training corpus presumably already adheres to them.</Paragraph>
  </Section>
</Paper>