<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-3237">
  <Title>Adaptation of Maximum Entropy Capitalizer: Little Data Can Help a Lot</Title>
  <Section position="9" start_page="6" end_page="6" type="concl">
    <SectionTitle>
6 Conclusions and Future Work
</SectionTitle>
    <Paragraph position="0"> The MEMM tagger is very effective, reducing both in-domain and out-of-domain capitalization error by 35-45% relative to a 1-gram capitalization model.</Paragraph>
    <Paragraph position="1"> We have also presented a general technique for adapting MaxEnt probability models. It was shown to be very effective in adapting a background MEMM capitalization model, improving the accuracy by 20-25% relative. An overall 50-60% reduction in capitalization error over the standard 1-gram baseline is achieved. A surprising result is that the adaptation performance gain is not due to adding more domain-specific features but rather to making better use of the background features for modeling the in-domain data.</Paragraph>
    <Paragraph position="2"> As expected, adding more background training data improves performance, but a very small amount of domain-specific data also helps significantly if one can make use of it in an effective way. The &quot;There's no data like more data&quot; rule of thumb could be amended with &quot;..., especially if it's the right data!&quot;.</Paragraph>
    <Paragraph position="3"> As future work we plan to investigate the best way to blend increasing amounts of less-specific background training data with specific, in-domain data for this and other problems.</Paragraph>
    <Paragraph position="4"> Another interesting research direction is to explore the usefulness of MAP adaptation of MaxEnt models for other problems, among which we wish to include language modeling, part-of-speech tagging, parsing, machine translation, information extraction, and text routing.</Paragraph>
  </Section>
</Paper>