<?xml version="1.0" standalone="yes"?> <Paper uid="W04-3237"> <Title>Adaptation of Maximum Entropy Capitalizer: Little Data Can Help a Lot</Title> <Section position="9" start_page="6" end_page="6" type="concl"> <SectionTitle> 6 Conclusions and Future Work </SectionTitle> <Paragraph position="0"> The MEMM tagger is very effective: it reduces both in-domain and out-of-domain capitalization error by 35-45% relative to a 1-gram capitalization model.</Paragraph> <Paragraph position="1"> We have also presented a general technique for adapting MaxEnt probability models. It was shown to be very effective in adapting a background MEMM capitalization model, improving the accuracy by 20-25% relative. An overall 50-60% reduction in capitalization error over the standard 1-gram baseline is achieved. A surprising result is that the adaptation performance gain is not due to adding more domain-specific features but rather to making better use of the background features for modeling the in-domain data.</Paragraph> <Paragraph position="2"> As expected, adding more background training data improves performance, but a very small amount of domain-specific data also helps significantly if one can make use of it in an effective way. The &quot;There's no data like more data&quot; rule of thumb could be amended with &quot;..., especially if it's the right data!&quot;.</Paragraph> <Paragraph position="3"> As future work, we plan to investigate the best way to blend increasing amounts of less-specific background training data with specific, in-domain data for this and other problems.</Paragraph> <Paragraph position="4"> Another interesting research direction is to explore the usefulness of MAP adaptation of MaxEnt models for other problems, among which we wish to include language modeling, part-of-speech tagging, parsing, machine translation, information extraction, and text routing.</Paragraph> </Section> </Paper>