<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2101">
<Title>Minimum Risk Annealing for Training Log-Linear Models</Title>
<Section position="9" start_page="792" end_page="793" type="relat">
<SectionTitle>7 Related Work</SectionTitle>
<Paragraph position="0">We have seen that annealed minimum risk training provides a useful alternative to maximum likelihood and minimum error training. In our experiments, it never performed significantly worse than either and in some cases significantly helped (a minimal sketch of the annealed objective appears at the end of this section).</Paragraph>
<Paragraph position="1">[Footnote 11: For information on these corpora, see the CoNLL-X shared task on multilingual dependency parsing: http://nextens.uvt.nl/~conll/.] [Footnote, continued from the previous page: "... sentence test corpora, after training 10 experts on 1000 sentences and fitting their weights θ on 200 more. For Slovenian, minimum risk annealing is significantly better than the other training methods, while minimum error is significantly worse. For Bulgarian, both minimum error and annealed minimum risk training achieve significant gains over maximum likelihood, but are indistinguishable from each other. For Dutch, the three methods are indistinguishable."]</Paragraph>
<Paragraph position="2">Note, however, that annealed minimum risk training results in a deterministic classifier, just as these other training procedures do. The orthogonal technique of minimum Bayes risk (MBR) decoding has achieved gains on parsing (Goodman, 1996) and machine translation (Kumar and Byrne, 2004); a sketch of MBR decoding also appears at the end of this section. In speech recognition, researchers have improved decoding by smoothing probability estimates numerically on held-out data in a manner reminiscent of annealing (Goel and Byrne, 2000). We are interested in applying our techniques for approximating nonlinear loss functions to MBR by performing the risk minimization inside the dynamic programming or other decoder.</Paragraph>
<Paragraph position="3">Another training approach that incorporates arbitrary loss functions is found in the structured prediction literature in the margin-based learning community (Taskar et al., 2004; Crammer et al., 2004). Like other max-margin techniques, these attempt to push the best hypothesis far away from the inferior ones; the distinction is in using a loss function to calculate the required margins (see the final sketch at the end of this section).</Paragraph>
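<Paragraph position="4">To make the annealed objective above concrete, here is a minimal Python sketch, not code from the paper: the model posterior is raised to a temperature-like power gamma, and the objective is the expected loss under that sharpened distribution. The function and argument names are illustrative assumptions.

import math

def annealed_risk(scores, losses, gamma):
    # p_gamma(y) is proportional to exp(gamma * score(y)); as gamma
    # grows, the distribution sharpens toward the 1-best hypothesis,
    # so the annealed risk approaches that hypothesis's plain loss,
    # while small gamma keeps the objective smooth for optimization.
    m = max(gamma * s for s in scores)            # stabilize the exp
    weights = [math.exp(gamma * s - m) for s in scores]
    z = sum(weights)
    return sum((w / z) * l for w, l in zip(weights, losses))

Training in this picture would decrease the risk at a fixed gamma and then raise gamma on a schedule, so that the smooth objective gradually approaches the true error of the deterministic classifier.</Paragraph>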
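<Paragraph position="5">The minimum Bayes risk decoding referenced above (Goodman, 1996; Kumar and Byrne, 2004) can be pictured with the following Python sketch over an n-best list; the names are our assumptions, not an implementation from any of the cited papers.

def mbr_decode(hypotheses, probs, loss):
    # Risk of outputting h: expected loss when the "truth" is
    # distributed over the same n-best list with posterior probs.
    def risk(h):
        return sum(p * loss(h, r) for r, p in zip(hypotheses, probs))
    # MBR returns the lowest-risk hypothesis, which need not be
    # the single most probable one.
    return min(hypotheses, key=risk)

With a 0/1 loss this reduces to picking the most probable hypothesis; the gains cited above come from plugging in task losses such as translation or parsing error measures.</Paragraph>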
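<Paragraph position="6">Finally, the loss-scaled margin idea of the last paragraph can be sketched as a structured hinge loss in the spirit of Taskar et al. (2004); again the code is an illustrative assumption, not the authors' formulation.

def structured_hinge(score, loss, candidates, gold):
    # Loss-augmented inference: the most dangerous candidate is the
    # one with high model score and high loss against the gold structure.
    worst = max(score(y) + loss(y, gold) for y in candidates)
    # The gold structure must outscore every candidate y by a margin
    # of at least loss(y, gold); any shortfall is the hinge penalty.
    return max(0.0, worst - score(gold))
</Paragraph>
</Section>
</Paper>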