<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-2002">
<Title>Identifying Chemical Names in Biomedical Text: An Investigation of the Substring Co-occurrence Based Approaches</Title>
<Section position="6" start_page="11" end_page="11" type="evalu">
<SectionTitle>5 Evaluation and Results</SectionTitle>
<Paragraph position="0">We performed cross-validation experiments on the 15 hand-annotated MEDLINE abstracts described in the section &quot;Available Data&quot;. In each experiment, one abstract was held out, the model parameters were tuned on the remaining 14 abstracts, and the model was tested on the held-out abstract.</Paragraph>
<Paragraph position="1">Fifteen such experiments were performed. Their results were combined by taking a weighted geometric mean of the precision values at each recall level.</Paragraph>
<Paragraph position="2">The results were weighted by the number of positive examples in each file so that every example contributes equally. Figure 1 shows the resulting precision/recall curves.</Paragraph>
<Paragraph position="3">As the figure shows, the N-gram approaches perform better than the others. The interpolated model with quadratic coefficients requires a large amount of development data and therefore does not produce good results in our case. Simple Laplacian smoothing needs less development data and produces much better results. The model with confidence-based coefficients works best. The graph also shows the model introduced by Wilbur et al. (1999).</Paragraph>
<Paragraph position="4">It does not perform nearly as well on our data, even though it produces very good results on the clean data used in that work. This (as well as some experiments we performed that are not included here) suggests that the quality of the training data has a very strong effect on model performance.</Paragraph>
</Section>
</Paper>
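One possible reading of the combination step described above, written out as a formula; the symbols P_i(r), n_i, and w_i are our notation, not the paper's, and the weighting by positive examples follows the description in Paragraph 2:

\[
P(r) = \prod_{i=1}^{15} P_i(r)^{w_i},
\qquad
w_i = \frac{n_i}{\sum_{j=1}^{15} n_j},
\]

where $P_i(r)$ is the precision of the $i$-th held-out abstract at recall level $r$ and $n_i$ is the number of positive examples in that abstract, so each positive example contributes equally to the combined precision/recall curve.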