File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/03/w03-1306_concl.xml
Size: 2,013 bytes
Last Modified: 2025-10-06 13:53:46
<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1306"> <Title>Boosting Precision and Recall of Dictionary-Based Protein Name Recognition</Title> <Section position="7" start_page="0" end_page="0" type="concl"> <SectionTitle> 7 Conclusion </SectionTitle> <Paragraph position="0"> In this paper we propose a two-phase protein name recognition method. In the first phase, we scan texts for protein name candidates using a protein name dictionary and an approximate string searching technique. In the second phase, we filter the candidates using a machine learning technique.</Paragraph> <Paragraph position="1"> Since our method is dictionary-based, it can provide ID information for recognized terms, unlike machine-learning-based approaches. False recognition, which is a common problem of dictionary-based approaches, is suppressed by a classifier trained on an annotated corpus.</Paragraph> <Paragraph position="2"> Experimental results using the GENIA corpus show that filtering with a naive Bayes classifier greatly improves precision with a slight loss of recall. We achieved an F-measure of 70.2% for protein name recognition on the GENIA corpus.</Paragraph> <Paragraph position="3"> The future direction of this research involves: Use of state-of-the-art classifiers. We have used a naive Bayes classifier in our experiments because it requires few computational resources and exhibits good performance. There is a chance, however, to improve performance by using state-of-the-art machine learning techniques, including maximum entropy models and support vector machines.</Paragraph> <Paragraph position="4"> Use of other elastic matching algorithms. We have restricted the computation of similarity to edit distance. 
However, the order of the words in a protein name is often altered, for example, &quot;beta-1 integrin&quot; and &quot;integrin beta-1&quot;. Character-level edit distance cannot capture this kind of similarity.</Paragraph> </Section> </Paper>
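The word-order limitation noted in the conclusion can be illustrated with a minimal sketch. It assumes a plain unit-cost Levenshtein distance, which may differ from the cost settings used in the paper's approximate string search. The helper `order_insensitive_distance` is a hypothetical illustration (sorting whitespace-separated words before comparing), not a matching algorithm proposed by the authors.

```python
def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance with unit costs (an assumption)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def order_insensitive_distance(a: str, b: str) -> int:
    """Hypothetical variant: sort the words of each name, then compare."""
    norm_a = " ".join(sorted(a.split()))
    norm_b = " ".join(sorted(b.split()))
    return edit_distance(norm_a, norm_b)


x, y = "beta-1 integrin", "integrin beta-1"
char_level = edit_distance(x, y)                # large: the words are swapped
word_sorted = order_insensitive_distance(x, y)  # zero: same words, any order
print(char_level, word_sorted)
```

Sorting words is only one possible normalization. It ignores partial word matches and would need to be combined with the candidate filtering step to avoid new false positives.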