<?xml version="1.0" standalone="yes"?> <Paper uid="W03-0803"> <Title>OLLIE: On-Line Learning for Information Extraction</Title> <Section position="7" start_page="0" end_page="0" type="concl"> <SectionTitle> 7 Conclusion </SectionTitle> <Paragraph position="0"> OLLIE is an advanced collaborative annotation environment that allows users to share and annotate distributed corpora, supported by adaptive information extraction that trains in the background and suggests annotations.</Paragraph> <Paragraph position="1"> Sharing documents with other users allows several of them to annotate the same documents collaboratively. For example, one user can annotate a text with organisations and another can then annotate it with locations. Because the documents reside on a shared server, one user can spot errors or questionable markup introduced by another user and initiate a discussion. Such collaborative annotation is useful in the wider context of creating and sharing language resources (Ma et al., 2002).</Paragraph> <Paragraph position="2"> A number of Machine Learning (ML) approaches for Information Extraction (IE) have been developed recently (e.g., Collins, 2002; Bikel et al., 1999), including some that use active learning (e.g., Thompson et al., 1999) or offer automated support (e.g., Ciravegna et al., 2002) in order to lower the overhead of annotating training data. While corpora exist for comparative evaluation (e.g., MUC or the CMU seminar corpus), there is no easy way to test these ML algorithms on other data, evaluate their portability to new domains, or experiment with different model parameters. Although some of the algorithms are available for experimentation, they are implemented in different languages, require different data formats, and run on different platforms. All of this makes it hard to ensure experimental repeatability and to eliminate site-specific skew effects. 
Moreover, not all systems are freely available. We therefore propose an open, distributed environment in which researchers can experiment with different learning methods on their own data.</Paragraph> <Paragraph position="3"> Another advantage of OLLIE is that it defines a simple application programming interface (API) through which the different ML algorithms access the training data (see Section 3.1). Integrating a new ML algorithm into OLLIE therefore amounts to providing a wrapper that implements this API, which is a straightforward process. We have already provided such a wrapper for the ML algorithms in the WEKA toolkit, and it can serve as an example.</Paragraph> <Paragraph position="4"> Although OLLIE shares features with other adaptive IE environments (e.g., Ciravegna et al., 2002) and collaborative annotation tools (e.g., Ma et al., 2002), it combines them in a unique way. In addition, OLLIE is the only adaptive IE system that allows users to choose which ML approach to use and to evaluate different approaches comparatively.</Paragraph> </Section> </Paper>