<?xml version="1.0" standalone="yes"?> <Paper uid="W03-0803"> <Title>OLLIE: On-Line Learning for Information Extraction</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> OLLIE is an on-line application for corpus annotation that uses Machine Learning (ML) and Information Extraction (IE) to make the annotator's task easier and more efficient.</Paragraph> <Paragraph position="1"> A typical OLLIE working session starts with the user uploading a set of documents, selecting one of the several ML methods supplied by the system, choosing the parameters for the learning module and starting to annotate the texts. During the initial phase of manual annotation, the system learns from the user's actions in the background (i.e., on the server) and, once it reaches a certain degree of confidence, starts making suggestions by pre-annotating the documents. Some of these early suggestions may be wrong but, as the user makes the necessary corrections, the system learns from its mistakes and its performance improves, reducing the amount of human input required.</Paragraph> <Paragraph position="2"> The implementation is based on a client-server architecture in which the client is any Java-enabled web browser and the server is responsible for storing data, training ML models and providing access services to users.</Paragraph> <Paragraph position="3"> The client side of OLLIE is implemented as a set of JavaServer Pages (JSPs), supplemented by a small number of Java applets for tasks where the user-interface capabilities of HTML are insufficient. The server side comprises a JSP/servlet server, a relational database server and an instance of the GATE architecture for language engineering, which drives all the language-related processing. 
The general architecture is presented in Figure. The next section describes the client side of the OLLIE system, while Section 3 details the implementation of the server, including a subsection on the integration of Machine Learning. Section 4 discusses security, Section 6 future improvements, and Section 7 concludes the paper.</Paragraph> </Section> </Paper>