<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1658"> <Title>Sydney, July 2006. (c) 2006 Association for Computational Linguistics Entity Annotation based on Inverse Index Operations</Title> <Section position="8" start_page="499" end_page="499" type="concl"> <SectionTitle> 6 Conclusions </SectionTitle> <Paragraph position="0"> In this paper we demonstrated that a suitably constructed inverse index contains all the information necessary to implement entity annotators that use cascading regular expressions. The key advantage of this approach is that it does not require access to the original unstructured data to compute the annotations. The method uses a basic set of operators on the inverse index to construct indexes of all matches of a regular expression in the tokenized data set. We showed theoretically that, for a DFA implementation, the index approach can be much faster when the index sizes corresponding to the labels on the DFA are a small fraction of the total number of tokens in the data set. We also provided a more efficient index-based implementation that is computed directly from the regular expressions, without requiring a DFA conversion, and experimentally demonstrated the gains.</Paragraph> </Section> </Paper>