File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/03/w03-0806_concl.xml

Size: 1,554 bytes

Last Modified: 2025-10-06 13:53:42

<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0806">
  <Title>Blueprint for a High Performance NLP Infrastructure</Title>
  <Section position="10" start_page="0" end_page="0" type="concl">
    <SectionTitle>
9 Conclusion
</SectionTitle>
    <Paragraph position="0"> The Generative Programming approach to NLP infrastructure development will allow tools such as sentence boundary detectors, POS taggers, chunkers and named entity recognisers to be rapidly composed from many elemental components. For instance, implementing an efficient version of the MXPOST POS tagger (Ratnaparkhi, 1996) will simply involve composing and configuring the appropriate text file reading component with the sequential tagging component, the collection of feature extraction components and the maximum entropy model component. The individual components will provide state-of-the-art accuracy and be highly optimised for both time and space efficiency. A key design feature of this infrastructure is that components share a common representation for text and annotations, so no time is spent reading and writing formatted data (e.g. XML) between stages.</Paragraph>
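The MXPOST-style composition described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the infrastructure's actual API. Every name here is hypothetical: Document, TextReader, MaxEntModel, SequentialTagger, compose, the three feature functions and the toy weights. The model uses the conditional maximum entropy form p(t | h) = exp(sum of weights of active (feature, tag) pairs) / Z(h). Greedy left-to-right decoding stands in for MXPOST's beam search, and the weights are set by hand rather than trained. The sketch shows each stage adding an annotation layer to one shared in-memory object, so nothing is serialised between stages.

```python
import io
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List

# Hypothetical shared representation: every component reads and writes
# annotation layers on this one in-memory object, so nothing is
# serialised to XML between stages.
@dataclass
class Document:
    tokens: List[str]
    layers: Dict[str, List[str]] = field(default_factory=dict)

# Hypothetical feature extractor type: (tokens, position, tags so far) -> feature names.
FeatureFn = Callable[[List[str], int, List[str]], List[str]]

def word_feature(tokens: List[str], i: int, prev: List[str]) -> List[str]:
    return ["w=" + tokens[i].lower()]

def suffix_feature(tokens: List[str], i: int, prev: List[str]) -> List[str]:
    return ["suf3=" + tokens[i][-3:].lower()]

def prev_tag_feature(tokens: List[str], i: int, prev: List[str]) -> List[str]:
    return ["t-1=" + (prev[-1] if prev else "BOS")]

class TextReader:
    """Hypothetical text-reading component: one whitespace-tokenised sentence per line."""
    def __init__(self, stream):
        self.stream = stream

    def __iter__(self) -> Iterator[Document]:
        for line in self.stream:
            if line.strip():
                yield Document(tokens=line.split())

class MaxEntModel:
    """p(tag | features) = exp(sum of weights) / Z, with weights keyed by (feature, tag)."""
    def __init__(self, tags: List[str], weights: Dict[tuple, float]):
        self.tags = tags
        self.weights = weights

    def probs(self, features: List[str]) -> Dict[str, float]:
        scores = {t: sum(self.weights.get((f, t), 0.0) for f in features)
                  for t in self.tags}
        z = sum(math.exp(s) for s in scores.values())  # normaliser Z
        return {t: math.exp(s) / z for t, s in scores.items()}

class SequentialTagger:
    """Left-to-right tagger; greedy decoding stands in for MXPOST's beam search."""
    def __init__(self, extractors: List[FeatureFn], model: MaxEntModel, layer: str = "pos"):
        self.extractors = extractors
        self.model = model
        self.layer = layer

    def __call__(self, doc: Document) -> Document:
        tags: List[str] = []
        for i in range(len(doc.tokens)):
            feats = [f for fn in self.extractors for f in fn(doc.tokens, i, tags)]
            p = self.model.probs(feats)
            tags.append(max(p, key=p.get))  # most probable tag given history
        doc.layers[self.layer] = tags  # add a layer in place; no serialisation
        return doc

def compose(reader: TextReader, *stages: Callable[[Document], Document]) -> Iterator[Document]:
    """Run every document from the reader through each stage in order."""
    for doc in reader:
        for stage in stages:
            doc = stage(doc)
        yield doc

# Toy, hand-set weights for illustration only (not a trained model).
weights = {
    ("w=the", "DT"): 3.0,
    ("w=dog", "NN"): 1.0,
    ("t-1=DT", "NN"): 2.0,
    ("w=is", "VBZ"): 3.0,
    ("suf3=ing", "VBG"): 2.5,
}
model = MaxEntModel(["DT", "NN", "VBZ", "VBG"], weights)
tagger = SequentialTagger([word_feature, suffix_feature, prev_tag_feature], model)

for doc in compose(TextReader(io.StringIO("The dog is barking\n")), tagger):
    print(list(zip(doc.tokens, doc.layers["pos"])))
    # → [('The', 'DT'), ('dog', 'NN'), ('is', 'VBZ'), ('barking', 'VBG')]
```

In this sketch, swapping in a different feature set or model only changes the arguments passed to SequentialTagger. The reader and the shared document representation stay unchanged.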
    <Paragraph position="1"> To make the composition and configuration process easier, we have implemented a Python scripting interface, so that anyone can construct efficient new tools without much programming experience and without a compiler. The development of a graphical user interface on top of the infrastructure will further ease the development cycle.</Paragraph>
  </Section>
</Paper>