<?xml version="1.0" standalone="yes"?>
<Paper uid="A97-1034">
  <Title>Using SGML as a Basis for Data-Intensive NLP</Title>
  <Section position="8" start_page="234" end_page="234" type="concl">
    <SectionTitle>
5 Conclusions
</SectionTitle>
    <Paragraph position="0"> SGML is a good markup language for base-level annotations of published corpora. Our experience with LT NSL has shown that:
* It is a good system for sequential corpus processing where there is locality of reference.</Paragraph>
    <Paragraph position="1"> * It provides a modular architecture which does not require a central database, thus allowing distributed software development and reuse of components.</Paragraph>
    <Paragraph position="2"> * It works with existing corpora without extensive pre-processing.</Paragraph>
    <Paragraph position="3"> * It does support the Tipster approach of separating base texts from additional markup by means of hyperlinks. In fact, SGML (via HyTime) allows far more flexible addressing than character offsets alone, which is an advantage when working with corpora that may change.</Paragraph>
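A minimal sketch of why addressing standoff annotations purely by character offsets is fragile when a corpus changes. The toy base text, the token identifiers (`w1`, `w2`, ...) and the helper functions `resolve_offsets` and `resolve_ids` are all hypothetical and are not part of LT NSL, Tipster or HyTime. Identifier-based anchoring is used here only as one simple stand-in for the richer addressing that HyTime provides:

```python
# Illustrative sketch only: toy standoff annotation, not the LT NSL, Tipster or HyTime API.


def resolve_offsets(text, anns):
    """Return the substring each (start, end, label) annotation points at."""
    return [(text[start:end], label) for start, end, label in anns]


def resolve_ids(tokens, anns):
    """Return the token each (token_id, label) annotation points at."""
    return [(tokens[tok_id], label) for tok_id, label in anns]


# Base text, with annotations stored separately and addressed by character offsets.
base = "The cat sat on the mat."
offset_anns = [(4, 7, "NOUN"), (19, 22, "NOUN")]
print(resolve_offsets(base, offset_anns))  # [('cat', 'NOUN'), ('mat', 'NOUN')]

# After an edit to the base text, the stored offsets point at the wrong spans.
edited = "The black cat sat on the mat."
print(resolve_offsets(edited, offset_anns))  # [('bla', 'NOUN'), ('n t', 'NOUN')]

# Hypothetical alternative: annotations addressed by token identifiers,
# with word order tracked separately as a list of identifiers.
tokens = {"w1": "The", "w2": "cat", "w3": "sat", "w4": "on", "w5": "the", "w6": "mat"}
order = ["w1", "w2", "w3", "w4", "w5", "w6"]
id_anns = [("w2", "NOUN"), ("w6", "NOUN")]

# Inserting a new token adds a fresh identifier and leaves existing ones untouched.
tokens["w7"] = "black"
order.insert(1, "w7")
print(" ".join(tokens[t] for t in order))  # The black cat sat on the mat
print(resolve_ids(tokens, id_anns))  # [('cat', 'NOUN'), ('mat', 'NOUN')]
```

Inserting a single word shifts every later character offset, so the offset-based annotations silently point at the wrong text. The identifier-based annotations still resolve to the intended words.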
    <Paragraph position="4"> LT NSL is less well suited to:
* Applications that require a database approach, i.e. those that need random access to markup within a text, for example lexicographic browsing or the creation of book indexes.
* Processing very large plain-text or unnormalised SGML corpora, where indexing is required and generating normalised files is a large overhead. We are working on extending LT NSL in this direction, e.g. to allow processing of the entire BNC.</Paragraph>
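To illustrate the contrast between sequential processing and random access, here is a sketch under simplifying assumptions. It uses a toy corpus held as hypothetical (sentence id, words) records rather than SGML, together with hypothetical helpers `count_words_streaming` and `build_index`. A streaming pass visits each record once, in order. A book-index style query (every location of a given word), by contrast, is only cheap after an index has been built over the whole corpus:

```python
# Illustrative sketch only: a toy corpus, not the LT NSL API.
from collections import defaultdict

# A toy corpus as a sequence of (sentence_id, words) records.
corpus = [
    ("s1", ["the", "cat", "sat"]),
    ("s2", ["a", "dog", "barked"]),
    ("s3", ["the", "cat", "ran"]),
]


def count_words_streaming(records):
    """Sequential processing: each record is handled once, in order."""
    total = 0
    for _sid, words in records:
        total += len(words)
    return total


def build_index(records):
    """Random access needs an index: map each word to its (sentence_id, position) locations."""
    index = defaultdict(list)
    for sid, words in records:
        for position, word in enumerate(words):
            index[word].append((sid, position))
    return index


print(count_words_streaming(corpus))  # 9
index = build_index(corpus)
print(index["cat"])  # [('s1', 1), ('s3', 1)]
```

Building the index requires a complete pass over the corpus before the first query can be answered. This is the kind of up-front cost that becomes significant for very large corpora.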
    <Paragraph position="5"> In conclusion, the SGML and database approaches are optimised for different NLP applications and should be seen as complementary rather than conflicting. There is no reason not to exploit the strengths of both the database and the SGML stream approaches, and future work should address interfacing between the two.</Paragraph>
  </Section>
</Paper>