<?xml version="1.0" standalone="yes"?>
<Paper uid="H94-1032">
  <Title>PRINCIPLES OF TEMPLATE DESIGN</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> The functionality of systems that extract information from texts can be specified quite simply: the input is a stream of texts and the output is some representation of the information to be extracted. In the message understanding research promoted by ARPA through its Human Language Technology initiative, the form of this output has been templates (feature-structures), with complex path-names (slots) and various constraints on fillers. The design of these templates, especially considered as concrete data structures, has been determined to some degree at least by considerations having to do with automatic scoring. Beyond that, it has not been made clear what principles have driven or should drive the design of these output forms; but it has become clear that serious defects in the form of the output can undermine the utility of an information extraction system. If the output is unusable, or not easily usable, the breadth and reliability of coverage of the natural language analysis component will be of little value.</Paragraph>
    <Paragraph position="1"> As part of the DASH research project on Data Access for Situation Handling, we are attempting to elucidate principles of template design and to compile these, with examples, in a manual for template designers. Our methodology has included detailed critical analysis of the templates from a variety of information extraction tasks (MUC-4, MUC-5, Tipster1, the Warbreaker Message Handling [WBMH] tasks), together with the creation of templates for the TREC topic descriptions and narratives.</Paragraph>
    <Paragraph position="2"> The design of templates, or more generally, abstract data structures, as output forms for automatic information extraction systems must be sensitive to three different but interacting considerations: 1. the template as representational device; 2. the template as generated from input; 3. the template as input to further processing, by humans or programs or both.</Paragraph>
    <Paragraph position="3"> The central consideration in our research is that of the template as a representational device. The problem of template design is a special case of the general problem of knowledge representation. In particular, it is the problem of representing, within a constrained formalism, essential facts about situations in a way that can mediate between texts that describe those situations and a variety of applications that involve reasoning about them.</Paragraph>
    <Paragraph position="4"> What facts about a situation are essential is determined by a semantic model of the domain, which is in turn motivated by the particular information requirements of the analytical purposes which the extracted information is to serve. This specification could, in principle, be done without any detailed thought given to the nature of the texts from which information is to be extracted; thus it could include information requirements that simply could not be met by the input stream. It might also abstract from information readily transduced from the input stream. Conversely, the domain specification may reveal cases where one must extract information that is not important to the end user in order to disambiguate or otherwise explicate important informational content. Again, the domain model could be specified without any detailed thought given to the design of the concrete syntax of the template. In this latter regard, crucial considerations include intelligibility and 'browsability', together with the utility of the template fills as input to further processing.</Paragraph>
    <Paragraph position="5"> We here report some results of a program of research aimed at uncovering the underlying principles of template design.</Paragraph>
  </Section>
</Paper>