File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/00/w00-1404_intro.xml

Size: 3,866 bytes

Last Modified: 2025-10-06 14:01:05

<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-1404">
  <Title>Document Structure and Multilingual Authoring</Title>
  <Section position="2" start_page="0" end_page="24" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The world of technical documentation is forcefully moving towards the use of authoring tools based on the XML markup language (W3C, 1998; Pardi, 1999). This language is based on grammatical specifications, called DTDs, which are roughly similar to context-free grammars (but see Wood, 1995, and Prescod, 1998, for discussions of the differences) with an arbitrary number of nonterminals and exactly one predefined terminal called pcdata. The pcdata terminal has a special status: it can dominate any character string (subject to certain restrictions on the characters allowed). Authoring is seen as a top-down interactive process of step-wise refinement of the root nonterminal (corresponding to the whole document), in which the author iteratively selects a rule for expanding a nonterminal already present in the tree and can, in addition, choose a (roughly) arbitrary sequence of characters for expanding the pcdata node.</Paragraph>
    <Paragraph position="1"> The resulting document is a mixture of tree-like structure (the context-free derivation tree corresponding to the author's selections), represented through tags, and of surface, represented as free text (PCDATA) between the tags.</Paragraph>
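    <Paragraph> As a rough illustration of this authoring process, the following minimal Python sketch treats a DTD-like specification as a small grammar and expands the root nonterminal top-down. The grammar, the element names, and the functions expand and author are hypothetical and chosen only for illustration; a real DTD differs from a context-free grammar in several respects, and this is not the implementation of any actual authoring tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy grammar: each nonterminal maps to its alternative expansions.
# "PCDATA" is the single predefined terminal; it accepts any character string.
GRAMMAR = {
    "manual": [["title", "risk"]],
    "title": [["PCDATA"]],
    "risk": [["caution"], ["warning"]],
    "caution": [["PCDATA"]],
    "warning": [["PCDATA"]],
}


def expand(symbol, choices):
    """Expand `symbol` top-down, consuming author decisions from `choices`.

    `choices` is an iterator yielding, in order, either a rule index (when a
    nonterminal is expanded) or a free-text string (when PCDATA is expanded).
    """
    node = ET.Element(symbol)
    # Structural choice: picks one of the rules the grammar offers.
    rule = GRAMMAR[symbol][next(choices)]
    for child in rule:
        if child == "PCDATA":
            # Surface choice: any string, not constrained by the grammar.
            node.text = next(choices)
        else:
            node.append(expand(child, choices))
    return node


def author(choices):
    """Build a document from the root nonterminal and serialize it."""
    root = expand("manual", iter(choices))
    return ET.tostring(root, encoding="unicode")


# Decisions in order: manual rule 0, title rule 0, title text,
# risk rule 1 (warning), warning rule 0, warning text.
document = author([0, 0, "Engine check", 1, 0, "Wear gloves."])
print(document)
```

 In this sketch the rule indices stand for the structural choices, which are limited to what the grammar offers, while the strings stand for the surface choices, which are not constrained at all.</Paragraph>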
    <Paragraph position="2"> We see, however, a tension between the structure and surface aspects of an XML document: * While structural choices are under system control (they have to be compatible with the DTD), surface choices are not. (With the emergence of schemas (W3C, 1999a), which permit some typing of the surface (float, boolean, string, etc.), some degree of control is becoming more feasible.) * Surface strings are treated as unanalysable chunks by the styling mechanisms that render the XML document to the reader. They can be displayed in a given font or moved around, but they lack the internal structure that would allow them to be &quot;re-purposed&quot; for different rendering situations, such as displaying on mobile telephone screens, wording differently for a specific audience, or producing prosodically adequate phonetic output. This situation stands in contrast with the underlying philosophy of XML, which emphasizes the separation between content specification and the multiple situations in which this content can be exploited.</Paragraph>
    <Paragraph position="3"> * Structural decisions tend to be associated with choices of meaning which are independent of the language in which the document is rendered.</Paragraph>
    <Paragraph position="4"> Thus, for instance, the DTD for an aircraft maintenance manual might distinguish between two kinds of risks: caution (material damage risk) and warning (risk to the operator). By selecting one of these options (a choice that will lead to further lower-level choices), the author takes a decision of a semantic nature, which is quite independent of the language in which the document is to be rendered, and which could be exploited to produce multilingual versions of the document.</Paragraph>
    <Paragraph position="5"> By contrast, a PCDATA string is language-specific and ill-suited for multilingual applications.</Paragraph>
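    <Paragraph> The contrast between a language-independent structural choice and a language-specific string can be sketched as follows, in a hedged way. The label table, its wording in both languages, and the function render are assumptions introduced only for illustration; they are not taken from any actual maintenance-manual DTD.

```python
# Hypothetical label table: keys are the structural choices offered by the
# grammar, values are per-language renderings (wording is illustrative only).
LABELS = {
    "caution": {
        "en": "Caution: risk of material damage",
        "fr": "Attention : risque de dommage matériel",
    },
    "warning": {
        "en": "Warning: risk to the operator",
        "fr": "Avertissement : risque pour l'opérateur",
    },
}


def render(risk_kind, language):
    """Render a language-independent structural choice in the given language."""
    return LABELS[risk_kind][language]


# The same choice ("warning") yields one rendering per language.
for language in ("en", "fr"):
    print(render("warning", language))
```

 Because the author only records the choice between caution and warning, the same document structure can be rendered in each language, whereas a free-text string typed by the author would have to be translated separately.</Paragraph>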
    <Paragraph position="6"> These remarks point to a possible radical view of XML authoring, which advocates that surface strings be eliminated altogether from the document content, and that all author choices be under the explicit control of the DTD and reflected in the document structure. Such a view, which is argued for in a related paper (Dymetman et al., 2000), emphasizes the link application of MDA to a certain domain of pharmaceutical documents.</Paragraph>
  </Section>
</Paper>