<?xml version="1.0" standalone="yes"?>
<Paper uid="A97-1027">
  <Title>Dutch Sublanguage Semantic Tagging combined with Mark-Up Technology</Title>
  <Section position="5" start_page="182" end_page="185" type="metho">
    <Paragraph position="0"> 2 &quot;Static&quot; HTML code eliminates the need for an on-the-fly conversion of the HTML file (&quot;dynamic&quot; HTML code) as presented in section 2.4.</Paragraph>
    <Section position="1" start_page="182" end_page="182" type="sub_section">
      <SectionTitle>
2.2 The Dutch Medical Language Processor
</SectionTitle>
      <Paragraph position="0"> For the Dutch medical language, an NLP system of medium-sized coverage has been designed and implemented: the Dutch Medical Language Processor (DMLP) (Spyns, 1996c). At the morphological level, there is a full-form dictionary stored in a relational database (currently some 100,000 full forms, mostly non-compound word forms) (Dehaspe, 1993). If necessary, a recogniser characterises unknown word forms morphologically (Spyns, 1994). Subsequently, a contextual disambiguation component tries to reduce the number of morphological readings (Spyns, 1995).</Paragraph>
      <Paragraph position="1"> As the syntactic level uses a &quot;logic variant&quot; of the LSP grammar formalism (Hirschman and Dowding, 1990), the Dutch morpho-syntactic module (Spyns and Adriaens, 1992) can replace the LSP parser.</Paragraph>
      <Paragraph position="2"> Many of the LSP-MLP medical co-occurrence patterns are practically identical for English, French and German. Applying these patterns to Dutch parse trees can therefore show whether the non-language-specific parts of the LSP-MLP can be reused for Dutch medical NLP (Spyns, 1996a).</Paragraph>
    </Section>
    <Section position="2" start_page="182" end_page="182" type="sub_section">
      <SectionTitle>
2.3 The DMLP/LSP-MLP connection
</SectionTitle>
      <Paragraph position="0"> The linguistic data are passed on from the DMLP to the LSP-MLP system via syntactic parse trees, because the selection module takes syntactic relationships into account during the semantic disambiguation phase.</Paragraph>
      <Paragraph position="1"> The linguistic information of the DMLP and the LSP-MLP systems corresponds to a high degree. Semantic word class labels, which were originally not foreseen in the Dutch lexicon, had to be added. A parse tree transducer delivers nearly genuine Dutch LSP-MLP trees (Spyns, 1996a). Although some new sublanguage semantic co-occurrence patterns had to be defined on the LSP-MLP side, the co-occurrence patterns are highly language independent. This is in line with earlier results. An example (see figure 1) shows the output of the parse tree transducer that reshapes the DMLP tree into the required LSP-MLP format. In its current state, the transducer can transform nearly all the parse trees.</Paragraph>
    </Section>
    <Section position="3" start_page="182" end_page="185" type="sub_section">
      <SectionTitle>
2.4 The WWW interface
</SectionTitle>
      <Paragraph position="0"> The basic idea is that, when treating a patient, it is helpful to reread the admission history, the discharge summary, or other important parts of the medical record.</Paragraph>
      <Paragraph position="1"> coronaire bypass.&quot; [surgical procedure: quintuple coronary bypass] The highlighting of medical concepts of interest makes it possible to scan a document quickly, focusing on a particular type of information, such as Symptoms and Diagnoses, or Treatments resolved (?, p.26).</Paragraph>
      <Paragraph position="2"> Such a tool can also be helpful for medico-administrative activities. Medical secretaries have to summarise patient discharge summaries (PDSs) by &quot;translating&quot; them into a fixed set of numerical codes of a classification (ICD-9-CM (Commission of Professional and Hospital Activities, 1978)). These codes (indirectly) serve statistical and financial purposes. If the terms most relevant to the encoding task (essentially the H-DIAG (diagnosis) and the H-TTCHIR (surgical procedure) words) are already highlighted, the human encoder can detect them more rapidly, so that the encoding speed improves.</Paragraph>
      <Paragraph position="3"> The documents are first analysed morphologically and syntactically by the DMLP. The resulting parse trees are made to conform to the LSP format and are subsequently passed on to the LSP-MLP (see footnote 3).</Paragraph>
      <Paragraph position="4"> The LSP subselection module generates a pseudo-HTML file consisting of semantic labels and the terminal elements of the parse trees. The file with the pseudo-HTML codes (see figure 3) could easily have been generated by the morphological component of the DMLP as well. On some occasions this would be preferable, as the DMLP-LSP tree converter sometimes changes the word order. On the other hand, the sublanguage co-occurrence patterns could then no longer be used for semantic disambiguation. Semantically ambiguous words would thus be highlighted more than once, which lowers the precision score (more non-relevant words are flagged). Without full-fledged linguistic analysis, some ambiguities will not be resolved (?, p.27). As can be seen in figure 2 (and thus also in figure 3), the ambiguity of the word &quot;procedure&quot; in sentence 63 is resolved: node number 2 only has the label H-TTCHIR.</Paragraph>
      <Paragraph position="5"> 3 Currently, the files are transmitted by e-mail.</Paragraph>
      <Paragraph position="6"> No actual HTML codes are produced at this stage; the semantic labels are merely written in HTML style (see figure 3). The NLP processing of a batch of PDSs can be done overnight, so that the throughput of the encoder is not negatively affected.</Paragraph>
      <Paragraph position="7"> [Figure caption: DMLP/LSP-MLP processing for the sentence in figure 1] The GUI consists of two WWW-pages. The first page is conceived as a menu window. Two selection boxes allow the medical encoder to choose a text and the semantic labels. Currently, the set of PDSs is limited to nine texts. In the future, HTML files for an unrestricted and varying number of PDSs will have to be produced. Before the encoder can start to view the NLP-processed PDSs, the HTML code of the menu-page needs to be updated to include all the (path)names of the files concerned. This can easily be achieved by running, before each encoding session, a C-shell script that scans a subdirectory and creates an up-to-date HTML file for the menu-page. Only the &quot;&lt;OPTION&gt;&lt;/OPTION&gt;&quot; lines of the first choice box need to be adapted.</Paragraph>
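      <Paragraph> The menu-page update described above can be sketched as follows. This is a minimal illustration in Python, not the C-shell script mentioned in the text; the directory name pds_html, the function names and the exact OPTION markup are assumptions made for this example and are not taken from the system itself.
<![CDATA[
```python
from html import escape
from pathlib import Path


def option_lines(filenames):
    """Build one OPTION line per PDS file name, in sorted order.

    The VALUE attribute and the visible text both carry the file name;
    this markup is an assumption, not the layout used by the authors.
    """
    return [
        f'<OPTION VALUE="{escape(name, quote=True)}">{escape(name)}</OPTION>'
        for name in sorted(filenames)
    ]


def scan_pds_directory(directory):
    """Return the names of the regular files found in `directory`."""
    return [p.name for p in Path(directory).iterdir() if p.is_file()]


if __name__ == "__main__":
    # "pds_html" is a hypothetical subdirectory holding the processed PDSs.
    pds_dir = Path("pds_html")
    names = scan_pds_directory(pds_dir) if pds_dir.is_dir() else []
    # Only these lines would be spliced into the first choice box.
    print("\n".join(option_lines(names)))
```
]]>
Since only the OPTION lines change between encoding sessions, the rest of the menu-page can remain a fixed template.</Paragraph>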
      <Paragraph position="8"> Through the HTML SUBMIT command, the options selected by the medical encoder are passed (via a FORM and CGI-SCRIPT) to an external C-program.</Paragraph>
      <Paragraph position="9"> The C-program takes the filename and the requested sublanguage label(s) as parameters and generates a new HTML file in which the occurrences of the selected label(s) are replaced by genuine HTML codes (&lt;STRONG&gt; and &lt;/STRONG&gt;) around the relevant words. This temporary file is fed directly into the browser and displayed as a second WWW-page (&quot;PDS-page&quot;). The marked words (those belonging to the selected semantic sublanguage word class) are displayed in boldface. As the remaining pseudo-HTML codes are ignored by the browser, the rest of the PDS is displayed in a &quot;neutral&quot; way.</Paragraph>
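      <Paragraph> The label replacement step can be illustrated with a short sketch. It is written in Python rather than C, and it assumes that each pseudo-HTML label wraps its word as an opening and closing tag pair named after the label; the actual file layout is only shown in figure 3, so this format, the function name highlight and the sample string are assumptions.
<![CDATA[
```python
import re


def highlight(pseudo_html, labels):
    """Turn the tags of the selected labels into STRONG tags.

    Tags of labels that were not selected are left untouched, so a
    browser would simply ignore them, as described in the text.
    """
    if not labels:
        return pseudo_html
    alternation = "|".join(re.escape(label) for label in labels)
    # Group 1 captures the optional slash of a closing tag.
    tag = re.compile(rf"<(/?)(?:{alternation})>")
    return tag.sub(lambda m: f"<{m.group(1)}STRONG>", pseudo_html)


if __name__ == "__main__":
    # Hypothetical pseudo-HTML fragment in the assumed tag format.
    sample = "<H-TTCHIR>coronaire bypass</H-TTCHIR> <H-DIAG>word</H-DIAG>"
    print(highlight(sample, ["H-TTCHIR"]))
```
]]>
Selecting several labels at once only lengthens the alternation in the pattern, so a single pass over the file is enough.</Paragraph>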
      <Paragraph position="10"> Figure 4 shows the menu-page and the PDS-page, in which words concerning the diagnosis (H-DIAG), the surgical procedure (H-TTCHIR) and the body part (H-PTPART) are marked. The PDS-page is the bottom right part of the figure and partly overlaps the menu-page, which shows the selected PDS and labels 4.</Paragraph>
    </Section>
  </Section>
</Paper>