<?xml version="1.0" standalone="yes"?>
<Paper uid="A97-1027">
  <Title>Dutch Sublanguage Semantic Tagging combined with Mark-Up Technology</Title>
  <Section position="4" start_page="0" end_page="182" type="intro">
    <SectionTitle>
2 Material and Methods
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="182" type="sub_section">
      <SectionTitle>
2.1 The Linguistic String Project - Medical
Language Processor
</SectionTitle>
      <Paragraph position="0"> The Linguistic String Project - Medical Language Processor (LSP-MLP) of New York University is the first (and, up to now, the longest-lasting) large-scale project on NLP in medicine (Sager et al., 1987), (Sager et al., 1995a). The LSP-MLP has also been ported to French and German, which illustrates the general applicability of its methodology and approach (Nhàn et al., 1989), (Oliver, 1992). Its generality stems from the use of a well-defined underlying linguistic theory (distributionalism) (Harris, 1962), (Sager et al., 1981) and a scientifically based sublanguage approach (Grishman and Kittredge, 1986).</Paragraph>
      <Paragraph position="1"> Important for the present discussion is the semantic selection level of the LSP-MLP. All the words in the LSP dictionary are characterised by labels that indicate to which sublanguage word class(es) the words belong (e.g., H-TTCHIR: &quot;contains general and specific surgical treatment or procedure words which imply or denote surgical intervention by the physician&quot; (Sager et al., 1987, p.268); H-TXPROC: &quot;contains medical test words designating procedures performed on the patient and not on a patient specimen. The patient must be present to undergo the test&quot; (Sager et al., 1987, p.264)). An overview of the actual set of labels and word classes can be found in (Sager et al., 1995a). The semantic selection module uses distributionally established co-occurrence patterns of medical word classes to improve the parse tree by resolving cases of structural ambiguity (Hirschman, 1986). Consider the sentence 63 &quot;operatieve procedure: vijfvoudige coronaire bypass.&quot; 1 displayed in figure 4. The word &quot;procedure&quot; is semantically ambiguous because it has two semantic labels: H-TTCHIR &amp; H-TXPROC. Thanks to the co-occurrence patterns for the medical sublanguage, only the label that is valid in this context (H-TTCHIR) is ultimately selected. In another context (e.g.: test procedure: ...), another co-occurrence pattern will apply and select the H-TXPROC reading.</Paragraph>
      <Paragraph position="2"> Other examples of resolution of word sense ambiguities by means of co-occurrence patterns can be found in (Sager et al., 1987, pp.83, 95).</Paragraph>
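      <Paragraph> The label selection described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the LSP-MLP implementation: the mini-lexicon, the PATTERNS table, the pairwise (modifier, head) shape of a pattern, and the function name select_labels are all hypothetical. Only the two candidate labels of the word procedure are taken from the text.
<![CDATA[
```python
from typing import Dict, Set, Tuple

# Hypothetical mini-lexicon: word -> candidate sublanguage labels.
# Only the two labels for "procedure" come from the text; the labels
# assigned to "operatieve" and "test" are assumptions for illustration.
LEXICON: Dict[str, Set[str]] = {
    "operatieve": {"H-TTCHIR"},
    "test": {"H-TXPROC"},
    "procedure": {"H-TTCHIR", "H-TXPROC"},
}

# Hypothetical co-occurrence patterns: (modifier label, head label) pairs
# that are assumed to be attested in the sublanguage.
PATTERNS: Set[Tuple[str, str]] = {
    ("H-TTCHIR", "H-TTCHIR"),
    ("H-TXPROC", "H-TXPROC"),
}


def select_labels(modifier: str, head: str) -> Set[str]:
    """Keep the head labels that co-occur with some label of the modifier."""
    modifier_labels = LEXICON.get(modifier, set())
    head_labels = LEXICON.get(head, set())
    compatible = {
        h for h in head_labels for m in modifier_labels if (m, h) in PATTERNS
    }
    # If no pattern applies, the ambiguity is left unresolved:
    # all candidate labels of the head are returned unchanged.
    return compatible or set(head_labels)


for modifier in ("operatieve", "test"):
    print(modifier, "procedure ->", sorted(select_labels(modifier, "procedure")))
```
]]>
 In this sketch the modifier decides which reading of the head survives; an unknown modifier leaves both candidate labels in place, so the ambiguity stays unresolved.</Paragraph>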
      <Paragraph position="3"> The most recent work includes the use of Standard Generalized Mark-up Language (SGML) and World Wide Web (WWW) Graphical User Interface (GUI) technology to better access and visualise the requested information in the text (Sager et al., 1996).</Paragraph>
      <Paragraph position="4"> It focused on the use of static SGML or HTML-code 2 for displaying the results of NLP-based checklist screening of clinical documents.</Paragraph>
      <Paragraph position="5"> 1 English: surgical procedure: quintuple coronary bypass.</Paragraph>
    </Section>
  </Section>
</Paper>