<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2180">
  <Title>MindNet: acquiring and structuring semantic information from text</Title>
  <Section position="10" start_page="1100" end_page="1101" type="concl">
    <SectionTitle>
9 Disambiguating MindNet
</SectionTitle>
    <Paragraph position="0"> An additional level of processing during the creation of MindNet seeks to provide sense identifiers on the words of semrel structures. Typically, word sense disambiguation (WSD) occurs during the parsing of definitions and example sentences, following the construction of logical forms (see Braden-Harder, 1993). Detailed information from the parse, both morphological and syntactic, sharply reduces the range of senses that can be plausibly assigned to each word.</Paragraph>
    <Paragraph position="1"> Other aspects of dictionary structure are also exploited, including domain information associated with particular senses (e.g., Baseball).</Paragraph>
    <Paragraph position="2"> In processing normal input text outside of the context of MindNet creation, WSD relies crucially on information from MindNet about how word senses are linked to one another. To help mitigate this bootstrapping problem during the initial construction of MindNet, we have experimented with a two-pass approach to WSD.</Paragraph>
    <Paragraph position="3"> During a first pass, a version of MindNet that does not include WSD is constructed. The result is a semantic network that nonetheless contains a great deal of &quot;ambient&quot; information about sense assignments. For instance, processing the definition spin 101: (of a spider or silkworm) to produce thread..., yields a semrel structure in which the sense node spin101 is linked by a DeepSubject relation to the undisambiguated form spider. On the subsequent pass, this information can be exploited by WSD in assigning sense 101 to the word spin in unrelated definitions: wolf_spider 100: any of various spiders...that...do not spin webs. This kind of bootstrapping reflects the broader nature of our approach, as discussed in the next section: a fully and accurately disambiguated MindNet allows us to bootstrap senses onto words encountered in free text outside the dictionary domain.</Paragraph>
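    <Paragraph> The two-pass idea described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the MindNet implementation: the toy dictionary, the second sense spin102 and its gloss, the function names first_pass and second_pass, and the overlap-count scoring (a simplified Lesk-style heuristic standing in for the actual WSD component, and ignoring relation types such as DeepSubject) are all hypothetical and introduced purely for illustration.

```python
from collections import defaultdict

# Hypothetical toy dictionary: sense id maps to (headword, content words of the definition).
# Only spin101 and wolf_spider100 echo the examples in the text; spin102 is invented.
DEFINITIONS = {
    "spin101": ("spin", ["spider", "silkworm", "produce", "thread"]),
    "spin102": ("spin", ["turn", "round", "quickly"]),
    "wolf_spider100": ("wolf_spider", ["spider", "spin", "web"]),
}


def first_pass(definitions):
    """Build the network without WSD: sense nodes link to undisambiguated words."""
    links = {}
    senses_of = defaultdict(list)
    for sense_id, (headword, words) in definitions.items():
        links[sense_id] = set(words)
        senses_of[headword].append(sense_id)
    return links, dict(senses_of)


def second_pass(definitions, links, senses_of):
    """Assign senses to definition words using the links from the first pass."""
    assignments = {}
    for sense_id, (headword, words) in definitions.items():
        for word in words:
            candidates = [s for s in senses_of.get(word, []) if s != sense_id]
            if not candidates:
                continue  # no sense inventory for this word in the toy data
            context = set(words).union({headword}).difference({word})
            # Assumed scoring: count first-pass links shared with the context.
            best = max(candidates, key=lambda s: len(links[s].intersection(context)))
            if len(links[best].intersection(context)) == 0:
                continue  # no evidence, so leave the word undisambiguated
            assignments[(sense_id, word)] = best
    return assignments


links, senses_of = first_pass(DEFINITIONS)
print(second_pass(DEFINITIONS, links, senses_of))
```

On this toy data the only assignment made is spin101 for the word spin inside wolf_spider100, because the first-pass link from spin101 to spider overlaps with the context of that definition, while the invented spin102 shares nothing with it.</Paragraph>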
    <Paragraph position="4"> 10 MindNet as a methodology
 The creation of MindNet was never intended to be an end unto itself. Instead, our emphasis has been on building a broad-coverage NLP understanding system.</Paragraph>
    <Paragraph position="5"> We consider the methodology for creating MindNet to consist of a set of general tools for acquiring, structuring, accessing, and exploiting semantic information from NL text.</Paragraph>
    <Paragraph position="6"> Our techniques for building MindNet are largely rule-based. However we arrive at these representations, though, the overall structure of MindNet can be regarded as crucially dependent on statistics. We have much more in common with traditional corpus-based approaches than a first glance might suggest. An advantage we have over these approaches, however, is the rich structure imposed by the parse, logical form, and word sense disambiguation components of our system. The statistics we use in the context of MindNet allow richer metrics because the data themselves are richer.</Paragraph>
    <Paragraph position="7"> Our first foray into the realm of processing free text with our methods has already been accomplished; Table 2 showed that some 58,000 example sentences from LDOCE and AHD3 were processed in the creation of our current MindNet. To put our hypothesis to a much more rigorous test, we have recently embarked on the assimilation of the entire text of the Microsoft Encarta® 98 Encyclopedia. While this has presented several new challenges in terms of volume alone, we have nevertheless successfully completed a first pass and have produced and added semrel structures from the Encarta® 98 text to MindNet. Statistics on that pass are given below: Besides our venture into additional English data, we fully intend to apply the same methodologies to text in other languages as well. We are currently developing NLP systems for 3 European and 3 Asian languages: French, German, and Spanish; Chinese, Japanese, and Korean. The syntactic parsers for some of these languages are already quite advanced and have been demonstrated publicly. As the systems for these languages mature, we will create corresponding MindNets, beginning, as we did in English, with the processing of machine-readable reference materials and then adding information gleaned from corpora.</Paragraph>
  </Section>
</Paper>