<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0901">
  <Title>A Knowledge-Driven Approach to Text Meaning Processing</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Text Interpretation
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Extraction of Knowledge Fragments from Text
</SectionTitle>
      <Paragraph position="0"> Given the knowledge base of scenarios, our goal is to interpret new text by finding and instantiating the scenario in the KB that best matches the facts explicit in that text. To do this, each sentence in the new text is first parsed, and fragments are extracted from the parse tree. Parsing is done by SAPIR, a bottom-up chart parser used at Boeing (Holmback et al., 2000). Fragments are extracted by searching for subject-verb-object patterns in the parse tree, e.g., rooted at the main verb or in relative clauses. For example, given the sentence: (4) "A Russian Progress M-44 spaceship carrying equipment, food and fuel for the International Space Station was launched successfully Monday." the fragments ("" "launch" "spaceship") ("spaceship" "carry" "equipment") ("spaceship" "carry" "food") ("spaceship" "carry" "fuel") are extracted. Note that at this stage word sense disambiguation has not been performed.</Paragraph>
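      <Paragraph position="1"> As a rough illustration of the extraction step, the following Python sketch walks a simplified clause structure and emits subject-verb-object triples. The Clause class, its fields, and the treatment of passives are assumptions made for this example; SAPIR's actual parse-tree format and the system's real extraction rules are not described here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (subject, verb, object); an empty string marks an unfilled slot.
Fragment = Tuple[str, str, str]


@dataclass
class Clause:
    """Hypothetical, simplified stand-in for one clause of a parse tree."""
    verb: str                                            # lemma of the verb
    subjects: List[str] = field(default_factory=list)    # surface subjects
    objects: List[str] = field(default_factory=list)     # direct objects
    passive: bool = False
    subclauses: List["Clause"] = field(default_factory=list)  # e.g. relative clauses


def extract_fragments(clause: Clause) -> List[Fragment]:
    """Collect subject-verb-object triples from a clause and its subclauses."""
    subjects, objects = clause.subjects, clause.objects
    if clause.passive:
        # Assumption: in a passive clause the surface subject fills the
        # object slot and the subject slot is left empty.
        subjects, objects = [], clause.subjects
    fragments = [
        (subj, clause.verb, obj)
        for subj in (subjects or [""])
        for obj in (objects or [""])
    ]
    for sub in clause.subclauses:
        fragments.extend(extract_fragments(sub))
    return fragments


# Hand-built structure for example sentence (4).
sentence = Clause(
    verb="launch",
    subjects=["spaceship"],
    passive=True,
    subclauses=[
        Clause(verb="carry", subjects=["spaceship"],
               objects=["equipment", "food", "fuel"]),
    ],
)
fragments = extract_fragments(sentence)
```

Under these assumptions the sketch yields the four fragments listed for sentence (4), with the empty subject of "launch" arising from the passive construction.</Paragraph>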
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Matching Scenarios with Fragments
</SectionTitle>
      <Paragraph position="0"> To match the scenario representations with the NLP-processed text fragments, the system searches for matches between objects in the representations and objects mentioned in the fragments, and between relationships in the representations and relationships mentioned in the fragments. The subject-verb-object fragments are first broken up into two, e.g., ("China" "launch" "satellite") becomes ("launch" "subject" "China") and ("launch" "object" "satellite") before matching. Then the system searches for a scenario representation in which as many word-syntactic relation-word fragments as possible match concept-semantic relation-concept structures in the representation. Because we have used WordNet, each concept in the knowledge base has a set of associated words/phrases used to express it in English, and a word in a fragment "matches" a concept if that word is a member of these associated words (i.e., the synset) for that concept (or one of its specializations or generalizations). This is illustrated in Figure 3. [Figure 3 caption, beginning lost in extraction: ... scenario representation that best matches the fragments extracted from the input text. Word sense and semantic role disambiguation is a side-effect of, rather than a precursor to, this matching process.]</Paragraph>
      <Paragraph position="1"> A simple scoring function is used to assess the degree of match, looking for the scenario with the maximum number of matching fragments, and in the case of a tie preferring the scenario with the maximum number of objects potentially matching some item in the text.</Paragraph>
      <Paragraph position="2"> Note that it is only at this point that word sense and semantic relation disambiguation are performed. For example, in this case the fragments extracted from the text best match the launch a satellite v1 scenario; as a result, "launch" in the text will be taken to mean the launch a satellite v1 concept (a word sense), as opposed to launching a product, launching a ship, etc.</Paragraph>
      <Paragraph position="3"> One piece of information we are not currently exploiting in this matching process is the statistical probability that particular syntactic roles (grammatical functions) such as subject, direct object, etc., will correspond to particular semantic roles such as agent n1, vehicle n1, etc. These probabilities would help the matcher deal with ambiguous cases, where the current approach is not sufficient to determine the appropriate match. Automated methods for obtaining such statistics, such as (Gildea and Jurafsky, 2002), could be exploited for this task.</Paragraph>
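      <Paragraph position="4"> The matching and scoring described in this subsection can be sketched in Python as follows. Everything in the toy knowledge base (concept names written with underscores, relation names, word lists) is invented for illustration, and the sketch omits the check against a concept's specializations and generalizations. It is one plausible reading of the "simple scoring function", not the system's actual implementation.

```python
from typing import Dict, List, Set, Tuple

Fragment = Tuple[str, str, str]   # (subject, verb, object)
Triple = Tuple[str, str, str]     # (head, relation, dependent)

# Toy knowledge base: every name and word list here is an assumption.
SYNSETS: Dict[str, Set[str]] = {
    "launch_a_satellite_v1": {"launch"},
    "launch_a_product_v1": {"launch", "introduce"},
    "country_n1": {"country", "China", "Russia"},
    "company_n1": {"company", "firm"},
    "satellite_n1": {"satellite"},
    "product_n1": {"product"},
}
SCENARIOS: Dict[str, List[Triple]] = {
    "launch_a_satellite_v1": [
        ("launch_a_satellite_v1", "agent", "country_n1"),
        ("launch_a_satellite_v1", "cargo", "satellite_n1"),
    ],
    "launch_a_product_v1": [
        ("launch_a_product_v1", "agent", "company_n1"),
        ("launch_a_product_v1", "object", "product_n1"),
    ],
}


def split_fragment(fragment: Fragment) -> List[Triple]:
    """Break (subject, verb, object) into verb-relation-word triples."""
    subj, verb, obj = fragment
    triples = []
    if subj:
        triples.append((verb, "subject", subj))
    if obj:
        triples.append((verb, "object", obj))
    return triples


def word_matches(word: str, concept: str) -> bool:
    """A word matches a concept if it is among the concept's associated words."""
    return word in SYNSETS.get(concept, set())


def score(scenario: str, pairs: List[Triple]) -> Tuple[int, int]:
    """Return (matching fragments, matched objects); tuples compare in that order."""
    structures = SCENARIOS[scenario]
    # The syntactic relation is ignored: any semantic relation may realize it,
    # so role disambiguation falls out of the match rather than preceding it.
    matched = sum(
        any(word_matches(head, c1) and word_matches(dep, c2)
            for c1, _, c2 in structures)
        for head, _, dep in pairs
    )
    words = {w for head, _, dep in pairs for w in (head, dep)}
    concepts = {c for c1, _, c2 in structures for c in (c1, c2)}
    objects = sum(any(word_matches(w, c) for w in words) for c in concepts)
    return matched, objects


def best_scenario(fragments: List[Fragment]) -> str:
    """Pick the scenario with the most matches, breaking ties on matched objects."""
    pairs = [p for f in fragments for p in split_fragment(f)]
    return max(SCENARIOS, key=lambda name: score(name, pairs))


example_pairs = split_fragment(("China", "launch", "satellite"))
chosen = best_scenario([("China", "launch", "satellite")])
```

Because the score is a tuple, Python's ordinary tuple comparison gives the tie-breaking behaviour directly: the number of matching fragments is compared first, and the number of matched objects only when the first values are equal.</Paragraph>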
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Question Answering
</SectionTitle>
      <Paragraph position="0"> Once the appropriate scenario representation in the knowledge base has been identified and instantiated, it is available for use in question answering.</Paragraph>
      <Paragraph position="1"> This allows questions to be answered which go beyond facts explicitly mentioned in the text, but which are part of the scenario representation (e.g., a question about the rocket), and those requiring inference (using KM's inference engine, applied to the scenario and other knowledge in the knowledge base).</Paragraph>
      <Paragraph position="2"> The inference engine currently requires questions to be posed in the native representation language (KM), rather than having a natural language front end. Given a query, KM will not just retrieve facts contained explicitly in the instantiated scenario representation, but also compute additional facts using standard reasoning mechanisms of inheritance and rule evaluation. For example, launch a satellite v1 is a subclass of transport v1, whose representation includes an axiom stating that during the move v1 subevent, the cargo is inside the vehicle. Given an appropriate query, this axiom will be inherited by launch a satellite v1, allowing the system to conclude that during the move subevent of the satellite launch (here fly v1) the satellite (cargo) will be inside the rocket (vehicle). The ability of the system to reach this kind of conclusion demonstrates, to a certain degree, that it has acquired at least some of the "deep" meaning of the text, as these conclusions go beyond the information contained in the original text itself.</Paragraph>
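      <Paragraph position="3"> The inheritance step in this example can be mimicked with a small Python sketch. It is not KM and does not use KM's query syntax; the class table, the axiom encoding as a (relation, role, role, subevent role) tuple, and the role fillers are all assumptions chosen to mirror the satellite example.

```python
from typing import Dict, List, Optional, Tuple

# (relation, first role, second role, subevent role) -- an invented encoding.
Axiom = Tuple[str, str, str, str]

# Toy subclass links: launch_a_satellite_v1 is a subclass of transport_v1.
SUPERCLASS: Dict[str, Optional[str]] = {
    "launch_a_satellite_v1": "transport_v1",
    "transport_v1": None,
}

# Axioms stated directly on each class (none on the subclass itself).
AXIOMS: Dict[str, List[Axiom]] = {
    "transport_v1": [("is-inside", "cargo", "vehicle", "move")],
    "launch_a_satellite_v1": [],
}

# Hypothetical role fillers of the instantiated satellite-launch scenario.
ROLES: Dict[str, str] = {"move": "fly", "cargo": "satellite", "vehicle": "rocket"}


def inherited_axioms(concept: str) -> List[Axiom]:
    """Gather axioms from the concept and all of its superclasses."""
    axioms: List[Axiom] = []
    current: Optional[str] = concept
    while current is not None:
        axioms.extend(AXIOMS.get(current, []))
        current = SUPERCLASS.get(current)
    return axioms


def conclusions(concept: str, roles: Dict[str, str]) -> List[Axiom]:
    """Instantiate each inherited axiom with the scenario's role fillers."""
    return [
        (relation, roles[a], roles[b], roles[event])
        for relation, a, b, event in inherited_axioms(concept)
    ]


facts = conclusions("launch_a_satellite_v1", ROLES)
```

Here the single fact derived for the satellite launch is that the satellite is inside the rocket during the fly subevent, even though no such axiom is attached to the launch class itself.</Paragraph>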
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Semi-Automatic Construction of the KB
</SectionTitle>
    <Paragraph position="0"> For a broad coverage system, a large number of scenario representations will be necessary, more than can be feasibly built by hand. While fully automatic acquisition of these representations from text seems beyond the state of the art, we believe there is a middle ground in which the &amp;quot;raw material&amp;quot; for these representations can be extracted automatically from text, and which can then be rapidly filtered and assembled by a person.</Paragraph>
    <Paragraph position="1"> As an initial exploration in this direction, we applied our "fragment extractor" to part of the Reuters corpus (Reuters, 2003) to obtain a database of 1.1 million subject-verb-object fragments. From this database, high-frequency patterns can then be searched for, providing possible material for incorporating into new scenario representations. For example, the database reveals (by looking at the various tuple frequencies) that satellites are most commonly built, launched, carried, and used; rockets most commonly carry satellites; Russia and rockets most commonly launch satellites; and that satellites most commonly transmit and broadcast. Similarly for the verb "launch", things which are most commonly launched (according to the database) are campaigns, services, funds, investigations, attacks, bonds, and satellites, suggesting a set of scenario representations which could then be built by searching further from these terms. Although these fragments are not yet assembled into larger scenario representations and word senses have not been disambiguated, further work in this direction may yield methods by which a user can rapidly find and assemble candidate elements of representations into larger structures, perhaps guided by the existing abstract models already in the knowledge base. Other corpus-based techniques such as (Lin and Pantel, 2001) could also be used to provide additional raw material for scenario construction.</Paragraph>
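    <Paragraph position="2"> The kind of tuple-frequency lookup described above can be approximated with Python's standard collections.Counter. The tiny fragment list below is made up for the example and does not reflect counts in the actual Reuters-derived database; the function names are likewise hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Fragment = Tuple[str, str, str]  # (subject, verb, object)


def most_common_objects(fragments: Iterable[Fragment], verb: str,
                        n: int = 3) -> List[Tuple[str, int]]:
    """Most frequent objects of a given verb, ignoring empty object slots."""
    counts = Counter(obj for _, v, obj in fragments if v == verb and obj)
    return counts.most_common(n)


def most_common_verbs(fragments: Iterable[Fragment], obj: str,
                      n: int = 3) -> List[Tuple[str, int]]:
    """Most frequent verbs that take a given word as their object."""
    counts = Counter(v for _, v, o in fragments if o == obj)
    return counts.most_common(n)


# Invented miniature database of subject-verb-object fragments.
toy_database: List[Fragment] = [
    ("Russia", "launch", "satellite"),
    ("rocket", "launch", "satellite"),
    ("", "launch", "satellite"),
    ("company", "launch", "campaign"),
    ("rocket", "carry", "satellite"),
]

launched = most_common_objects(toy_database, "launch", n=2)
done_to_satellites = most_common_verbs(toy_database, "satellite", n=2)
```

A person assembling a new scenario could run queries of this shape for each candidate verb or noun and keep only the high-frequency patterns as raw material.</Paragraph>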
  </Section>
</Paper>