<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-3012">
  <Title>Word level confidence measurement using semantic features. In Proceedings of ICASSP, Hong Kong, April.</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 High-Level Knowledge Sources
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Ontology and lexicon
</SectionTitle>
      <Paragraph position="0"> Current SLP systems often employ multi-domain ontologies that represent the relevant world and discourse knowledge. The knowledge encoded in such an ontology can be applied to a variety of natural language processing tasks; see, e.g., Mahesh and Nirenburg (1995) and Flycht-Eriksson (2003).</Paragraph>
      <Paragraph position="1"> Our ontology models the domains Electronic Program Guide, Interaction Management, Cinema Information, Personal Assistance, Route Planning, Sights, Home Appliances Control and Off Talk.</Paragraph>
      <Paragraph position="2"> The hierarchically structured ontology consists of ca. 720 concepts and 230 properties specifying relations between concepts. For example, every instance of the concept Process has the relations hasBeginTime, hasEndTime and hasState.</Paragraph>
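To make the split between concepts and the properties that relate them concrete, the Python sketch below models a tiny fragment of such a hierarchy. It is an illustration under stated assumptions, not the ontology format used by the authors: the `Concept` class and its `all_properties` method are hypothetical, and placing StaticSpatialProcess under Process is assumed only for the example. It shows how a property declared on Process also holds for instances of its subconcepts.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A node in a hierarchically structured ontology (hypothetical sketch)."""
    name: str
    parent: Optional["Concept"] = None
    properties: set = field(default_factory=set)

    def all_properties(self) -> set:
        # A property declared on a concept holds for every instance of its
        # subconcepts too, so collect properties along the path to the root.
        props = set(self.properties)
        node = self.parent
        while node is not None:
            props |= node.properties
            node = node.parent
        return props


# Process and its three relations come from the text; making
# StaticSpatialProcess a subconcept of Process is an assumption.
process = Concept("Process", properties={"hasBeginTime", "hasEndTime", "hasState"})
static_spatial = Concept("StaticSpatialProcess", parent=process)

print(sorted(static_spatial.all_properties()))
# ['hasBeginTime', 'hasEndTime', 'hasState']
```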
      <Paragraph position="3"> A detailed description of the ontology employed in our experiments is given in Gurevych et al. (2003b). Ontological concepts are high-level units. They make it possible to reduce the amount of information needed to represent the relations between individual lexemes and to incorporate this knowledge effectively into automatic language processing. For example, a cinema reservation system may contain a large number of movies. All of them are represented by the concept Movie, so that a variety of lexical items (instances) are mapped to a single unit (concept) that describes their meaning and their relations to other concepts in a generic way.</Paragraph>
      <Paragraph position="4"> In the reported experiments, we did not use the structure of the ontology explicitly. Instead, the ontological knowledge was used implicitly to determine the set of ontological concepts needed to represent the user's utterance.</Paragraph>
      <Paragraph position="5"> The high-level domain knowledge represented in the ontology is linked with the language-specific knowledge through a lexicon. The lexicon contains ca. 3600 entries of lexical items and their senses (zero or more), encoded as concepts in the ontology. For example, the word "am" is mapped to the ontological concepts StaticSpatialProcess (as in the utterance "I am in New York"), SelfIdentificationProcess (as in "I am Peter Smith"), and NONE if the lexeme has only a grammatical function (as in "I am going to read a book").</Paragraph>
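The lexicon can be pictured as a mapping from lexemes to zero or more concepts. The minimal Python sketch below assumes a plain dictionary; `LEXICON`, `senses` and `candidate_concepts` are hypothetical names, and the entry for "movie" is invented for illustration (only the entry for "am" comes from the text). It collects the concepts an utterance may denote and drops NONE, which marks a purely grammatical reading. Choosing among a word's senses is outside its scope.

```python
# Hypothetical miniature lexicon: the "am" entry follows the text,
# the "movie" entry is an assumed illustration.
LEXICON: dict[str, set[str]] = {
    "am": {"StaticSpatialProcess", "SelfIdentificationProcess", "NONE"},
    "movie": {"Movie"},
}


def senses(word: str) -> set[str]:
    """Return the concepts a word may map to; unknown words have no senses."""
    return LEXICON.get(word.lower(), set())


def candidate_concepts(utterance: str) -> set[str]:
    """Collect every concept other than NONE that some word may denote.

    Tokenization is plain whitespace splitting, a simplification.
    No sense disambiguation is attempted.
    """
    result: set[str] = set()
    for token in utterance.split():
        result |= senses(token) - {"NONE"}
    return result


print(sorted(candidate_concepts("I am in New York")))
# ['SelfIdentificationProcess', 'StaticSpatialProcess']
```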
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Domain models
</SectionTitle>
      <Paragraph position="0"> For scoring high-level linguistic representations of utterances, we use a domain model. A domain model is a two-dimensional matrix $DM$ of size $\#d \times \#c$, where $\#d$ and $\#c$ denote the total number of domain categories and ontological concepts, respectively. Formally, $DM = (S_{dc})_{d=1,\ldots,\#d;\,c=1,\ldots,\#c}$, where each matrix element $S_{dc}$ is the domain specificity score of concept $c$ for domain $d$.</Paragraph>
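A domain model of this shape can be held as a table with one row per domain and one column per concept. In the hypothetical `DomainModel` class below, the domain and concept names and the binary scores are made up for illustration. The `domain_totals` method, which sums the scores of an utterance's concepts per domain, is an assumed aggregation chosen for the example; this section does not specify how the scores are combined.

```python
class DomainModel:
    """A (#d x #c) matrix of domain specificity scores S_dc (hypothetical sketch)."""

    def __init__(self, domains, concepts, scores):
        # scores[d][c] holds S_dc: row d is a domain, column c a concept.
        if len(scores) != len(domains) or any(len(row) != len(concepts) for row in scores):
            raise ValueError("scores must have shape (#d, #c)")
        self.domains = list(domains)
        self.concepts = list(concepts)
        self._col = {c: j for j, c in enumerate(self.concepts)}
        self.scores = [list(row) for row in scores]

    def score(self, domain, concept):
        """Look up a single matrix element S_dc."""
        return self.scores[self.domains.index(domain)][self._col[concept]]

    def domain_totals(self, concepts):
        """Assumed aggregation: sum S_dc over the known concepts, per domain."""
        known = [self._col[c] for c in concepts if c in self._col]
        return {d: sum(row[j] for j in known) for d, row in zip(self.domains, self.scores)}


# Toy binary model; names and values are illustrative only.
dm = DomainModel(
    ["CinemaInformation", "RoutePlanning"],
    ["Movie", "StaticSpatialProcess"],
    [[1, 0],
     [0, 1]],
)
print(dm.score("CinemaInformation", "Movie"))  # 1
print(dm.domain_totals({"Movie"}))  # {'CinemaInformation': 1, 'RoutePlanning': 0}
```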
      <Paragraph position="1"> We experimented with two different domain models. The first model, $DM_{anno}$, was obtained through direct annotation of concepts with respect to domains, as reported in Section 3.2. The second domain model, $DM_{tf*idf}$, resulted from a statistical analysis of Dataset 1 (described in Section 3.1). In this case, we computed the term frequency-inverse document frequency (tf*idf) score (Salton and Buckley, 1988) of each concept for individual domains. The human annotations yield binary values, whereas the tf*idf scores range over the interval [0,1].</Paragraph>
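The exact tf*idf variant and the normalization to [0,1] are not specified in this section. The Python sketch below assumes that the concepts observed in each domain form one document, that tf is the relative frequency of a concept in that domain, that idf = log(N/df) with N the number of domains, and that all scores are divided by the global maximum. The function `tfidf_domain_model` and the toy data (including the concept Route) are hypothetical.

```python
import math
from collections import Counter


def tfidf_domain_model(concepts_by_domain):
    """Build a domain model (domain -> concept -> score) from concept occurrences.

    Assumptions, not taken from the paper: each domain is one "document",
    tf = relative frequency, idf = log(N / df), and scores are divided by
    the global maximum so that they fall into [0, 1].
    """
    counts = {d: Counter(cs) for d, cs in concepts_by_domain.items()}
    n_domains = len(counts)
    # Document frequency: the number of domains in which each concept occurs.
    df = Counter(c for counter in counts.values() for c in counter)
    raw = {}
    for d, counter in counts.items():
        total = sum(counter.values()) or 1  # guard against an empty domain
        # Concepts absent from a domain get tf = 0 and thus a score of 0.
        raw[d] = {c: (counter[c] / total) * math.log(n_domains / df[c]) for c in df}
    top = max((s for row in raw.values() for s in row.values()), default=0.0)
    if top == 0:
        return raw  # nothing discriminates between domains
    return {d: {c: s / top for c, s in row.items()} for d, row in raw.items()}


# Toy data for illustration only.
data = {
    "CinemaInformation": ["Movie", "Movie", "StaticSpatialProcess"],
    "RoutePlanning": ["StaticSpatialProcess", "Route"],
}
dm = tfidf_domain_model(data)
print(dm["CinemaInformation"]["Movie"])  # 1.0
print(round(dm["RoutePlanning"]["Route"], 2))  # 0.75
```

A concept that occurs in every domain receives idf = 0 under this variant, which matches the intuition that it carries no domain-specific information.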
    </Section>
  </Section>
</Paper>