<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-2028">
  <Title>Extracting Salient Keywords from Instructional Videos Using Joint Text, Audio and Visual Cues</Title>
  <Section position="2" start_page="0" end_page="109" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> With recent advances in multimedia technology, the number of videos available to both the general public and particular individuals or organizations is growing rapidly.</Paragraph>
    <Paragraph position="1"> This growth creates a high demand for efficient video search and categorization, as evidenced by the emergence of various web video search offerings. While videos contain a rich source of audiovisual information, text-based video search is still among the most effective and widely used approaches. However, the quality of such text-based video search engines still lags behind that of engines that search textual information such as web pages. This is due to the extreme difficulty of tagging videos with domain-specific keywords. How to effectively extract domain-specific or salient keywords from video transcripts has thus become a critical and challenging issue for both the video indexing and searching communities.</Paragraph>
    <Paragraph position="2"> Recently, with the advances in speech recognition and natural language processing technologies, systems are being developed to automatically extract keywords from video transcripts that are either transcribed from speech or obtained from closed captions. Most of these systems, however, simply treat all words equally or directly &quot;transplant&quot; keyword extraction techniques developed for pure text documents to the video domain without taking specific characteristics of videos into account (M. Smith and T. Kanade, 1997).</Paragraph>
    <Paragraph position="3"> In the traditional information retrieval (IR) field, most existing methods for selecting salient keywords rely primarily on word frequency or other statistical information obtained from a collection of documents (Salton and McGill, 1983; Salton and Buckley, 1988). These techniques, however, do not work well for videos for two reasons: 1) most video transcripts are very short compared with a typical text collection; and 2) it is impractical to assume that there is a large video collection on a specific topic, given the cost of video production. As a result, many keywords extracted from videos using traditional IR techniques are not truly content-specific, and consequently, the video search results returned based on these keywords are generally unsatisfactory.</Paragraph>
    <Paragraph position="4"> In this paper, we propose a system for extracting salient or domain-specific keywords from instructional videos by exploiting joint audio, visual, and text cues. Specifically, we first apply a text-based keyword extraction system to find a set of keywords from video transcripts. Then we apply various audiovisual content analysis techniques to identify cue contexts in which domain-specific keywords are more likely to appear. Finally, we adjust the keyword salience by fusing the audio, visual and text cues together, and &quot;discover&quot; a set of salient keywords. Professionally produced educational or instructional videos are the main focus of this work since they are playing increasingly important roles in people's daily lives. For the system evaluation, we used training and education videos that are freely downloadable from various DHS (Department of Homeland Security) web sites. These were selected because 1) DHS has an increasing need for quickly browsing, searching and re-purposing its learning resources across its more than twenty diverse agencies; and 2) most DHS videos contain closed captions in compliance with federal accessibility requirements such as Section 508.</Paragraph>
  </Section>
</Paper>