<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-3005">
  <Title>Multilingual Video and Audio News Alerting</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> This paper describes a fully-automated real-time broadcast news video and audio processing system. The system combines speech recognition, machine translation, and cross-lingual information retrieval components to enable real-time alerting from live English and Arabic news sources.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Real-time Video Alerting
</SectionTitle>
      <Paragraph position="0"> This paper describes the Enhanced Video Text and Audio Processing (eViTAP) system, which provides fully-automated real-time broadcast news video and audio processing. The system combines state-of-the-art automatic speech recognition and machine translation components with cross-lingual information retrieval in order to enable searching of multilingual video news sources by a monolingual speaker. In addition to full search capabilities, the system also enables real-time alerting, such that a user can be notified as soon as a word, phrase, or topic of interest appears in an English or Arabic news broadcast.</Paragraph>
      <Paragraph position="1"> The key component of the news processing is the Virage VideoLogger video indexer software package (Virage 2003). The VideoLogger processes an incoming live satellite feed, encodes the video as a digital file, and manages the video and audio processing components. The individual components integrated into the VideoLogger platform currently include the audio processing and machine translation systems described in Section 2, as well as face ID, broadcaster logo ID, and scene change analysis.</Paragraph>
      <Paragraph position="2"> The video and audio processing components produce textual metadata that is time-stamped to enable synchronization with the encoded video file. All data is indexed and stored for retrieval by a cross-lingual information retrieval engine. Figure 1 shows the eViTAP cross-lingual search and alerting interface, with real data from the system. The list of relevant video clips matching an alerting profile or a user search is shown on the left, with broadcast source and time, most-frequent named entities, and a relevance score. Note that the English query &quot;bin laden&quot; resulted in the display of relevant stories in both English and Arabic. The center of the screen contains the video playback window, with clip navigation and keyframe storyboard. The right of the interface contains the transcripts from the ASR and MT engines; video playback is synchronized with the transcripts such that words are highlighted as they are spoken in the video.</Paragraph>
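      <Paragraph position="3"> As an illustration of how time-stamped transcript metadata can drive phrase alerting, the following minimal Python sketch scans a list of (start time, word) pairs for an alert phrase and reports when each match begins. It is not the eViTAP implementation; the function name find_phrase_alerts, the data layout, and the sample timings are assumptions made for this example.

```python
def find_phrase_alerts(transcript, phrase):
    """Return the start time of every occurrence of phrase in transcript.

    transcript: list of (start_seconds, word) pairs in spoken order
                (a hypothetical layout for time-stamped ASR output).
    phrase:     the alert phrase, e.g. "bin laden"; matching ignores case.
    """
    target = phrase.lower().split()
    if not target:
        # An empty alert phrase can never match anything.
        return []
    words = [word.lower() for _, word in transcript]
    hits = []
    for i in range(len(words) - len(target) + 1):
        # Compare a window of consecutive words against the alert phrase.
        if words[i:i + len(target)] == target:
            hits.append(transcript[i][0])
    return hits


# Made-up sample data: (start time in seconds, recognized word).
sample = [(12.0, "Officials"), (12.5, "said"), (12.9, "Bin"),
          (13.2, "Laden"), (13.8, "was")]
print(find_phrase_alerts(sample, "bin laden"))  # [12.9]
```

Because every hit carries the start time of the matching words, the same value could be used to position video playback at the moment the phrase was spoken.</Paragraph>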
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Real-time Spoken Language Processing
</SectionTitle>
      <Paragraph position="0"> The real-time audio processing in the eViTAP system is performed by the BBN AudioIndexer system, described in detail by Makhoul et al. (2000). The AudioIndexer system provides a wide range of real-time audio processing components, including automatic speech recognition, speaker segmentation and identification, topic classification, and named entity detection. All audio processing is carried out on a high-end PC (dual 2.6 GHz Xeon CPUs, 2 GB RAM). The real-time speech recognition system achieves a word error rate of roughly 20-30% for English and Arabic news sources.</Paragraph>
      <Paragraph position="1"> [Figure caption fragment: ... recognition output, Arabic-to-English machine translation output.] The Arabic words produced by the speech recognition system, including all ASR errors, are processed by an Arabic-to-English machine translation system that also operates in real time (on a separate high-end PC). The eViTAP system currently processes</Paragraph>
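      <Paragraph position="2"> For reference, word error rate is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The Python sketch below illustrates that standard definition; the function name word_error_rate and the sample sentences are hypothetical and are not taken from the AudioIndexer system.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Both arguments are whitespace-separated strings. The reference must
    contain at least one word, otherwise the ratio is undefined.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]  # deleting all i reference words matches an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1] / len(ref)


# Made-up example: one substitution ("sat" to "sit") and one deleted "the".
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

In the made-up example there are 2 errors against 6 reference words, so the rate is about 0.33, close to the upper end of the 20-30% range reported above.</Paragraph>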
    </Section>
  </Section>
</Paper>