<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0509">
  <Title>Application of NLP technology to production of closed-caption TV programs in Japanese for the hearing impaired. Takahiro Wakao, Telecommunications Advancement Organization of Japan (TAO); Terumasa Ehara, NHK Science and Technical Research Labs. / TAO; Eiji Sawamura, TAO; Yoshiharu Abe, Mitsubishi Electric Corp. Information Technology R&amp;D Center / TAO</Title>
  <Section position="2" start_page="0" end_page="55" type="metho">
    <SectionTitle>
2 Research Issues
</SectionTitle>
    <Paragraph position="0"> The main research issues in the project are as follows:
* automatic text summarization
* automatic synchronization of text and speech
* building an efficient closed caption production system
Based on research on these issues, we would like to build the system shown in Figure 1. Although all types of TV programs are to be handled in the project, first priority is given to TV news programs.</Paragraph>
    <Paragraph position="1"> The outline of each research issue is described next.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Automatic Text Summarization
</SectionTitle>
      <Paragraph position="0"> For most TV news programs today, scripts (written text) are available before they are read out by newscasters. Japanese news text is read at a speed of four hundred characters per minute; this is too fast, and too many characters appear when everything that is said is shown on the screen (Komine et al., 1996). Thus we need to summarize the news program text before showing it on the TV screen. The aim of the research on automatic text summarization is to summarize the text, fully or partially automatically, to a proper size in order to assist closed caption production.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Automatic Synchronization of Text and
Speech
</SectionTitle>
      <Paragraph position="0"> Once the original news program text is summarized, it must be synchronized with the actual sound, i.e. the speech of the program. At present this is done by hand when closed captions are produced. We would like to make use of speech recognition technology to assist the task of synchronizing text with speech. Please note that we aim to synchronize the original text, rather than the summarized text, with the speech.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="55" type="sub_section">
      <SectionTitle>
2.3 Efficient Closed Caption Production
System
</SectionTitle>
      <Paragraph position="0"> We will create a system by integrating the summarization and synchronization techniques with techniques for superimposing characters. We also need to research other aspects, such as the best way to show the characters on the screen for hearing-impaired viewers.</Paragraph>
    </Section>
  </Section>
  <Section position="3" start_page="55" end_page="55" type="metho">
    <SectionTitle>
3 Project Schedule
</SectionTitle>
    <Paragraph position="0"> The project is divided into two stages: the first three years and the remaining two years. In the first stage, we conduct research on the above issues and create a prototype system. In addition, the prototype system is to be used to produce closed captions, and the capability and functions of the system will be evaluated.</Paragraph>
    <Paragraph position="1"> We will improve the prototype system in the second stage.</Paragraph>
    <Paragraph position="2"> In 1996 and 1997, the following research has been conducted and will continue.</Paragraph>
    <Paragraph position="3"> * Automatic text summarization
  - method for dividing a sentence into smaller sections
  - key word extraction
  - method for connecting sentence sections
* Automatic synchronization of text and speech
  - transcription and speech model integration system
  - maximum likelihood matching system
  - speech database
* Efficient closed caption production system
  - integrated simulation system for closed caption production</Paragraph>
  </Section>
  <Section position="4" start_page="55" end_page="57" type="metho">
    <SectionTitle>
4 Preliminary Research Results
</SectionTitle>
    <Paragraph position="0"> We have conducted preliminary research for automatic text summarization and synchronization of text and speech, and the results are as follows.</Paragraph>
    <Section position="1" start_page="55" end_page="56" type="sub_section">
      <SectionTitle>
4.1 Automatic Text Summarization
</SectionTitle>
      <Paragraph position="0"> Text summarization research in the past may be grouped into three approaches. The first is to generate summarized sentences based on understanding of the text. Although desirable, this is not at present a practical method for summarizing actual TV news program text.</Paragraph>
      <Paragraph position="1"> The second is to digest the text by making use of text structures such as paragraphs. This has been applied to newspaper articles in Japanese (Yamamoto et al., 1994). In this approach, the important parts of the text, which are to be kept in the summarization, are determined by their locations, i.e. where they appear in the text. For example, if nouns or proper nouns appear in the headline, they are considered 'important' and may be used as measures of how important the other parts of the text are. As we describe later, TV news text differs from newspaper articles in that it lacks obvious structure: TV news text has fewer sentences and usually only one paragraph, without titles or headlines. Thus the second approach is not suitable for TV news text.</Paragraph>
      <Paragraph position="2"> The third is to detect important (or relevant) words (segments, in the case of Japanese), determine which sections of the text are important, and then put those sections together to form a 'summarization' of the text. This is probably the most robust of the three approaches, and it is the one we currently use (for a survey of summarization techniques, see (Paice, 1990)).</Paragraph>
      <Paragraph position="3"> To illustrate the difference between TV news program text and newspaper articles, we compared one thousand randomly selected articles from both domains. The results are shown in Fig 2 and Fig 3.</Paragraph>
      <Paragraph position="4"> Compared with newspaper text, TV news program text has the features noted above (fewer sentences, usually in a single paragraph). If we summarize TV news program text by selecting 'sentences' from the text, the result will be a 'rough' summarization. On the other hand, if we can divide long sentences into smaller sections, and thus increase the number of 'sentences' (sections) in the text, we may obtain a better summarization (Kim and Ehara, 1994). As a method of summarization, we are using the third approach. To find important words in the text, we adopted the high-frequency key word method and the TF-IDF (Term Frequency - Inverse Document Frequency) method, and evaluated the two methods automatically on a large scale in our preliminary research. We used ten thousand (10,000) TV news texts from 1992 to 1995 (2,500 texts per year) for the evaluation. One feature of TV news texts is that the first sentence is the most important; we conducted the evaluation by taking advantage of this feature.</Paragraph>
      <Paragraph position="5"> Key words used in the high-frequency key word method are content words which appear more than twice in a given text (Luhn, 1957), (Edmundson, 1969). To determine the importance of a sentence, we count the number of key words in the sentence and divide it by the total number of words (including both function and content words). In the TF-IDF method, the weight of each word is first computed by multiplying its frequency in the text (TF) by its IDF in a given text collection. The importance of a sentence is then computed by summing the weights of all the words in the sentence and dividing by the number of words (Sparck Jones, 1972), (Salton, 1971).</Paragraph>
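The two scoring schemes can be sketched as follows. This is a minimal illustration, not the project's implementation: whitespace splitting stands in for Japanese morphological segmentation, and no content-word filter is applied (the paper restricts key words to content words).

```python
import math
from collections import Counter

def tokenize(sentence):
    # Whitespace splitting as a stand-in for Japanese morphological
    # segmentation; real Japanese text would need a segmenter.
    return sentence.lower().split()

def keyword_scores(sentences):
    """High-frequency key word method: a word occurring more than twice
    in the text is a key word; a sentence's score is the number of its
    key-word tokens divided by its total token count."""
    counts = Counter(t for s in sentences for t in tokenize(s))
    keywords = {w for w, c in counts.items() if c > 2}
    scores = []
    for s in sentences:
        tokens = tokenize(s)
        scores.append(sum(t in keywords for t in tokens) / len(tokens))
    return scores

def tfidf_scores(documents, doc_index):
    """TF-IDF method: weight(w) = tf(w, doc) * log(N / df(w)); a
    sentence's score is the mean weight of its tokens."""
    n_docs = len(documents)
    df = Counter()
    for doc in documents:
        df.update({t for s in doc for t in tokenize(s)})
    tf = Counter(t for s in documents[doc_index] for t in tokenize(s))
    scores = []
    for s in documents[doc_index]:
        tokens = tokenize(s)
        scores.append(
            sum(tf[t] * math.log(n_docs / df[t]) for t in tokens) / len(tokens))
    return scores
```

Both functions normalize by sentence length, as the paper describes, so long sentences are not favored merely for containing more words.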
      <Paragraph position="6"> The evaluation details are as follows. First, the importance of each sentence is calculated by the high-frequency key word or TF-IDF method. Then the sentences are ranked according to their importance. We computed the accuracy of each method by checking whether the first sentence is ranked first, or ranked either first or second.</Paragraph>
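The evaluation procedure amounts to a top-k accuracy check, which could be sketched as below; the function name and interface are illustrative, not from the paper.

```python
def first_sentence_accuracy(score_lists, top_k=1):
    """Fraction of texts whose first sentence is ranked within the top_k
    sentences by importance score (exploiting the fact that the first
    sentence of a TV news text is the most important)."""
    hits = 0
    for scores in score_lists:
        ranked = sorted(range(len(scores)),
                        key=lambda i: scores[i], reverse=True)
        if 0 in ranked[:top_k]:
            hits += 1
    return hits / len(score_lists)
```

Calling this with `top_k=1` and `top_k=2` yields the two accuracy figures the evaluation reports (first sentence ranked first, or ranked first or second).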
      <Paragraph position="7"> The evaluation results are shown in Table 1.</Paragraph>
    </Section>
    <Section position="2" start_page="56" end_page="57" type="sub_section">
      <SectionTitle>
4.2 Automatic Synchronization of Text and
Speech
</SectionTitle>
      <Paragraph position="0"> As the next step, we need to synchronize the text and the speech. First, the written TV news text is converted into a stream of phonetic transcriptions, and then synchronization is done by detecting the time points of text sections and their corresponding speech sections. At the same time, we have started to create a news speech database. In 1996, we collected speech data by simulating news programs, i.e. the TV news texts were read and recorded in a studio rather than recording actual TV news programs on the air. We collected seven and a half hours of recordings from twenty people (both male and female). We plan to record actual programs as 'real' data in addition to the simulation recordings in 1997. The real data will be taken from both radio and TV news programs.</Paragraph>
      <Paragraph position="1"> Preliminary research on detection of synchronization points was conducted using the data we have created. A speech model was produced using three hours of recordings (four male and four female speakers) as training data. For each speaker, a two-loop, four-mixture-distribution phonetic HMM was learned. Based on the HMMs, key-word pair models were obtained from the phonetic transcription. The key-word pair model is shown in Fig 4. The model consists of two strings of words (keywords1 and keywords2) before and after the synchronization point. When speech is fed to the model, non-synchronizing input data travel through the garbage arc while synchronizing data go through the key words, which means that the likelihood at point B increases. Thus, if the likelihood observed at point B goes over a certain threshold, we decide that this is the synchronization point for the input data.</Paragraph>
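The threshold decision at point B can be sketched as a simple crossing detector over a per-frame likelihood trace. This is only an illustration of the decision rule, under the assumption that the key-word pair HMM has already produced one likelihood value per input frame; it does not model the HMM itself.

```python
def detect_sync_points(likelihoods, threshold):
    """Return frame indices where the likelihood at point B first rises
    to the threshold; each upward crossing is taken as one detected
    synchronization point, so a sustained plateau yields one detection."""
    points = []
    above = False
    for i, ll in enumerate(likelihoods):
        if ll >= threshold:
            if not above:
                points.append(i)
            above = True
        else:
            above = False
    return points
```

Lowering `threshold` makes the detector fire more readily, which matches the trade-off reported below: detection rate rises, but so does the false alarm rate.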
      <Paragraph position="2"> Twenty-one key-word pairs were taken from data not used in the training and selected for evaluation. We fed one male and one female speaker's speech to the model in the evaluation. The results are shown in Table 2.</Paragraph>
      <Paragraph position="3"> As we decrease the threshold, the detection rate increases; however, the false alarm rate increases rapidly.</Paragraph>
    </Section>
  </Section>
</Paper>