<?xml version="1.0" standalone="yes"?>
<Paper uid="H01-1051">
  <Title>The Meeting Project at ICSI</Title>
  <Section position="6" start_page="4" end_page="4" type="concl">
    <SectionTitle>
5. FUTURE WORK
</SectionTitle>
    <Paragraph position="0"> The areas mentioned in the earlier section on &quot;Challenges&quot; will require much more work in the future. We and our colleagues at collaborating institutions will be working on all of them. Here, we briefly mention some of the work in our current plans for the study of speech from meetings.</Paragraph>
    <Paragraph position="1"> Far-field microphone ASR. Starting with the read digits and proceeding to spontaneous speech, we will have a major focus on improving recognition on the far-field channels. In earlier work we have had some success in recognizing artificially degraded speech [6][5], and we will be adapting and more fully developing these approaches for the new data and task. Our current focus in these methods is on designing multiple acoustic representations and combining the resulting probability streams, but we will also compare these to more standard methods (which are impractical for the general case), such as echo cancellation using both the close and distant microphones.</Paragraph>
    <Paragraph position="2"> Overlap type modeling. One of the distinctive characteristics of naturalistic conversation (in contrast to monolog situations) is the presence of overlapping speech. Overlapping speech may be of several types, and affects the flow of discourse in various ways. An overlap may help to usurp the floor from another speaker (e.g., interruptions), or to encourage a speaker to continue (e.g., back channels). Also, some overlaps may be accidental, or a part of joint action (as when a group tries to help a speaker recall a person's name in mid-sentence). In addition, speakers may differ in the amount and kinds of overlap in which they engage (speaker style). In future work we will explore types of overlaps and their physical parameters, including prosodic aspects.</Paragraph>
    <Paragraph position="3"> Language modeling. Meetings are also especially challenging for the language model, since they tend to comprise a diverse range of topics and styles, and matched training data is hard to come by (at least in this initial phase of the project). Therefore, we expect meeting recognition to necessitate investigation into novel language model adaptation and robustness techniques.</Paragraph>
    <Paragraph position="4"> Prosodic modeling. Finally, we plan to study the potential contribution of prosodic (temporal and intonational) features to automatic processing of meeting data. A project just underway is constructing a database of prosodic features for meeting data, extending earlier work [10, 9]. Goals include using prosody combined with language model information to help segment speech into coherent semantic units, to classify dialog acts [12], and to aid speaker segmentation.</Paragraph>
  </Section>
</Paper>