<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-4034">
  <Title>Multi-Speaker Language Modeling</Title>
  <Section position="5" start_page="0" end_page="0" type="concl">
    <SectionTitle>
5 Discussions and Conclusion
</SectionTitle>
    <Paragraph position="0"> In this paper, we introduced and evaluated a novel multi-speaker language model (MSLM). Simply adding words from other speakers to a standard trigram context yields a reasonable improvement in perplexity, and the model improves further when class-based cross-speaker information is employed. We also presented two different criteria for this clustering.</Paragraph>
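The core idea above can be sketched in a few lines. This is a minimal illustrative model, not the paper's exact formulation: the class name, add-one smoothing, and toy vocabulary size are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sketch of the cross-speaker idea: a trigram-style model whose
# context also includes the most recent word from the other speaker (stream A).
# The smoothing scheme and vocabulary size are illustrative assumptions.
class MultiSpeakerTrigram:
    def __init__(self, vocab_size=10):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.vocab_size = vocab_size

    def observe(self, w2, w1, other_word, w):
        # Context = two previous own words plus one word from the other speaker.
        self.counts[(w2, w1, other_word)][w] += 1

    def prob(self, w2, w1, other_word, w):
        ctx = self.counts[(w2, w1, other_word)]
        total = sum(ctx.values())
        # Add-one smoothing over an assumed closed vocabulary.
        return (ctx[w] + 1) / (total + self.vocab_size)

m = MultiSpeakerTrigram()
m.observe("how", "are", "hello", "you")
p = m.prob("how", "are", "hello", "you")
```

An observed continuation receives higher probability than an unseen one under the same cross-speaker context, which is the effect the perplexity gains reflect.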
    <Paragraph position="1"> The more complex criterion gives results similar to the simple one, presumably due to data sparseness. Even though the Switchboard and meeting data differ in topic, speaking style, and number of speakers, cross-speaker information might be learned more robustly by training on the union of these two data sets.</Paragraph>
    <Paragraph position="2"> There are a number of ways to extend this work. First, our current approach is purely data driven. One can imagine that higher level information (e.g., a dialog or other speech act) about the other speakers might be particularly important. Latent semantic analysis of stream A might also be usefully employed here. Furthermore, more than one word from stream A can be included in the context to provide additional predictive ability. With the meeting data, there may be a benefit to controlling for specific speakers based on their degree of influence. Alternatively, an MSLM might help identify the most influential speaker in a meeting by determining who most changes the probability of other speakers' words.</Paragraph>
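The influence idea sketched above is directly computable: score each speaker by how much conditioning on their words shifts the model's probability of other speakers' words. The function and the toy probabilities below are illustrative assumptions, not results from the paper.

```python
import math

# Hypothetical illustration: a speaker's influence is measured as the mean
# absolute log-probability change on other speakers' words when the model
# conditions on that speaker's context. All numbers here are toy assumptions.
def influence_score(cond_probs, base_probs):
    deltas = [abs(math.log(c) - math.log(b))
              for c, b in zip(cond_probs, base_probs)]
    return sum(deltas) / len(deltas)

# Toy numbers: speaker 1's words shift others' word probabilities more.
s1 = influence_score([0.4, 0.3], [0.1, 0.1])
s2 = influence_score([0.12, 0.11], [0.1, 0.1])
```

Under this toy measure, the speaker whose context most changes the predictions (s1 here) would be flagged as the most influential.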
    <Paragraph position="3"> Moreover, the approach clearly suggests that a multi-speaker decoder in an automatic speech recognition (ASR) system might be beneficial. Once time marks for each word are provided in an N-best list, our MSLM technique can be used for rescoring. Additionally, such a decoder can easily be specified using graphical models (Bilmes and Zweig, 2002) in first-pass decodings.</Paragraph>
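The rescoring setup can be sketched as follows. The MSLM score is stubbed out, and the function names, weight, and toy hypotheses are assumptions for illustration; the real system would use the trained cross-speaker model.

```python
# Hypothetical sketch of N-best rescoring with an MSLM: each hypothesis carries
# time-marked words; a cross-speaker LM score (stubbed here) is interpolated
# with the first-pass score and the list is re-ranked.
def mslm_score(hyp_words, other_stream):
    # Stub standing in for the MSLM: reward words that echo the other
    # speaker's time-aligned words.
    other = set(w for w, t in other_stream)
    return sum(1.0 for w, t in hyp_words if w in other)

def rescore(nbest, other_stream, lm_weight=0.5):
    def total(hyp):
        first_pass, words = hyp
        return first_pass + lm_weight * mslm_score(words, other_stream)
    return max(nbest, key=total)

# Toy N-best list: (first-pass score, [(word, start_time), ...]).
other = [("budget", 1.2), ("meeting", 2.0)]
nbest = [
    (10.0, [("bucket", 1.3)]),   # higher first-pass score
    (9.8,  [("budget", 1.3)]),   # overlaps the other speaker's words
]
best = rescore(nbest, other)
```

The time marks are what make the alignment between the two streams possible: they determine which of the other speaker's words fall inside the context window for each hypothesis word.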
    <Paragraph position="4"> We wish to thank Katrin Kirchhoff and the anonymous reviewers for useful comments on this work.</Paragraph>
  </Section>
</Paper>