<?xml version="1.0" standalone="yes"?> <Paper uid="N04-1017"> <Title>Lattice-Based Search for Spoken Utterance Retrieval</Title> <Section position="3" start_page="0" end_page="0" type="relat"> <SectionTitle> 2 Related Work </SectionTitle> <Paragraph position="0"> There are commercial systems including Nexidia/Fast-</Paragraph> <Paragraph position="2"> as well as research systems like AT&amp;T DVL (Cox et al., 1998), AT&amp;T ScanMail (Hirschberg et al., 2001), BBN Rough'n'Ready (Makhoul et al., 2000), CMU Informedia (www.informedia.cs.cmu.edu), SpeechBot (www.speechbot.com), among others.</Paragraph> <Paragraph position="3"> Between 1997 and 2000, the Text REtrieval Conference (TREC) also ran a spoken document retrieval (SDR) track with many participants (Garofolo et al., 2000).</Paragraph> <Paragraph position="4"> The NIST TREC-9 SDR Web Site (2000) states: "The results of the TREC-9 2000 SDR evaluation presented at TREC on November 14, 2000 showed that retrieval performance for sites on their own recognizer transcripts was virtually the same as their performance on the human reference transcripts. Therefore, retrieval of excerpts from broadcast news using automatic speech recognition for transcription was deemed to be a solved problem - even with word error rates of 30%."</Paragraph> <Paragraph position="5"> PhD theses on this topic include James (1995), Wechsler (1998), Siegler (1999), and Ng (2000).</Paragraph> <Paragraph position="6"> Jones et al. (1996) describe a system that combines a large-vocabulary continuous speech recognition (LVCSR) system with a phone-lattice word spotter (WS) for retrieval of voice and video mail messages (Brown et al., 1996). Witbrock and Hauptmann (1997) present a system in which a phonetic transcript is derived from the word transcript and retrieval is performed using both word and phone indices. Wechsler et al.
(1998) present several new techniques: a method for detecting occurrences of query features, a method for estimating occurrence probabilities, a collection-wide probability re-estimation technique, and feature length weighting. Srinivasan and Petkovic (2000) introduce a phonetic retrieval method based on a probabilistic formulation of term weighting that uses phone confusion data. Amir et al. (2001) use indexing based on confusable phone groups, together with a Bayesian phonetic edit distance, for phonetic speech retrieval. Logan et al. (2002) compare three indexing methods, based on words, syllable-like particles, and phonemes, to study the problem of out-of-vocabulary (OOV) queries in audio indexing systems.</Paragraph> <Paragraph position="7"> Logan and Van Thong (2002) give an alternative approach to the OOV query problem, expanding query words into in-vocabulary phrases while taking acoustic confusability and language model scores into account.</Paragraph> <Paragraph position="8"> Among previous work, the approach most similar to the one proposed here is that of Jones et al. (1996), in that they used phone lattices to aid in word spotting in addition to the single-best output from LVCSR. Our proposal might be thought of as a generalization of their approach in that we use lattices as the sole representation over which retrieval is performed. We believe that lattices are a more natural representation for retrieval when there is a high degree of uncertainty about what was said, which is typically the case for LVCSR systems on conversational speech. We feel that our results, presented below, bear out this belief. Also novel in our approach is the use of indexed lattices, which allow for efficient retrieval. As we note below, in the limit where one uses only one-best output, the indexed lattices reduce to the normal inverted index used in text retrieval.</Paragraph> </Section> </Paper>