File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/01/j01-3002_abstr.xml
Size: 3,850 bytes
Last Modified: 2025-10-06 13:41:59
<?xml version="1.0" standalone="yes"?> <Paper uid="J01-3002"> <Title>A Statistical Model for Word Discovery in Transcribed Speech</Title> <Section position="2" start_page="0" end_page="0" type="abstr"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> English speech lacks the acoustic analog of blank spaces that people are accustomed to seeing between words in written text. Discovering words in continuous spoken speech is thus an interesting problem and one that has been treated at length in the literature. The problem of identifying word boundaries is particularly significant in the parsing of written text in languages that do not explicitly include spaces between words. In addition, if we assume that children start out with little or no knowledge of the inventory of words the language possesses identification of word boundaries is a significant problem in the domain of child language acquisition. 1 Although speech lacks explicit demarcation of word boundaries, it is undoubtedly the case that it nevertheless possesses significant other cues for word discovery. However, it is still a matter of interest to see exactly how much can be achieved without the incorporation of these other cues; that is, we are interested in the performance of a bare-bones language model.</Paragraph> <Paragraph position="1"> For example, there is much evidence that stress patterns (Jusczyk, Cutler, and Redanz 1993; Cutler and Carter 1987) and phonotactics of speech (Mattys and Jusczyk 1999) are of considerable aid in word discovery. But a bare-bones statistical model is still useful in that it allows us to quantify precise improvements in performance upon the integration of each specific cue into the model. We present and evaluate one such statistical model in this paper. 2 The main contributions of this study are as follows: First, it demonstrates the applicability and competitiveness of a conservative, traditional approach for a task for which nontraditional approaches have been proposed even recently (Brent 1999; Brent and Cartwright 1996; de Marcken 1995; Elman 1990; Christiansen, Allen, and Seidenberg 1998). Second, although the model leads to the development of an algorithm that learns the lexicon in an unsupervised fashion, results of partial supervision are presented, showing that its performance is consistent with results from learning theory.</Paragraph> <Paragraph position="2"> Third, the study extends previous work to higher-order n-grams, specifically up to upon request from the author. The programs (totaling about 900 lines) have been written in C++ to compile under Unix/Linux. The author will assist in porting it to other architectures or to versions of Unix other than Linux or SunOS/Solaris if required.</Paragraph> <Paragraph position="3"> (~) 2001 Association for Computational Linguistics Computational Linguistics Volume 27, Number 3 trigrams, and discusses the results in their light. Finally, results of experiments suggested in Brent (1999) regarding different ways of estimating phoneme probabilities are also reported. Wherever possible, results are averaged over 1000 repetitions of the experiments, thus removing any potential advantages the algorithm may have had due to ordering idiosyncrasies within the input corpus.</Paragraph> <Paragraph position="4"> Section 2 briefly discusses related literature in the field and recent work on the same topic. The model is described in Section 3. 
<Paragraph position="4"> Section 2 briefly discusses related literature in the field and recent work on the same topic. The model is described in Section 3. Section 4 describes an unsupervised learning algorithm based directly on the model developed in Section 3; it also describes the data corpus and the methods used to test the algorithms.</Paragraph>
<Paragraph position="5"> Results are presented and discussed in Section 5. Finally, the findings of this work are summarized in Section 6.</Paragraph>
</Section>
</Paper>