<?xml version="1.0" standalone="yes"?> <Paper uid="J89-2002"> <Title>PARSING WITH A SMALL DICTIONARY FOR APPLICATIONS SUCH AS TEXT TO SPEECH</Title> <Section position="2" start_page="0" end_page="98" type="intro"> <SectionTitle> 1 INTRODUCTION </SectionTitle> <Paragraph position="0"> Parsing a sentence requires information about the parts of speech of its words. Previous work on natural language parsing has generally assumed that parts of speech are known for all words in an input text (Marcus 1980, Grishman 1986). For example, the EPISTLE system (Jensen 1983, Heidorn 1982) employs a 130,000-word dictionary. Although a small dictionary of 200-300 words suffices for the function words (e.g., prepositions, pronouns), identifying nouns and verbs has required much larger dictionaries. Locating the verbs in a sentence is particularly useful for specifying prosody, because pauses often occur immediately before or after a verb group. The system described in this paper recognizes all function words and some content words, and uses syntactic constraints to estimate which words are likely to form phrases. It is compared to similar systems using dictionaries in excess of 2,000 words, which have been only partially described in the literature (Dewar 1969, Bachenko 1986). To the author's knowledge, these latter systems are the only other ones that have attempted parsing on arbitrary text with dictionaries of fewer than 10,000 words. Because the parser described here has access only to a very small dictionary, it cannot exploit many of the advances in parsing in recent years. What is explained below, however, is that accurate parsing need not require large dictionaries. [Computational Linguistics, Volume 15, Number 2, June 1989]</Paragraph> <Section position="1" start_page="0" end_page="98" type="sub_section"> <SectionTitle> 1.1 SYNTHESIS APPLICATIONS </SectionTitle> <Paragraph position="0"> The input for automatic speech synthesis systems can take several forms. 
In question-answer applications, a user may access a data base with information stored in non-textual form, e.g., tables or numbers. Such a system can use a very limited grammar in formulating the syntactic structure of the output speech (&quot;concept to speech&quot;: Young 1979). In some future systems, the queries may be in the form of speech, and automatic speech recognizers will extract prosody and syntax patterns, which can in turn be of assistance in synthesizing responses.</Paragraph> <Paragraph position="1"> A more immediate synthesis application is automatic text to speech synthesis (Klatt 1987). The conversion of arbitrary English text to speech is useful in aids for the blind and in general voice response systems. Visually handicapped people (few of whom know Braille) can have direct access to the vast wealth of printed information via an optical character reader and a text to speech synthesizer. Concerning voice response, much information in data bases is in the form of text; with an automatic text to speech system, people could telephone a remote data base and hear a vocal version of the information. The queries must be entered through the telephone keypad or via speech of isolated words (where prosody and syntax play no role), but the output speech can be in the form of sentences. [Douglas D. O'Shaughnessy; 0362-613X/89/010097-108-$03.00]</Paragraph> <Paragraph position="2"> In synthesis from a text of English sentences, the naturalness and intelligibility of the output speech are highly dependent upon realistic prosodic patterns (O'Shaughnessy 1983a). Current synthesizers have difficulty obtaining sufficient linguistic information from an input text to specify prosody properly. 
The syntactic structure of the text, in particular, is a major factor in determining where a speaker should pause, which words to stress, and how to use pitch rises and falls.</Paragraph> <Paragraph position="3"> However, the problem of parsing natural English, even using a large dictionary indicating parts of speech for all possible words, is as yet unsolved. English allows many syntactic constructions, which one recognizes when reading a text aloud. Text to speech systems, especially when pronouncing sentences with few punctuation marks, perform much more poorly than humans do. In some systems, the problem is further complicated because the number of entries in the dictionary must be minimized for economy. Such systems usually employ letter to phoneme rules and a small dictionary to pronounce words for which the rules are inadequate.</Paragraph> <Paragraph position="4"> For certain words, knowledge of their syntactic role is imperative for proper pronunciation; e.g., refuse, wind, lives, and separate are pronounced differently depending on whether they act as noun, verb, or adjective.</Paragraph> <Paragraph position="5"> Very little work on parsing sentences for speech synthesis purposes has been reported. This paper is the first to give parsing details specifically for synthesis while using a dictionary of fewer than 300 words. In most other references, the parsing problem is only mentioned in passing (Flanagan 1970; Coker 1973; Klatt 1987). The best-documented system, MITalk-79 (Allen 1987), uses a large dictionary and treats parsing only on a local basis, ignoring important syntactic structures that encompass the entire sentence.</Paragraph> <Paragraph position="6"> Restricting the dictionary to a few hundred entries limits the ability of a parser to correctly analyze all texts. For text to speech, however, it is unnecessary to have a complete parse of the text to be spoken. 
The dictionary and pronunciation rules must, of course, be powerful enough to avoid mistakes in the translation of letters into phonemes. But syntactic structure is useful mostly in specifying prosody, e.g., when to pause, which words to stress, and whether to raise or lower pitch at the end of a sentence. Syntactic information sufficient to specify prosody rarely requires a complete parse. The positions of major syntactic boundaries and the identification of stressed words are the main concerns. Confusions between nouns and adjectives, for instance, have little bearing on prosody. Using a flexible parser, moreover, minimizes the chance of meeting an unparsable text (Weischedel 1980). A parsing failure in synthesis systems is only serious if it results in an incorrect prosodic assignment that adversely affects the intelligibility or correct interpretation of the output speech. While a local parsing error in one part of a sentence may lead to errors elsewhere in the sentence, many minor errors that occur in our parser due to use of a small dictionary have little effect on the important aspects of the global sentence parse.</Paragraph> </Section> </Section> </Paper>