<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2155"> <Title>Constituent-based Accent Prediction</Title> <Section position="4" start_page="939" end_page="939" type="metho"> <SectionTitle> 3 Constituent-based experiments </SectionTitle> <Paragraph position="0"> To test the generality of the proposed account of accent and attention, the ability of local and global focusing features to predict accent for a blind corpus is examined using machine learning. To rigorously assess the potential gains to be had from these attentional features, we consider them in combination with lexical and syntactic features identified in the literature as strong predictors of accentuation (Altenberg, 1987; Hirschberg, 1993; Ross et al., 1992).</Paragraph> <Paragraph position="1"> The narrative was collected by Virginia Merlini.</Paragraph> <Paragraph position="2"> Accented expressions are identified by the presence of pitch accent (Pierrehumbert, 1980).</Paragraph> </Section> <Section position="5" start_page="939" end_page="942" type="metho"> <SectionTitle> SUBJECT PRONOUNS (N=111) </SectionTitle> <Paragraph position="0"> focusing functions of accent appear in italics.</Paragraph> <Paragraph position="1"> Previous studies, nonetheless, were aimed at predicting word accentuation, and so the features we borrow are being tested for the first time in learning the abstract accentuation patterns of syntactic constituents, specifically noun phrases (NPs).</Paragraph> <Section position="1" start_page="939" end_page="940" type="sub_section"> <SectionTitle> 3.1 Methods </SectionTitle> <Paragraph position="0"> Accent prediction models are learned from a corpus of unrestricted, spontaneous direction-giving monologues from the Boston Directions Corpus (Nakatani et al., 1995). Eighteen spontaneous direction-giving monologues are analyzed from two American English speakers, H1 (male) and H3 (female). 
The monologues range from 43 to 631 words in length, and comprise 1031 referring expressions made up of 2020 words. [Table 3. Examples of each accent class, TTS-assigned accenting vs. actual accenting: citation: a LITTLE SHOPPING AREA vs. a LITTLE SHOPPING AREA; we vs. we. supra: one vs. ONE; a PRETTY nice AMBIANCE vs. a PRETTY NICE AMBIANCE. reduced: the GREEN LINE SUBWAY vs. the GREEN Line SUBWAY; YET ANOTHER RIGHT TURN vs. yet ANOTHER RIGHT TURN. shift: a VERY FAST FIVE MINUTE lunch vs. a VERY FAST FIVE minute LUNCH.] Minimal, non-recursive NP constituents, referred to as BASENPS, are automatically identified using Collins' (1996) lexical dependency parser. In the following complex NP, baseNPs appear in square brackets: \[the brownstone apartment building\] on \[the corner\] of \[Beacon and Mass Ave\]. BaseNPs are semi-automatically labeled for lexical, syntactic, local focus and global focus features. Table 2 provides summary corpus statistics. A rule-based machine learning program, Ripper (Cohen, 1995), is used to acquire accent classification systems from a training corpus of correctly classified examples, each defined by a vector of feature values, or predictors.3</Paragraph> </Section> <Section position="2" start_page="940" end_page="941" type="sub_section"> <SectionTitle> 3.2 Citation-based Accent Classification </SectionTitle> <Paragraph position="0"> The accentuation of baseNPs is coded according to the relationship of the actual accenting (i.e. accented versus unaccented) on the words in the baseNP to the accenting predicted by a TTS system that received each sentence in the corpus in isolation. The actual accenting is determined by prosodic labeling using the ToBI standard (Pitrelli et al., 1994). Word accent predictions are produced by the Bell Laboratories NewTTS system (Sproat, 1997).</Paragraph> <Paragraph position="1"> NewTTS incorporates complex nominal accenting rules (Sproat, 1994) as well as general, word-based accenting rules (Hirschberg, 1993). 
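The training representation that Ripper consumes, described in Section 3.1 above, can be sketched as follows. The field names follow the feature groups discussed in the text, but the dataclass itself and the example values are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass

# Hypothetical encoding of one training example for a Ripper-style rule
# learner: each baseNP becomes a vector of feature values plus its label.
@dataclass(frozen=True)
class BaseNPExample:
    broad_class_seq: tuple      # set-valued lexical feature (POS broad classes)
    lemma_seq: tuple            # set-valued lexical feature (word lemmas)
    clause_type: str            # global syntactic feature
    basenp_type: str            # local (NP-internal) syntactic feature
    grammatical_function: str   # local focus (centering) feature
    form_of_expression: str     # local focus (centering) feature
    global_focus: str           # global focusing status
    accent_class: str           # label: citation | supra | reduced | shift

# Illustrative values only.
ex = BaseNPExample(("determiner", "noun"), ("the", "corner"),
                   "matrix", "simple-basenp", "adjunct", "definite NP",
                   "first-mention", "citation")
```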
It is assumed for the purposes of this study that NewTTS generally assigns citation-style accentuation when passed sentences in isolation.</Paragraph> <Paragraph position="2"> 3Ripper is similar to CART (Breiman et al., 1984), but it directly produces IF-THEN logic rules instead of decision trees and also utilizes incremental error reduction techniques in combination with novel rule optimization strategies.</Paragraph> <Paragraph position="3"> For each baseNP, one of the following four accenting patterns is assigned: * CITATION FORM: exact match between actual and TTS-assigned word accenting.</Paragraph> <Paragraph position="4"> * SUPRA: one or more accented words are predicted unaccented by TTS; otherwise, TTS predictions match actual accenting.</Paragraph> <Paragraph position="5"> * REDUCED: one or more unaccented words are predicted accented by TTS; otherwise, TTS predictions match actual accenting.</Paragraph> <Paragraph position="6"> * SHIFT: at least one accented word is predicted unaccented by TTS, and at least one unaccented word is predicted accented by TTS.</Paragraph> <Paragraph position="7"> Examples from the Boston Directions Corpus for each accent class appear in Table 3.</Paragraph> <Paragraph position="8"> Table 4 gives the breakdown of coded baseNPs by accent class. In contrast to read-aloud citation-style speech, in these unrestricted, spontaneous monologues, 30% of referring expressions do not bear citation form accentuation. 
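The four-way coding just defined can be sketched as a comparison of two word-level accent sequences; the function below is a hypothetical reading of the definitions above, not the authors' code.

```python
# A sketch of the four-way accent-class coding defined above. Each input
# marks, word by word, whether the word is accented (True) or not (False).
def accent_class(tts, actual):
    assert len(tts) == len(actual)
    # accented in speech but predicted unaccented by TTS (supra direction)
    extra = any(a and not t for t, a in zip(tts, actual))
    # unaccented in speech but predicted accented by TTS (reduced direction)
    missing = any(t and not a for t, a in zip(tts, actual))
    if extra and missing:
        return "shift"
    if extra:
        return "supra"
    if missing:
        return "reduced"
    return "citation"

# "a PRETTY nice AMBIANCE" (TTS) vs. "a PRETTY NICE AMBIANCE" (actual)
print(accent_class([False, True, False, True], [False, True, True, True]))  # supra
```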
The citation form accent percentages serve as the baseline for the accent prediction experiments; correct classification rates above 75.8% and 60.2% for H1 and H3, respectively, would represent performance beyond that of state-of-the-art citation-form accentuation models, gained by directly modeling cases of supra, reduced or shifted constituent-based accentuation.</Paragraph> </Section> <Section position="3" start_page="941" end_page="942" type="sub_section"> <SectionTitle> 3.3 Predictors </SectionTitle> <Paragraph position="0"> Set features, which Ripper handles natively, extend lexical word features to the constituent level. Two set-valued features, BROAD CLASS SEQUENCE and LEMMA SEQUENCE, represent lexical information. These features consist of an ordered list of the broad class part-of-speech (POS) tags or word lemmas for the words making up the baseNP.</Paragraph> <Paragraph position="1"> For example, the lemma sequence for the NP, the Harvard Square T stop, is {the, Harvard, Square, T, stop}. The corresponding broad class sequence is {determiner, noun, noun, noun, noun}. Broad class tags are derived using Brill's (1995) part-of-speech tagger, and word lemma information is produced by NewTTS (Sproat, 1997).</Paragraph> <Paragraph position="2"> POS information is used to assign accenting in nearly all speech synthesis systems. Initial word-based experiments on our corpus showed that broad class categories performed slightly better than both the function-content distinction and the POS tags themselves, giving 69%-81% correct word predictions (Nakatani, 1997).</Paragraph> <Paragraph position="3"> The CLAUSE TYPE feature represents global syntactic constituency information, while the BASENP TYPE feature represents local or NP-internal syntactic constituency information. Four clause types are coded: matrix, subordinate, predicate complement and relative. 
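The two set-valued lexical features can be illustrated with the Harvard Square example above. In the sketch below, the broad-class map and the identity lemmatizer are toy stand-ins for Brill's tagger output and NewTTS lemmas; the function itself is hypothetical.

```python
# Hypothetical derivation of the LEMMA SEQUENCE and BROAD CLASS SEQUENCE
# features from a POS-tagged baseNP; the tag map is a toy stand-in.
BROAD_CLASS = {"DT": "determiner", "NN": "noun", "NNP": "noun",
               "JJ": "adjective", "CD": "cardinal"}

def lexical_features(tagged_basenp):
    """tagged_basenp: ordered (word, Penn POS tag) pairs for one baseNP."""
    lemma_seq = tuple(word for word, _ in tagged_basenp)  # identity lemmas
    broad_seq = tuple(BROAD_CLASS.get(tag, "other") for _, tag in tagged_basenp)
    return lemma_seq, broad_seq

lemmas, classes = lexical_features([("the", "DT"), ("Harvard", "NNP"),
                                    ("Square", "NNP"), ("T", "NNP"),
                                    ("stop", "NN")])
# lemmas  → ("the", "Harvard", "Square", "T", "stop")
# classes → ("determiner", "noun", "noun", "noun", "noun")
```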
Each baseNP is semi-automatically assigned the clause type of the lowest-level clause, that is, the nearest dominating clausal node in the parse tree that contains the baseNP. As for baseNP types, baseNPs not dominated by any NP node are labeled SIMPLE-BASENP. BaseNPs that occur in complex NPs (and are thus dominated by at least one NP node) are labeled according to whether the baseNP contains the head word of the dominating NP. Those that are dominated by only one NP node and contain the head word of the dominating NP are HEAD-BASENPS; all other baseNPs in a complex NP are CHILD-BASENPS. Conjoined noun phrases involve additional categories of baseNPs, which are collapsed into the CONJUNCT-BASENP category. Table 5 gives the distributions of baseNP types.</Paragraph> <Paragraph position="4"> Focus projection theories of accent, e.g. (Gussenhoven, 1984; Selkirk, 1984), would predict a large role for syntactic constituency information in determining accent, especially for noun phrase constituents.</Paragraph> <Paragraph position="5"> Empirical evidence for such a role, however, has been weak (Altenberg, 1987).</Paragraph> <Paragraph position="6"> The local attentional status of baseNPs is represented by two features commonly used in centering theory to compute the Cb and the Cf list, GRAMMATICAL FUNCTION and FORM OF EXPRESSION (Grosz et al., 1995). Hand-labeled grammatical functions include subject, direct object, indirect object, predicate complement and adjunct. Form of expression feature values are adverbial noun, cardinal, definite NP, demonstrative NP, indefinite NP, pronoun, proper name, quantifier NP, verbal noun, etc.</Paragraph> <Paragraph position="7"> The global focusing status of baseNPs is computed using two sets of analyses: discourse segmentations and coreference coding. 
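The global focusing computation introduced in the last sentence above, a linear segmentation combined with reference chains, can be sketched as follows. The helper and its argument shapes are assumptions: `in_chain` says whether the baseNP occurs in any reference chain, and `prior_segments` lists the segment indices of earlier coreferring mentions, most recent last. Status names follow the text.

```python
# Hypothetical sketch of global-focus assignment from a linear segmentation
# and precomputed reference chains (not the authors' implementation).
def global_focus(in_chain, segment, prior_segments):
    if not in_chain:                 # realized only once in the discourse
        return "single-mention"
    if not prior_segments:           # first link of its reference chain
        return "first-mention"
    last = prior_segments[-1]        # most recent coreferring mention
    if last == segment:              # mentioned in the current segment
        return "immediate"
    if last == segment - 1:          # mentioned in the previous segment
        return "neighboring"
    return "stack"                   # mentioned, but further back
```

Treating "immediately previous segment" as index `segment - 1` assumes the consensus segmentation is strictly linear, as the text states.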
Expert discourse structure analyses are used to derive CONSENSUS SEGMENTATIONS, consisting of discourse boundaries whose coding all three labelers agreed upon (Hirschberg and Nakatani, 1996). The consensus labels for segment-initial boundaries provide a linear segmentation of a discourse into discourse segments. Coreferential relations are coded by two labelers using DTT (Discourse Tagging Tool) (Aone and Bennett, 1995). To compute coreference chains, only the relation of strict coreference is used. Two NPs, np1 and np2, are in a strict coreference relationship when np2 occurs after np1 in the discourse and realizes the same discourse entity that is realized by np1. Reference chains are then automatically computed by linking noun phrases in strict coreference relations into the longest possible chains. Given a consensus linear segmentation and reference chains, global focusing status is determined. For each baseNP, if it does not occur in a reference chain, and thus is realized only once in the discourse, it is assigned the SINGLE-MENTION focusing status. The remaining statuses apply to baseNPs that do occur in reference chains. If a baseNP in a chain is not previously mentioned in the discourse, it is assigned the FIRST-MENTION status. If its most recent coreferring expression occurs in the current segment, the baseNP is in IMMEDIATE FOCUS; if it occurs in the immediately previous segment, the baseNP is in NEIGHBORING FOCUS; if it occurs in the discourse but not in either the current or immediately previous segments, then the baseNP is assigned STACK focus.</Paragraph> </Section> </Section> </Paper>