<?xml version="1.0" standalone="yes"?> <Paper uid="C96-1014"> <Title>Integrating Syntactic and Prosodic Information for the Efficient Detection of Empty Categories</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 An HPSG Analysis of German Clause Structure </SectionTitle> <Paragraph position="0"> HPSG makes crucial use of &quot;head traces&quot; to analyze the verb-second (V2) phenomenon pertinent in German, i.e. the fact that finite verbs appear in second position in main clauses but in final position in subordinate clauses, as exemplified in (1a) and (1b).</Paragraph> <Paragraph position="1"> 1. (a) Gestern reparierte er den Wagen.</Paragraph> <Paragraph position="2"> (Yesterday fixed he the car) 'Yesterday, he fixed the car.' (b) Ich dachte, daß er gestern den Wagen reparierte.</Paragraph> <Paragraph position="3"> (I thought that he yesterday the car fixed) 'I thought that he fixed the car yesterday.'</Paragraph> <Paragraph position="4"> Following (Kiss&Wesche, 1991) we assume that the structural relationship between the verb and its arguments and modifiers is not affected by the position of the verb. The overt relationship between the verb 'reparierte' and its object 'den Wagen' in (1b) is preserved in (1a), although the verb shows up in a different position. The apparent contradiction is resolved by assuming an empty element which serves as a substitute for the verb in second position. The empty element fills the position occupied by the finite verb in subordinate clauses, leading to the structure of main clauses exemplified in (2).</Paragraph> <Paragraph position="6"> (2): Syntax tree for 'Gestern reparierte er den Wagen.' The empty verbal head in (2) carries syntactic and semantic information.
Particularly, the empty head licenses the realization of the syntactic arguments of the verb according to the rule schemata of German and HPSG's Subcategorization Principle.</Paragraph> <Paragraph position="7"> The structure of the main clause presented in (2) can be justified on several grounds. In particular, the parallelism in verbal scope between verb-final and V2 clauses - exemplified in (3a) and (3b) - can be modeled best by assuming that the scope of a verb is always determined w.r.t. the final position. 3. (a) Ich glaube, du sollst nicht töten.</Paragraph> <Paragraph position="8"> (I believe you shall not kill) 'I believe you should not kill.' (b) Ich glaube, daß du nicht töten sollst.</Paragraph> <Paragraph position="9"> (I believe that you not kill shall) 'I believe that you should not kill.' In a V2 clause, the scope of the verb is determined with respect to the empty verbal head only. Since the structural position of an empty verbal head is identical to the structural position of an overt finite verb in a verb-final clause, the invariance does not come as a surprise.</Paragraph> <Paragraph position="10"> Rather than exploring alternative approaches here, we will briefly touch upon the representation of the dependency in terms of HPSG's featural architecture. Information pertaining to empty heads is projected along the DOUBLE SLASH (DSL) feature instead of the SLASH feature (cf. (Borsley, 1989)). The empty head is described in (4), where the LOCAL value is coindexed with the DSL value.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> [ SYNSEM [ LOC [1] NONLOC | DSL { [1] } ] ] </SectionTitle> <Paragraph position="0"> (4): Feature description of a head trace The DSL of a head is identical to the DSL of the mother, i.e.
DSL does not behave like a NONLOCAL but like a HEAD feature.</Paragraph> <Paragraph position="1"> A DSL dependency is bound if the verbal projection is selected by a verb in second position. A lexical rule guarantees that the selector shares all relevant information with the DSL value of the selected verbal projection. The relationship between a verb in final position, a verb in second position, and the empty head can be summarized as follows: for each final finite verb form, there is a corresponding finite verb form in second position which licenses a verbal projection whose empty head shares its LOCAL information with the corresponding final verb form. It is thus guaranteed that the syntactic arguments of the empty head are identical to the syntactic arguments required by the selecting verb.</Paragraph> </Section> <Section position="5" start_page="0" end_page="72" type="metho"> <SectionTitle> 3 Processing Empty Elements </SectionTitle> <Paragraph position="0"> Direct parsing of empty elements can become a tedious task, decreasing the efficiency of a system considerably.</Paragraph> <Paragraph position="1"> Note first that a reduction of empty elements in a grammar in favor of disjunctive lexical representations, as suggested in (Pollard&Sag, 1994, ch.9), cannot be pursued.</Paragraph> <Paragraph position="2"> (Pollard&Sag, 1994) assume that an argument may occur on the SUBCAT or on the SLASH list. A lexical operation removes the argument from SUBCAT and puts it onto SLASH. Hence, no further need for a syntactic representation of empty elements emerges. This strategy, however, will not work for head traces because they do not occur as dependents on a SUBCAT list.</Paragraph> <Paragraph position="3"> If empty elements have to be represented syntactically, a top-down parsing strategy seems better suited than a bottom-up strategy.
Particularly, a parser driven by a bottom-up strategy has to hypothesize the presence of empty elements at every point in the input.</Paragraph> <Paragraph position="4"> In HPSG, however, only very few constraints are available for a top-down regime, since most information is contained in lexical items. The parser will not restrict the stipulation of empty elements until a lexical element containing restrictive information has been processed. The apparent advantage of top-down parsing is thus lost when HPSGs are to be parsed. The same criticism applies to other parsing strategies with a strong top-down orientation, such as left corner parsing or head corner parsing.</Paragraph> <Paragraph position="5"> We have thus chosen a bottom-up parsing strategy where the introduction of empty verbal heads is constrained by syntactic and prosodic information. The syntactic constraints build on the facts that a) a verb trace always occurs to the right of its licenser and b) always 'lower' in the syntax tree. Furthermore, c) since the DSL percolation mechanism ensures structure sharing between the verb and its trace, a verb trace always comes with a corresponding overt verb.</Paragraph> <Paragraph position="6"> As a consequence of c), the parser has a fully specified verb form - although with empty phonology - at hand, rather than having to cope with the underspecified structure in (4). This form can be determined at compile time and stored in the lexicon together with the corresponding verb form. It is pushed onto the trace stack whenever this verb is accessed.</Paragraph> <Paragraph position="7"> Although a large number of bottom-up hypotheses regarding the position of an empty element can be eliminated by providing the parser with the aforementioned information, the number of wrong hypotheses is still significant.</Paragraph> <Paragraph position="8"> In a verb-second clause, most of the input follows the finite verb form, so that condition a) is indeed not very restrictive.
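As an illustration, the search space licensed by conditions a) and c) alone can be sketched as follows (a toy Python sketch; the class and function names are our own illustration and do not come from the paper's HPSG parser):

```python
# Toy sketch of trace stipulation under conditions a) and c).
# All names here are illustrative assumptions, not the paper's code.

class Verb:
    """A fully specified verb form stored with its trace counterpart."""
    def __init__(self, form, position, subcat):
        self.form = form          # e.g. "reparierte"
        self.position = position  # index of the overt verb in the input
        self.subcat = subcat      # arguments shared with the empty head

def trace_hypotheses(trace_stack, n_words):
    """Yield (position, licensing verb) pairs for empty verbal heads.

    Condition a): a trace may only appear to the right of its licenser.
    Condition c): every trace is the stored counterpart of an overt
    verb, so its syntactic potential is fully determined.
    Condition b) ('lower in the tree') can only be checked against
    dominating structure by the parser proper, so it is omitted here.
    """
    for verb in trace_stack:
        for pos in range(verb.position + 1, n_words + 1):
            yield pos, verb       # condition a) enforced by the range

# Verb-second clause: 'Gestern reparierte er den Wagen.' (5 words)
stack = [Verb("reparierte", 1, ["NP[nom]", "NP[acc]"])]
hyps = list(trace_hypotheses(stack, 5))
assert [pos for pos, _ in hyps] == [2, 3, 4, 5]
```

Every position to the right of the finite verb survives this filter, which is exactly why condition a) alone remains so weak in verb-second clauses.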
Condition b) rules out a large number of structures but often cannot prevent the stipulation of traces in illicit positions. Condition c) has the most restrictive effect in that the syntactic potential of the trace is determined by that of the corresponding verb.</Paragraph> <Paragraph position="9"> If the number of possible trace locations could be reduced significantly, the parser could avoid a large number of subanalyses that conditions a)-c) would rule out only at later stages of the derivation. The strategy advocated in the remainder of this paper employs prosodic information to accomplish this reduction.</Paragraph> <Paragraph position="10"> Empty verbal heads can only occur in the right periphery of a phrase, i.e. at a phrase boundary. The introduction of empty arcs is then conditioned not only by the syntactic constraints mentioned before, but additionally by certain requirements on the prosodic structure of the input. It turns out, then, that a fine-grained prosodic classification of utterance turns, based on correlations between syntactic and prosodic structure, is of use not only for determining the segmentation of a turn, but also for predicting which positions are eligible for trace stipulation. The following section focuses on the prosodic classification schema; section 5 presents the results of the current experiments.</Paragraph> </Section> <Section position="6" start_page="72" end_page="74" type="metho"> <SectionTitle> 4 Classifying Prosodic Information </SectionTitle> <Paragraph position="0"> The standard unit of spoken language in a dialogue is the turn. A turn like (5) can be composed of several sentences and subsentential phrases -- free elements like the phrase 'im April' which do not stand in an obvious syntactic relationship with the surrounding material and which occur much more often in spontaneous speech than in other environments.
One of the major tasks of a prosodic component of a processing system is the determination of phrase boundaries between these sentences and free phrases.</Paragraph> <Paragraph position="1"> 5. Im April. Anfang April bin ich in Urlaub.</Paragraph> <Paragraph position="2"> Ende April habe ich noch Zeit.</Paragraph> <Paragraph position="3"> (In April beginning April am I on vacation end April have I still time) 'In April. I am on vacation at the beginning of April. I still have time at the end of April.' In written language, phrase boundaries are often determined by punctuation, which is, of course, not available in spoken discourse. For the recognition of these phrase boundaries, we use a statistical approach in which acoustic-prosodic features computed from the speech signal are classified.</Paragraph> <Paragraph position="4"> The classification experiments for this paper were conducted on a set of 21 human-human dialogs which are prosodically labelled (cf. (Reyelt, 1995)). We chose 18 dialogs (492 turns, 36 different speakers, 6996 words) for training, and 3 dialogs for testing (80 turns, 4 different speakers, 1049 words).</Paragraph> <Paragraph position="5"> The computation of the acoustic-prosodic features is based on a time alignment of the phoneme sequence corresponding to the spoken or recognized words. To exclude word recognition errors, for this paper we only used the spoken word sequence, thus simulating 100% word recognition. The time alignment is done by a standard hidden Markov model word recognizer.
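To make the role of the alignment concrete, two representative per-syllable features can be sketched as follows (a Python illustration under assumptions of ours; the alignment record format, field names, and normalization scheme are not taken from the paper):

```python
# Illustrative only: two per-syllable prosodic features computed from a
# hypothetical time alignment (times in seconds).  The record format
# and the normalization scheme are our assumptions, not the paper's.

def pause_before(syllables, i):
    """Length of the pause preceding syllable i (0.0 if none)."""
    if i == 0:
        return 0.0
    return max(0.0, syllables[i]["start"] - syllables[i - 1]["end"])

def normalized_nucleus_duration(syllable, reference_dur):
    """Nucleus duration relative to a reference (e.g. a phone mean)."""
    return syllable["nucleus_dur"] / reference_dur

# Two syllables from a hypothetical forced alignment:
syls = [
    {"start": 0.00, "end": 0.18, "nucleus_dur": 0.09},
    {"start": 0.30, "end": 0.52, "nucleus_dur": 0.15},
]
assert round(pause_before(syls, 1), 9) == 0.12
assert round(normalized_nucleus_duration(syls[1], 0.10), 6) == 1.5
```

The real feature set, described next, is computed in the same spirit but over a much richer inventory of duration, F0, and energy measurements.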
For each syllable to be classified, the following prosodic features were computed fully automatically from the speech signal for the syllable under consideration and for the six syllables in the left and the right context: * the normalized duration of the syllable nucleus * the minimum, maximum, onset, and offset of the fundamental frequency (F0) and the maximum energy, and their positions on the time axis relative to the position of the actual syllable * the mean energy and the mean F0 * flags indicating whether the syllable carries the lexical word accent or whether it is in word-final position The following features were computed only for the syllable under consideration: * the length of the pause (if any) preceding or succeeding the word containing the syllable * the linear regression coefficients of the F0 contour and the energy contour computed over 15 different windows to the left and to the right of the syllable This amounts to a set of 242 features, which so far achieved the best results on a large database of read speech; for a more detailed account of the feature evaluation, cf. (Kießling, 1996).</Paragraph> <Paragraph position="5"> The full set of features could not be used due to the lack of sufficient training data. Best results were achieved with a subset of features containing mostly durational features and F0 regression coefficients. A first set of reference labels was based on perceptual evaluation of prosodically marked boundaries by non-naive listeners (cf. (Reyelt, 1995)). Here, we will only deal with major prosodic phrase boundaries (B3), which correspond closely to the intonational phrase boundaries in the ToBI approach (cf. (Beckman&Ayers, 1994)), vs. all other boundaries (no boundary, minor prosodic boundary, irregular boundary). Still, a purely perceptual labelling of the phrase boundaries under consideration seems problematic.
In particular, we find phrase boundaries which are classified as such according to the perceptual labelling although they do not correspond to a syntactic phrase boundary. Illustrations are given below, where perceptually labelled but syntactically unmotivated boundaries are denoted with a vertical bar.</Paragraph> <Paragraph position="7"> 6. (a) Sollen wir uns dann im Monat März | einmal treffen? (Shall we us then in month March meet) 'Should we meet then in March?' Guided by the assumption that only the boundary of the final intonational phrase is relevant for the present purposes, we argue for a categorial labelling (cf. (Feldhaus&Kiss, 1995)), i.e. a labelling which is solely based on linguistic definitions of possible phrase boundaries in German. Thus, instead of labelling a variety of prosodic phenomena which may be interpreted as boundaries, the labelling systematically follows the syntactic phrasing, assuming that the prosodic realization of syntactic boundaries exhibits properties that can be learned by a prosodic classification algorithm. The 21 dialogues described above were labelled according to this scheme. For the classification reported in the following, we employ three main labels: S3+ (syntactic boundary obligatory), S3- (syntactic boundary impossible), and S3? (syntactic boundary optional). Table 1 shows the correspondence between the S3 and B3 labels (not taking turn-final labels into account).</Paragraph> <Paragraph position="8"> Table 1: Correspondence between the S3 and B3 labels in %.</Paragraph> <Paragraph position="9"> Multi-layer perceptrons (MLPs) were trained to recognize S3+ labels based on the features and data described above. The MLP has one output node for S3+ and one for S3-. During training, the desired output for each of the feature vectors is set to one for the node corresponding to the reference label; the other one is set to zero. With this method, the MLP in theory estimates a posteriori probabilities for the classes under consideration.
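This training scheme can be sketched with a small NumPy toy: one input feature, an 8-node hidden layer, and two softmax output nodes standing in for S3+ and S3-. The layer sizes and synthetic data are our own illustration, not the 242-feature network described above:

```python
# Toy sketch of the boundary classifier: an MLP with two output nodes
# trained on one-of-two target vectors, whose softmax outputs estimate
# class posteriors.  Sizes and data are illustrative, not the paper's.
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Synthetic 1-D "prosodic" feature: S3- around -1.0, S3+ around +1.0.
X = np.vstack([rng.normal(-1.0, 0.5, (200, 1)),
               rng.normal(+1.0, 0.5, (200, 1))])
T = np.zeros((400, 2))
T[:200, 0] = 1.0          # reference label S3-: target (1, 0)
T[200:, 1] = 1.0          # reference label S3+: target (0, 1)

W1 = rng.normal(0.0, 0.5, (1, 8)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 0.5, (8, 2)); b2 = np.zeros(2)

for _ in range(2000):     # batch gradient descent, cross-entropy loss
    H = np.tanh(X @ W1 + b1)
    P = softmax(H @ W2 + b2)              # estimated posteriors
    G2 = (P - T) / len(X)                 # gradient at the outputs
    G1 = (G2 @ W2.T) * (1.0 - H ** 2)     # backprop through tanh
    W2 -= 0.5 * (H.T @ G2); b2 -= 0.5 * G2.sum(axis=0)
    W1 -= 0.5 * (X.T @ G1); b1 -= 0.5 * G1.sum(axis=0)

accuracy = (P.argmax(axis=1) == T.argmax(axis=1)).mean()
assert accuracy > 0.85    # the two toy classes are nearly separable
```

On this nearly separable toy task the softmax outputs approximate the class posteriors, mirroring the a posteriori interpretation given above.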
However, in order to balance the a priori probabilities of the different classes, during training the MLP was presented with an equal number of feature vectors from each class. For the experiments, MLPs with 40/20 nodes in the first/second hidden layer showed the best results.</Paragraph> <Paragraph position="10"> For both S3 and B3 labels, we obtained overall recognition rates of over 80% (cf. table 2).</Paragraph> <Paragraph position="11"> Note that, due to limited training data, errors in F0 computation, and variability in the acoustic marking of prosodic events across speakers, dialects, and so on, one cannot expect an error-free detection of these boundaries.</Paragraph> <Paragraph position="12"> Table 2 shows the recognition results in percent for the S3+/S3- classifier and for the B3/not-B3 classifier, using the S3 positions as reference (first column), again not counting turn-final boundaries. For example, in the first row the number 24 means that 24% of the S3+ labels were classified as S3-, and the number 75 means that 75% of the S3+ labels were classified as B3.</Paragraph> <Paragraph position="13"> What table 2 shows, then, is that syntactic S3 boundaries can be classified using only prosodic information, yielding recognition rates comparable to those for the recognition of perceptually identified B3 boundaries. This means for our purposes that we do not need to label boundaries perceptually, but can instead employ an approach like the one advocated in (Feldhaus&Kiss, 1995), using only the transliterated data.
While this system turned out to be very time-consuming when applied to larger quantities of data, (Batliner et al., 1996) report promising results with a similar but less labor-intensive system.</Paragraph> <Paragraph position="14"> It has further to be considered that the recognition rate for the perceptual labelling includes those cases where phrase boundaries were recognized in positions which are impossible on syntactic grounds - cf. the number of cases in table (1) where an S3- position was classified as B3 and vice versa.</Paragraph> <Paragraph position="15"> It is important to note that this approach does not take syntactic boundaries and phonological boundaries to be one and the same thing. It is a well-known fact that these two phenomena are often orthogonal to each other. However, the question to be answered was: can we devise an automatic procedure to identify the syntactic boundaries with (at least) about the same reliability as the prosodic ones? As the figures in table (2) demonstrate, the answer to this question is yes.</Paragraph> <Paragraph position="16"> Our overall recognition rate of 84.5% for the S3 classifier (cf. table (2)) cannot exactly be compared with results reported in other studies, because these studies were either based on read and carefully designed material (cf., e.g., (Bear&Price, 1990), (Ostendorf&Veilleux, 1994)) or used not automatically computed acoustic-prosodic features but textual and perceptual information (cf. (Wang&Hirschberg, 1992)).</Paragraph> </Section> </Paper>