<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1008"> <Title>Time Mapping with Hypergraphs</Title> <Section position="2" start_page="0" end_page="59" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The interface between a word recognizer and language processing modules is a crucial issue in modern speech processing systems. Given a sufficiently high word recognition rate, it suffices to transmit the most probable word sequence from the recognizer to a subsequent module (e.g. a parser). A slight extension over this best chain mode would be to deliver n-best chains to improve language processing results.</Paragraph> <Paragraph position="1"> However, it is usually not enough to deliver just the best 10 or 20 utterances, at least not for reasonably sized applications given today's speech recognition technology. To overcome this problem, most current systems use word graphs as the speech-language interface. Word graphs offer a simple and efficient means to represent a very high number of utterance hypotheses in an extremely compact way (Oerder and Ney, 1993). Although they are compact, the use of word graphs leads to problems of its own. One of them is the current lack of a reasonable measure for word graph size and evaluation of their contents (Amtrup et al., 1997). The problem we want to address in this paper is the presence of a large number of almost identical word hypotheses. By almost identical we mean that the start and end vertices of edges differ only slightly.</Paragraph> <Paragraph position="2"> Consider figure 1 as an example section of a word graph. There are several word hypotheses representing the words und (and) and dann (then). Their start and end points differ by small numbers of frames, each frame being 10ms long. The reasons for the existence of these families of edges are at least twofold: * Standard HMM-based word recognizers try to start (and finish) word models at each individual frame.
Since the resolution is quite high (10ms, in many cases shorter than the word onset), a word model may have boundaries at several points in time.</Paragraph> <Paragraph position="3"> * Natural speech (and in particular spontaneously produced speech) tends to blur word boundaries. This effect is partly responsible for the dramatic decrease in word recognition rate when fluent speech is used as input instead of isolated words. Figure 1 demonstrates the inaccuracy of word boundaries by containing several meeting points between und and dann, emphasized by the fact that the first word ends with the same consonant with which the second one starts.</Paragraph> <Paragraph position="4"> Thus, for most words, there is a whole set of word hypotheses in a word graph, which results in several meeting points between two sets of hypotheses. Both facts are disadvantageous for speech processing: Many word edges result in a high number of lexical lookups and basic operations (e.g. bottom-up proposals of syntactic categories); many meeting points between edges result in a high number of possibly complex operations (like unifications in a parser).</Paragraph> <Paragraph position="5"> The most obvious way to reduce the number of neighboring, identically labeled edges is to reduce the time resolution provided by the word recognizer (Weber, 1992). If a word edge is to be processed, its start and end vertices are mapped to the coarser-grained points in time used by the linguistic modules, and a redundancy check is carried out to prevent multiple copies of edges. This can easily be done, but one has to face the drawback of introducing many more paths through the graph due to artificially constructed overlaps. Furthermore, it is not simple to choose a correct resolution, as the intervals effectively appearing with word onsets and offsets change considerably with the words spoken.
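The resolution-reduction step just described can be sketched as follows; this is a minimal Python illustration of the idea, not code from the paper (the function name `quantize`, the grid step, and the example frame numbers are our assumptions):

```python
# Map word-edge boundaries onto a coarser time grid and drop duplicates
# (sketch of the resolution-reduction scheme attributed to Weber, 1992).
def quantize(edges, step):
    """edges: (start_frame, end_frame, label) tuples; step: coarse grid size."""
    seen = set()
    coarse = []
    for start, end, label in edges:
        mapped = (round(start / step), round(end / step), label)
        if mapped not in seen:          # redundancy check against copies
            seen.add(mapped)
            coarse.append(mapped)
    return coarse

# Three near-identical "und" hypotheses collapse into a single coarse edge.
hyps = [(100, 130, "und"), (101, 131, "und"), (103, 129, "und")]
print(quantize(hyps, 10))               # -> [(10, 13, 'und')]
```

Note that quantization can make distinct edges coincide in unintended ways, which is exactly the overlap and cycle problem the text goes on to discuss.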
Also, the introduction of cycles has to be avoided.</Paragraph> <Paragraph position="6"> A more sophisticated scheme would use interval graphs to encode word graphs. Edges of interval graphs do not have individual start and end vertices, but instead use intervals to denote the range of applicability of an edge. The major problem with interval graphs lies in the complexity of edge access methods. However, many formal statements shown below will use interval arithmetic, as the argument will be easier to follow.</Paragraph> <Paragraph position="7"> The approach we take in this paper is to use hypergraphs as the representation medium for word graphs. What one wants is to carry out operations only once and to record the fact that there are several start and end points of words. Hypergraphs (Gondran and Minoux, 1984, p. 30) are generalizations of ordinary graphs that allow multiple start and end vertices of edges.</Paragraph> <Paragraph position="8"> We extend the approach of H. Weber (Weber, 1995) for time mapping. Weber considered sets of edges with identical start vertices but slightly different end vertices, for which the notion of a family was introduced. We use full hypergraphs as representation and thus additionally allow several start vertices, which results in a further decrease of 6% in the number of chart edges resulting from parsing (cf. section 3). Figure 2 shows the example section using hyperedges for the two families of edges. We adopt Weber's way of dealing with the different acoustic scores of word hypotheses.</Paragraph> <Paragraph position="9"> 2 Word Graphs and Hypergraphs As described in the introduction, word graphs consist of edges representing word hypotheses generated by a word recognizer. The start and end points of edges usually denote points in time. Formally, a word graph is a directed, acyclic, weighted, labeled graph with distinct root and end vertices.
It is a quadruple G = (V, E, W, L) with the following components: * A nonempty set of graph vertices V = {v1,...,vn}. To associate vertices with points in time, we use a function t : V → N that returns the frame number for a given vertex.</Paragraph> <Paragraph position="10"> * A nonempty set of weighted, labeled, directed edges E = {e1,...,em} ⊆ V × V × W × L. To access the components of an edge e = (v, v', w, l), we use functions α, β, w and l, which return the start vertex (α(e) = v), the end vertex (β(e) = v'), the weight (w(e) = w) and the label (l(e) = l) of an edge, respectively.</Paragraph> <Paragraph position="11"> * A nonempty set of edge weights W = {w1,...,wp}. Edge weights normally represent the acoustic score assigned to the word hypothesis by an HMM-based word recognizer.</Paragraph> <Paragraph position="12"> * A nonempty set of labels L = {l1,...,lo}, which represents information attached to an edge, usually words.</Paragraph> <Paragraph position="13"> We define the relation of reachability for vertices (→) as ∀v, w ∈ V : v → w ⇔ ∃e ∈ E : α(e) = v ∧ β(e) = w. The transitive hull of the reachability relation → is denoted by →*.</Paragraph> <Paragraph position="14"> We already stated that a word graph is acyclic and distinctly rooted and ended.</Paragraph> <Section position="1" start_page="56" end_page="56" type="sub_section"> <SectionTitle> 2.1 Hypergraphs </SectionTitle> <Paragraph position="0"> Hypergraphs differ from graphs by allowing several start and end vertices for a single edge. In order to apply this property to word graphs, the definition of edges has to be changed.
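The word-graph definition above can be sketched with simple data structures; this is an illustrative Python rendering (the class and function names are ours, and the `start`/`end` fields correspond to the α/β accessors of the definition):

```python
# Sketch of the word-graph definition (illustrative names, not from the paper).
# Vertices are integers; t maps each vertex to its 10ms frame number.
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    start: int      # alpha(e): start vertex
    end: int        # beta(e): end vertex
    weight: float   # w(e): acoustic score
    label: str      # l(e): word hypothesis

t = {0: 0, 1: 5, 2: 9}                  # vertex -> frame number
edges = [Edge(0, 1, 120.0, "und"), Edge(1, 2, 98.5, "dann")]

def reachable(v, w, edges):
    """v -> w iff some edge starts at v and ends at w."""
    return any(e.start == v and e.end == w for e in edges)
```

The transitive hull →* would be obtained by iterating this relation; it is omitted here for brevity.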
The set of edges E becomes a nonempty set of weighted, labeled, directed hyperedges E = {e1,...,em} ⊆ (2^V \ {∅}) × (2^V \ {∅}) × W × L.</Paragraph> <Paragraph position="1"> Several notions and functions defined for ordinary word graphs have to be adapted to reflect edges having sets of start and end vertices.</Paragraph> <Paragraph position="2"> * The accessor functions for start and end vertices have to be adapted to return sets of vertices. Consider an edge e = (V, V', w, l); then we redefine α(e) = V and β(e) = V'.</Paragraph> <Paragraph position="4"> * Two hyperedges e, e' are adjacent if they share a common vertex: β(e) ∩ α(e') ≠ ∅.</Paragraph> <Paragraph position="6"> * The reachability relation is now ∀v, w ∈ V : v → w ⇔ ∃e ∈ E : v ∈ α(e) ∧ w ∈ β(e). Additionally, we define accessor functions for the first and last start and end vertex of an edge. We resort to the association of vertices with frame numbers, which is a slight simplification (in general, there is no need for a total ordering on the vertices in a word graph)1. Furthermore, the intervals covered by start and end vertices are defined.</Paragraph> <Paragraph position="8"> α<(e) and α>(e) denote the start vertices of e with the smallest and the largest frame number, respectively; β<(e) and β>(e) denote the corresponding end vertices. The covered intervals are α□(e) = [t(α<(e)), t(α>(e))] and β□(e) = [t(β<(e)), t(β>(e))].</Paragraph> <Paragraph position="10"> In contrast to interval graphs, we do not require the sets of start and end vertices to be contiguous, i.e. there may be vertices that fall in the range of the start or end vertices of an edge which are not members of that set. If we are not interested in the individual members of α(e) or β(e), we merely talk about interval graphs.</Paragraph> </Section> <Section position="2" start_page="56" end_page="56" type="sub_section"> <SectionTitle> 2.2 Edge Consistency </SectionTitle> <Paragraph position="0"> Just like word graphs, we demand that hypergraphs be acyclic, i.e.
∀v, w ∈ V : v →* w ⇒ v ≠ w.</Paragraph> <Paragraph position="1"> In terms of edges, this corresponds to ∀e ∈ E : t(α>(e)) < t(β<(e)).</Paragraph> </Section> <Section position="3" start_page="56" end_page="58" type="sub_section"> <SectionTitle> 2.3 Adding Edges to Hypergraphs </SectionTitle> <Paragraph position="0"> Adding a simple word edge to a hypergraph is a simplification of merging two hyperedges bearing the same label into a new hyperedge. Therefore, we are going to explain the more general case of hyperedge merging first. We analyze which edges of a hypergraph may be merged to form a new hyperedge without loss of linguistic information. This process has to follow three main principles: * Edge labels have to be identical. * Edge weights (scores) have to be combined into a single value. * Edges have to be compatible in their start and end vertices and must not introduce cycles into the resulting graph. Simple Rule Set for Edge Merging Let e1, e2 ∈ E be two hyperedges to be checked for merging, where e1 = (V1, V1', w1, l1) and e2 = (V2, V2', w2, l2). Then e1 and e2 can be merged into a new hyperedge e3 = (V3, V3', w3, l3) iff l1 = l2 (condition 10) and max(t(α>(e1)), t(α>(e2))) < min(t(β<(e1)), t(β<(e2))) (condition 11), i.e. every start vertex of the merged edge precedes every end vertex. Then V3 = V1 ∪ V2, V3' = V1' ∪ V2', w3 is the join of w1 and w2, and l3 = l1.</Paragraph> <Paragraph position="2"> e1 and e2 have to be removed from the hypergraph while e3 has to be inserted.</Paragraph> <Paragraph position="3"> Sufficiency of the Rule Set Why is this set of two conditions sufficient for hyperedge merging? First of all, it is clear that we can merge only hyperedges with the same label (this is prescribed by condition 10). Condition 11 determines which hyperedges may be combined and prohibits cycles from being introduced into the hypergraph. An analysis of the occurring cases shows that this condition is reasonable. Without loss of generality, we distinguish the following cases: 1. α□(e1) ∩ β□(e2) ≠ ∅ ∨ α□(e2) ∩ β□(e1) ≠ ∅:</Paragraph> <Paragraph position="5"> This is the case where either the start vertices of e1 and the end vertices of e2 or the start vertices of e2 and the end vertices of e1 overlap each other.
The merge of two such hyperedges in this case would result in a hyperedge e3 where t(α>(e3)) ≥ t(β<(e3)).</Paragraph> <Paragraph position="6"> This could introduce cycles into the hypergraph, so this case is excluded by condition 11.</Paragraph> <Paragraph position="7"> 2. α□(e1) ∩ β□(e2) = ∅ ∧ α□(e2) ∩ β□(e1) = ∅: This is the complementary case to 1.</Paragraph> <Paragraph position="8"> (a) t(α<(e2)) ≥ t(β>(e1)): This is the case where all vertices of hyperedge e1 occur before all vertices of hyperedge e2, in other words, the case where two individual, independent word hypotheses with the same label occur in the word graph. This case must also not result in an edge merge, since β□(e1) ⊆ [t(α<(e1)), t(α>(e2))] in the merged edge. This merge is prohibited by condition 11, since all vertices of β(e1) have to be smaller than all vertices of α(e2).</Paragraph> <Paragraph position="10"> This is the complementary case to (a).</Paragraph> <Paragraph position="12"> 2Examples for the score-join operation are given later in the paragraph about score normalization.</Paragraph> <Paragraph position="13"> is required (e2 contains the last end vertex).</Paragraph> <Paragraph position="14"> ii. t(α<(e1)) < t(β>(e2)): This is the complementary case to i. As a result of the empty intersections and the cases (b) and ii, we get t(α>(e1)) < t(β<(e2)) and t(α>(e2)) < t(β<(e1)). That is, in other words, ∀ta ∈ α□(e1) ∪</Paragraph> <Paragraph position="16"> α□(e2), tb ∈ β□(e1) ∪ β□(e2) : ta < tb, which is just the case demanded by condition 11.</Paragraph> <Paragraph position="17"> After analyzing all cases of intersections between the start and end vertices of two hyperedges, we turn to the insertion of word hypotheses into a hypergraph. Of course, a word hypothesis can be seen as an interval edge with trivial intervals or as a hyperedge with only one start and one end vertex.
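The merging conditions can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation; in particular, rendering condition 11 as "the latest start frame of the merged edge must strictly precede its earliest end frame" is our reading of the case analysis above:

```python
# Hyperedges as (starts, ends, weight, label) tuples; starts/ends are sets
# of vertices and t maps each vertex to its frame number (names are ours).
def mergeable(e1, e2, t):
    s1, n1, w1, l1 = e1
    s2, n2, w2, l2 = e2
    if l1 != l2:                        # condition 10: identical labels
        return False
    latest_start = max(t[v] for v in s1 | s2)
    earliest_end = min(t[v] for v in n1 | n2)
    return latest_start < earliest_end  # condition 11: merged edge stays acyclic

def merge(e1, e2):
    s1, n1, w1, l1 = e1
    s2, n2, w2, l2 = e2
    # score join: inherit the minimum (best) acoustic score
    return (s1 | s2, n1 | n2, min(w1, w2), l1)

t = {0: 0, 1: 1, 2: 2, 3: 5, 4: 6}
e1 = ({0, 1}, {3}, 10.0, "und")         # two overlapping "und" hypotheses
e2 = ({1}, {3, 4}, 12.0, "und")
```

Note that two sequential hypotheses with the same label (case (a) above) fail the `latest_start < earliest_end` test and are correctly rejected.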
Since this case of adding an edge to a hypergraph is rather easy to depict and is heavily used while parsing word graphs incrementally, we discuss it in more detail.</Paragraph> <Paragraph position="19"> The speech decoder we use delivers word hypotheses incrementally, ordered by the time stamps of their end vertices. For practical reasons, we further sort the start vertices belonging to an equal end vertex of a hypergraph by time. Under this precondition we get the cases shown in figure 3.</Paragraph> <Paragraph position="20"> The situation is such that eg is a hyperedge already constructed and e1-e5 are candidates for insertion.</Paragraph> <Paragraph position="22"> of a matching hyperedge. In practice this is not needed, and we can check a smaller number of vertices. We do this by introducing a maximal time gap, which tells us how far (in measures of time) we look backwards from the start vertex of a new edge to be inserted into the hypergraph to determine a compatible hyperedge of the hypergraph.</Paragraph> <Paragraph position="23"> It is not possible to add e1 and e2 to the hyperedge eg, since they would introduce an overlap between the sets of start and end vertices of the potential new hyperedge. The resulting hyperedges of adding e3-e5 are depicted below. Score Normalization Score normalization is a necessary means if one wants to compare hypotheses of different lengths. Thus, edges in word graphs are assigned normalized scores that account for words of different extensions in time. The usual measure is the score per frame, which is computed by dividing the score of a word by its length in frames. When combining several word edges, as we do by constructing hyperedges, the combination should be assigned a single value that reflects a certain useful aspect of the originating edges.
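The per-frame normalization and a score join for hyperedges can be sketched as follows (a minimal Python illustration; the function names are ours, and the minimum-score join follows the inheritance from the best-rated hypothesis used in the paper):

```python
# Score-per-frame normalization and score join for hyperedges
# (illustrative sketch; function names are ours).
def score_per_frame(word_score, start_frame, end_frame):
    """Normalize an acoustic word score by the word's length in frames."""
    return word_score / (end_frame - start_frame)

def join_scores(per_frame_scores):
    """A hyperedge inherits the best (minimal) score of its source edges."""
    return min(per_frame_scores)

s = score_per_frame(120.0, 100, 130)   # 120 over 30 frames -> 4.0 per frame
best = join_scores([s, 4.3, 5.1])      # -> 4.0
```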
In order not to exclude certain hypotheses from consideration in score-driven language processing modules, the score of the hyperedge is inherited from the best-rated word hypothesis (cf. (Weber, 1995)). We use the minimum of the source acoustic scores, which corresponds to the highest recognition probability.</Paragraph> <Paragraph position="24"> Introducing a Maximal Time Gap The algorithm depicted in figure 4 can be sped up for practical reasons. Each vertex between the graph root and the start vertex of the new edge could be one of the start vertices of a matching hyperedge. It is possible to introduce additional paths into a graph by performing time mapping. Consider fig. 5 as an example. Taken as a normal word graph, it contains two label sequences, namely a-c-d and b-c-e. However, if time mapping is performed for the edges labelled c, two additional sequences are introduced: a-c-e and b-c-d. Thus, time mapping by hypergraphs is not information preserving in a strong sense. For practical applications this does not present any problems: the situations in which additional label sequences are introduced are quite rare, and we did not observe any linguistic difference in our experiments.</Paragraph> </Section> <Section position="4" start_page="58" end_page="59" type="sub_section"> <SectionTitle> 2.4 Edge Combination </SectionTitle> <Paragraph position="0"> Besides merging, the combination of hyperedges to construct edges with new content is an important task within any speech processing module, e.g. for parsing. The assumption we will adopt here is that two hyperedges e1, e2 ∈ E may be combined if they are adjacent, i.e. they share a common vertex: β(e1) ∩ α(e2) ≠ ∅. The label of the new edge en (which may be used to represent linguistic content) is determined by the component building it, whereas the start and end vertices are determined by α(en) = α(e1) and β(en) = β(e2).</Paragraph> <Paragraph position="2"> This approach is quite analogous to edge combination methods for normal graphs, e.g.
in chart parsing, where two edges are equally required to have a meeting point. However, score computation for hyperedge combination is more difficult. The goal is to determine a score per frame by selecting the smallest possible score under all possible meeting vertices. It is derived by examining all possible connecting vertices (all elements of I := β(e1) ∩ α(e2)) and computing the resulting score of the new edge: If w(e1) < w(e2), we use w(en) := (w(e1) · (t< - t(α>(e1))) + w(e2) · (t(β>(e2)) - t<)) / (t(β>(e2)) - t(α>(e1))), where t< = min{t(v) | v ∈ I}. If, on the other hand, w(e1) > w(e2), we use</Paragraph> <Paragraph position="4"> w(en) := (w(e1) · (t> - t(α>(e1))) + w(e2) · (t(β>(e2)) - t>)) / (t(β>(e2)) - t(α>(e1))), where t> = max{t(v) | v ∈ I}.</Paragraph> </Section> </Section> </Paper>