File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/94/c94-2138_abstr.xml
Size: 12,827 bytes
Last Modified: 2025-10-06 13:48:04
<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-2138">
<Title>A Reestimation Algorithm for Probabilistic Recursive Transition Network*</Title>
<Section position="1" start_page="0" end_page="863" type="abstr">
<SectionTitle> Abstract </SectionTitle>
<Paragraph position="0"> We introduce Probabilistic Recursive Transition Network (PRTN), an elevated version of RTN to model and process languages in stochastic parameters.</Paragraph>
<Paragraph position="1"> The representation is a direct derivation from the RTN and at the same time keeps much of the spirit of the Hidden Markov Model.</Paragraph>
<Paragraph position="2"> We present a reestimation algorithm for PRTN that is a variation of the Inside-Outside algorithm and computes the values of the probabilistic parameters from sample sentences (parsed or unparsed).</Paragraph>
<Paragraph position="3"> 1. Introduction
In this paper, we introduce a network representation, Probabilistic Recursive Transition Network (PRTN), that is directly derived from RTN and HMM, and present an estimation algorithm for its probabilistic parameters. PRTN is an RTN augmented with probabilities in the transitions and states and with lexical distributions in the transitions; equivalently, it is a Hidden Markov Model augmented with a stack that makes some transitions deterministic.</Paragraph>
<Paragraph position="4"> The parameter estimation of PRTN is developed as a variation of the Inside-Outside algorithm.</Paragraph>
<Paragraph position="5"> The Inside-Outside algorithm has been applied to stochastic context-free grammars recently by Jelinek (1990) and Lari (1991). The algorithm was first introduced by Baker in 1979 and is the context-free language version of the Forward-Backward algorithm in Hidden Markov Models.</Paragraph>
<Paragraph position="6"> *This research is partly supported by KOSEF (Korea Science and Technology Foundation) under the title "A Study on the Building Techniques for Robust Knowledge-based Systems" from 1991 through 1994.</Paragraph>
<Paragraph position="7"> Its theoretical foundation was laid by Baum and Welch in the late 60's, and it is in turn a type of the EM algorithm in statistics (Rabiner, 1989).</Paragraph>
<Paragraph position="8"> Kupiec (1991) introduced a trellis-based estimation algorithm for Hidden SCFG that accommodates both the Inside-Outside algorithm and the Forward-Backward algorithm.</Paragraph>
<Paragraph position="9"> The significance of our work lies in the use of the plainer topology of RTN, whereas Kupiec's work is a unified version of the Forward-Backward and Inside-Outside algorithms. Nonetheless, the implementation of a reestimation algorithm carries no more theoretical significance than its applicative efficiency and its variation for differing representations since Baker first applied it to CFGs.</Paragraph>
<Paragraph position="10"> 2. PRTN
A is a matrix containing the transition probabilities, and B is an observation matrix containing the probability distribution of the words observable at each terminal transition, where rows and columns correspond to terminal transitions and a list of words respectively. F specifies the types of transitions, and D denotes a stack. The first two model parameters are the same as those of HMM; thus the typed transitions and the existence of a stack are what distinguish PRTN from HMM. The stack operations are associated with transitions. According to the stack operation, transitions are classified into three types.</Paragraph>
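A minimal sketch, not the paper's implementation: one plausible way to hold the PRTN parameters just described. All names (TransType, Transition, PRTN) are hypothetical.

```python
# Hypothetical container for the PRTN parameters (A, B, F, stack) above.
from dataclasses import dataclass, field
from enum import Enum

class TransType(Enum):
    PUSH = "push"          # pushes a state identification onto the stack
    POP = "pop"            # selected by the content (top) of the stack
    TERMINAL = "terminal"  # no stack operation; observes/generates a word

@dataclass
class Transition:
    src: int
    dst: int
    kind: TransType
    prob: float                                # an entry of the transition matrix A
    emit: dict = field(default_factory=dict)   # a row of B (terminal transitions only)

@dataclass
class PRTN:
    transitions: list    # A, B, and the types F all live on the transitions
    start: int           # the start state S
    final: int           # the final state F
    # The stack itself exists only at parse time; typed transitions plus
    # the stack are what distinguish PRTN from a plain HMM.
```

During parsing, a push transition records its return state on the stack, and the matching pop transition is then fully determined by the stack top, which is how some transitions become deterministic.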
<Paragraph position="11"> The first type is the push transition, in which a state identification is pushed onto the stack. The second type is the pop transition, which is selected by the content of the stack. Transitions of the third type are not committed to any stack operation. The three types also carry different grammatical implications; hence grammatical categories are assigned to all transitions except pop transitions. Push transitions are associated with nonterminal categories and will be called nonterminal transitions when that is more transparent in later discussions. In general, the grammar expressed in a PRTN consists of layers. A layer is a fragment of the network that corresponds to a nonterminal. The third type of transition is linked to the category of terminals (words) and thus is named the terminal transition. Also, a table of the probability distribution of words is defined on each terminal transition. In the context of HMMs, the words in the terminal transition are the observations to be generated. Pop transitions represent the returning of a layer to one of its possibly multiple higher layers.</Paragraph>
<Paragraph position="12"> The network topology of PRTN is no different from that of RTN. In a conceptual drawing of a grammar, each layer looks like an independent network. Compared with the conceptual drawing of the network, an operational view provides a more vivid representation in which actual paths or parses are composed. The only difference between the two is that in the operational view a nonterminal transition is connected directly to the first state of the corresponding layer. In this paper, the parses or paths are assumed to be sequences of dark-headed transitions (see Fig. 1 for an example).</Paragraph>
<Paragraph position="13"> Before we start explaining the algorithms, let us define some notation. There is one start state denoted by S and one final state denoted by F. Also, let us call a state immediately following a terminal transition a terminal state, and a state at which pop transitions are defined a pop state. Some more notations are as follows.</Paragraph>
<Paragraph position="14"> * [i, j] denotes the network segment between states i and j.
* $W_{a \sim b}$ is the observation sequence covering the a-th to b-th observations.</Paragraph>
<Paragraph position="15"> 3. Reestimation Algorithm
PRTN is an RTN with probabilistic transitions and words that can be estimated from sample sentences by means of statistical techniques. We present a reestimation algorithm for obtaining the probabilities of the transitions and of the observation symbols (words) defined at each terminal transition. The Inside-Outside algorithm provides a formal basis for estimating the parameters of context-free languages such that the probabilities of the observation sequences (sample sentences) are maximized. The reestimation algorithm iteratively estimates the probabilistic parameters until the probability of the sample sentence(s) reaches a certain stability. The reestimation algorithm for PRTN is a variation of the Inside-Outside algorithm customized for the representation. The algorithm to be discussed is defined only for well-formed observation sequences.</Paragraph>
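To make the iteration scheme concrete, here is a minimal sketch of the outer loop, assuming a `reestimate` helper (supplied by the caller, for instance the Inside-Outside update sketched later) that performs one full pass over the samples and returns the updated model together with the log probability of the samples. It is illustrative only, not the paper's code.

```python
import math

def train(model, sentences, reestimate, tol=1e-4, max_iter=100):
    """Iterate reestimation until the probability of the sample
    sentences reaches a certain stability, as described above.
    `reestimate` is an assumed helper: one EM pass returning
    (new_model, log P(sentences | new_model))."""
    prev_ll = -math.inf
    for _ in range(max_iter):
        model, ll = reestimate(model, sentences)
        if ll - prev_ll < tol:   # probability has stabilized
            break
        prev_ll = ll
    return model
```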
<Paragraph position="16"> Definition 1: An observation sequence is well formed if there exists at least one path in the network that generates the sequence, starts at S, and ends at F.</Paragraph>
<Paragraph position="17"> Let an observation sequence of length N be denoted by $W = W_1 W_2 \ldots W_N$.</Paragraph>
<Paragraph position="18"> We start explaining the reestimation algorithm by defining the Inside probability.</Paragraph>
<Paragraph position="19"> The Inside probability of state i, denoted by $P_I(i)_{s \sim t}$, is the probability that a portion of layer(i) (from state i to the last state of the layer) generates $W_{s \sim t}$. That is, it is the probability that a certain fragment of a layer generates a certain segment of an input sentence, and it can be computed by summing the probabilities of all the possible paths in the layer segment that generate the given input segment,</Paragraph>
<Paragraph position="20"> where c = last(layer(i)).</Paragraph>
<Paragraph position="21"> A more constructive representation of the Inside probability is then</Paragraph>
<Paragraph position="23"> The paths starting at state i are classified into two cases according to the type of the immediate transition from i: it can be of terminal or nonterminal type. In the case of a terminal, after the probability of the terminal transition is taken into account, the rest of the layer segment is responsible for the input segment short of the one word just generated by the terminal transition. In the case of a nonterminal, first the transition probabilities (the push and the respective pop transitions) are multiplied; then, depending on the coverage of the nonterminal transition (sublayer), the rest of the current layer is responsible for the input sequence remaining after the part covered by the sublayer. After the last observation is made, the last state (pop state) of layer(i) should be reached.</Paragraph>
<Paragraph position="25"> Fig. 2 is the pictorial view of the Inside probability. A well-formed sequence can begin only at state S; thus, to be strict, $P_I(S)$ has an additional product term $F(S)$ that can also be computed using the Inside-Outside algorithm. Now we define the Outside probability.</Paragraph>
<Paragraph position="26"> The Outside probability, denoted by $P_O(i, j)_{s \sim t}$, is the probability that the partial sequences $W_{1 \sim s-1}$ and $W_{t+1 \sim N}$ are generated, provided that the partial sequence $W_{s \sim t}$ is generated by [i, j] given the model $\lambda$. This is the complementary notion to the Inside probability. This time, we look at the outside of a given layer segment and input segment.</Paragraph>
<Paragraph position="27"> Assuming a given layer segment generates a given input segment, we want to compute the probability that the surrounding portion of the whole PRTN generates the rest of the input sequence.</Paragraph>
<Paragraph position="29"> The Outside probability is computed by first considering the current layer as consisting of the two parts that remain after excluding [i, j], which is captured by the Inside probability. Beyond the current layer is simply an Outside probability with respect to the current layer.</Paragraph>
<Paragraph position="31"> Fig. 3 shows the network configuration in computing the Outside probability. $P_I(f, i)_{a \sim s-1}$ is the probability that the sequence $W_{a \sim s-1}$ is generated by the part of layer(i) to the left of state i. $P_I(j)_{t+1 \sim b}$ is the probability that the sequence $W_{t+1 \sim b}$ is generated by the part of layer(i) to the right of state j. The portions of W not covered by $W_{a \sim b}$ are then left to the parent layers of layer(i).</Paragraph>
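The two-case recursion just described can be sketched as follows. This is our reading of the prose, not the paper's code: `net` is assumed to expose the transitions leaving a state, the sublayer start and return state of a push transition, the probability of its matching pop transition, and a pop-state test; it reuses the hypothetical `TransType` from the earlier sketch, and `s`, `t` are inclusive 0-based indices into the word list `W`.

```python
def inside(net, i, s, t, W):
    """P_I(i)_{s~t}: probability that layer(i), from state i to the
    last (pop) state of the layer, generates W[s..t]."""
    if s > t:
        # all observations made: we must have reached the layer's pop state
        return 1.0 if net.is_pop_state(i) else 0.0
    total = 0.0
    for tr in net.transitions_from(i):
        if tr.kind is TransType.TERMINAL:
            # terminal case: account for the transition, emit one word,
            # and leave the rest of the layer segment W[s+1..t]
            total += tr.prob * tr.emit.get(W[s], 0.0) * inside(net, tr.dst, s + 1, t, W)
        elif tr.kind is TransType.PUSH:
            # nonterminal case: multiply push and respective pop probabilities,
            # then sum over every possible coverage W[s..k] of the sublayer
            ret = net.return_state(tr)
            for k in range(s - 1, t + 1):
                total += (tr.prob * net.pop_prob(tr)
                          * inside(net, net.sublayer_start(tr), s, k, W)
                          * inside(net, ret, k + 1, t, W))
    return total
```

A real implementation would memoize on (i, s, t) to obtain the dynamic-programming behavior the Inside-Outside algorithm relies on; the bare recursion above is exponential.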
<Paragraph position="32"> $P_I(f, i)_{s \sim t}$ is a slight variation of the Inside probability in which the $P_I(f)_{a \sim b}$'s in the Inside probability formula are replaced by $P_I(f, i)_{a \sim b}$. Its actual computation is done as follows:</Paragraph>
<Paragraph position="34"> x represents a state from which layer(i) branches out, and y represents a state to which layer(j) returns. Every time a different combination of left and right sequences with respect to $W_{s \sim t}$ is tried in the layer that states i and j belong to, the rest of the remaining sequence is covered by the Outside probability at the layer above layer(i).</Paragraph>
<Paragraph position="35"> When there is no subsequence to the right of</Paragraph>
<Paragraph position="37"> It is basically the same as the Inside probability except that it carries a state identification i to check the validity of the stop state. If there are observations left for generation (s &lt;= t), things are done just as in computing the Inside probability, ignoring i. When the boundary point is reached (s &gt; t), it returns 1 if the last state is i, and 0 otherwise.</Paragraph>
<Paragraph position="38"> The probability of an observation sequence can be computed using the Inside probability as</Paragraph>
<Paragraph position="40"> Now we can derive the reestimation algorithm for A and B using the Inside and Outside probabilities. As the result of the constrained maximization of Baum's auxiliary function, we have the following form of reestimation for each transition (Rabiner, 1989):
$$\hat{a}_{ij} = \frac{\text{expected number of transitions from } i \text{ to } j}{\text{expected number of transitions from } i}$$</Paragraph>
<Paragraph position="42"> The expected frequency is defined for each of the three types of transition: one form for a terminal transition, and another when (i, j) is a nonterminal transition. Considering that transitions of terminal and nonterminal types can occur together at a state, the reestimation for terminal transitions is done accordingly. The reestimation process continues until the probability of the observation sequences reaches a certain stability. It is not unusual to assume that the training set can be very large, and may even grow indefinitely in non-trivial applications, in which case additive training can be tried using a smoothing technique as in (Jarre and Pieraccini, 1987). The complexity of the Inside-Outside algorithm is $O(N^3)$ both in the number of states and in the input length (Lari, 1990). The efficiency comes from the fact that the algorithm successfully exploits context-freeness. For instance, the generation of substrings by a nonterminal is independent of the surroundings of the nonterminal; this is why the product of the Inside and Outside probabilities works and how the complexity is derived.</Paragraph>
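As a final illustration, the count-ratio update above can be sketched as below. `expected_count(tr)` stands for the expected number of times transition `tr` is used on the training data, which the paper computes from products of Inside and Outside probabilities; it and the `net` accessors are assumed helpers, not the paper's code.

```python
def reestimate_transition_probs(net, expected_count):
    """Baum-style update: a_ij = E[# transitions i -> j] / E[# transitions from i]."""
    new_prob = {}
    for i in net.states():
        outgoing = list(net.transitions_from(i))
        denom = sum(expected_count(tr) for tr in outgoing)  # expected transitions from i
        for tr in outgoing:
            # keep the old value if state i was never reached in the data
            new_prob[(tr.src, tr.dst)] = (expected_count(tr) / denom
                                          if denom > 0.0 else tr.prob)
    return new_prob
```
</Section>
</Paper>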