File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/96/c96-2215_intro.xml
Size: 5,406 bytes
Last Modified: 2025-10-06 14:06:03
<?xml version="1.0" standalone="yes"?> <Paper uid="C96-2215"> <Title>Most Probable Tree in Data-Oriented Parsing and Stochastic Tree Grammars. In Proceedings</Title> <Section position="3" start_page="0" end_page="1175" type="intro"> <SectionTitle> 2 Preliminaries </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="1175" type="sub_section"> <SectionTitle> 2.1 Stochastic Tree-Substitution Grammar (STSG) </SectionTitle> <Paragraph position="0"> STSGs and SCFGs are closely related. STSGs and SCFGs are equal in weak generative capacity (i.e. string languages). [Footnote 1: The author notes that the actual accuracy figures of the experiments listed in (Sima'an, 1995) are much higher than the accuracy figures reported in the paper. The lower figures reported in that paper are due to a test-procedure.]</Paragraph> <Paragraph position="1"> This is not the case for strong generative capacity (i.e. tree languages): STSGs can generate tree languages that are not generatable by SCFGs. An STSG is a five-tuple $(V_N, V_T, S, C, PT)$, where $V_N$ and $V_T$ denote respectively the finite sets of non-terminal and terminal symbols, $S$ denotes the start non-terminal, $C$ is a finite set of elementary-trees (of arbitrary depth $\geq 1$) and $PT$ is a function which assigns a value $PT(t) \in (0, 1]$ (probability) to each elementary-tree $t$ such that for all $N \in V_N$: $\sum_{t \in C,\ root(t) = N} PT(t) = 1$ (where $root(t)$ denotes the root of tree $t$). An elementary-tree in $C$ has only non-terminals as internal nodes but may have both terminals and non-terminals on its frontier. A non-terminal on the frontier is called an Open-Tree (OT). If the left-most open-tree $N$ of tree $t$ is equal to the root of tree $t_1$, then $t \circ t_1$ denotes the tree obtained by substituting $t_1$ for $N$ in $t$. The partial function $\circ$ is called left-most substitution. A left-most derivation (l.m.d.) is a sequence of left-most substitutions $lmd = (\ldots(t_1 \circ t_2) \circ \ldots) \circ t_m$,</Paragraph> <Paragraph position="3"> where $t_1, \ldots, t_m \in C$, $root(t_1) = S$ and the frontier of $lmd$ consists of only terminals. 
The probability $P(lmd)$ is defined as $PT(t_1) \times \cdots \times PT(t_m)$. For convenience, derivation in the sequel refers to l.m. derivation.</Paragraph> <Paragraph position="4"> A parse is a tree generated by a derivation. A parse may be generatable by many derivations.</Paragraph> <Paragraph position="5"> The probability of a parse is defined as the sum of the probabilities of the derivations that generate it. The probability of a sentence is the sum of the probabilities of all derivations that generate that sentence.</Paragraph> <Paragraph position="6"> A word-graph over the alphabet $Q$ is $Q_1 \times \cdots \times Q_m$, where $Q_i \subseteq Q$ for all $1 \leq i \leq m$. We denote this word-graph by $Q^m$ if $Q_i = Q$ for all $1 \leq i \leq m$.</Paragraph> </Section> <Section position="2" start_page="1175" end_page="1175" type="sub_section"> <SectionTitle> 2.2 The 3SAT problem </SectionTitle> <Paragraph position="0"> It is sufficient to prove that a problem is NP-hard in order to prove that it is intractable. A problem is NP-hard if it is (at least) as hard as any problem that has been proved to be NP-complete (i.e. a problem that is known to be decidable on a non-deterministic Turing Machine in polynomial time but not known to be decidable on a deterministic Turing Machine in polynomial time). To prove that problem A is as hard as problem B, one shows a reduction from problem B to problem A. The reduction must be a deterministic polynomial-time transformation that preserves answers.</Paragraph> <Paragraph position="1"> The NP-complete problem which forms our starting-point is the 3SAT (satisfiability) problem.</Paragraph> <Paragraph position="2"> An instance INS of 3SAT can be stated as follows. [Footnote 2: In the sequel, INS, INS's formula and its symbols refer to this particular instance of 3SAT.] [Footnote 3: Without loss of generality we assume that the for...]</Paragraph> <Paragraph position="3"> Given an arbitrary Boolean formula in 3-conjunctive normal form (3CNF) over the variables $u_1, \ldots, u_n$: 
Is there an assignment of the values true or false to the Boolean variables such that the given formula is true? Let us denote the given formula by $C_1 \wedge C_2 \wedge \cdots \wedge C_m$ for $m \geq 1$, where $C_i$ represents $(d_{i1} \vee d_{i2} \vee d_{i3})$ for $1 \leq i \leq m$, and $d_{ij}$, for $1 \leq j \leq 3$, represents a literal $u_k$ or $\bar{u}_k$ for some $1 \leq k \leq n$.</Paragraph> <Paragraph position="4"> Optimization problems are known to be (at least) as hard as their decision counterparts (Garey and Johnson, 1981). The decision problem related to maximizing a quantity $M$ which is a function of a variable $V$ can be stated as follows: is there a value for $V$ that makes the quantity $M$ greater than or equal to a predetermined value $m$? The decision problems related to disambiguation under DOP can be stated as follows, where $G$ is an STSG, $WG$ is a word-graph, $w$ is a sentence and $p \in (0, 1)$. MPPWG: Does the word-graph $WG$ have any parse, generatable by the STSG $G$, that has probability value greater than or equal to $p$? MPS: Does the word-graph $WG$ contain any sentence, generatable by the STSG $G$, that has probability value greater than or equal to $p$? MPP: Does the sentence $w$ have a parse, generatable by the STSG $G$, that has probability value greater than or equal to $p$? Note that in the sequel MPPWG / MPS / MPP denotes the decision problem corresponding to the problem of computing the MPP / MPS / MPP from a word-graph / word-graph / sentence respectively.</Paragraph> </Section> </Section> </Paper>
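The definitions of Section 2.1 and the MPP decision problem of Section 2.2 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and does not come from the paper: the toy grammar `GRAMMAR`, the tuple encoding of trees, and the helper names `leftmost_open`, `substitute`, `derivations`, `parse_probabilities`, `yield_of` and `mpp_decision`. It enumerates all left-most derivations exhaustively, which terminates only for non-recursive toy grammars and is not meant as a practical algorithm.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical encoding: a tree is a terminal (str) or a tuple (label, child, ...).
# A 1-tuple such as ("A",) is a non-terminal on the frontier, i.e. an open-tree.

# Toy STSG (an assumption, not from the paper): elementary-trees with PT values.
# PT sums to 1 for each root label: S -> 1/2 + 1/2, A -> 3/5 + 2/5.
GRAMMAR = [
    (("S", ("A",), "b"), Fraction(1, 2)),      # depth 1, open-tree A
    (("S", ("A", "a"), "b"), Fraction(1, 2)),  # depth 2, no open-tree
    (("A", "a"), Fraction(3, 5)),
    (("A", "c"), Fraction(2, 5)),
]


def leftmost_open(tree, path=()):
    """Return the path (child indices) to the left-most open-tree, or None."""
    if isinstance(tree, str):
        return None
    if len(tree) == 1:
        return path
    for i, child in enumerate(tree[1:], start=1):
        found = leftmost_open(child, path + (i,))
        if found is not None:
            return found
    return None


def substitute(tree, path, sub):
    """Return a copy of tree with the node at path replaced by sub."""
    if not path:
        return sub
    i = path[0]
    return tree[:i] + (substitute(tree[i], path[1:], sub),) + tree[i + 1:]


def derivations(grammar, start):
    """Yield (parse, probability) once per left-most derivation from start."""
    def expand(tree, prob):
        path = leftmost_open(tree)
        if path is None:  # frontier consists of only terminals
            yield tree, prob
            return
        node = tree
        for i in path:
            node = node[i]
        for elem, pt in grammar:
            if elem[0] == node[0]:  # root of elem equals the open-tree label
                yield from expand(substitute(tree, path, elem), prob * pt)

    yield from expand((start,), Fraction(1))


def parse_probabilities(grammar, start):
    """Parse probability = sum of the probabilities of its derivations."""
    totals = defaultdict(Fraction)
    for parse, prob in derivations(grammar, start):
        totals[parse] += prob
    return dict(totals)


def yield_of(tree):
    """Return the terminal string (as a tuple) on the frontier of a parse."""
    if isinstance(tree, str):
        return (tree,)
    return tuple(word for child in tree[1:] for word in yield_of(child))


def mpp_decision(grammar, start, sentence, p):
    """MPP decision: does the sentence have a parse with probability >= p?"""
    target = tuple(sentence)
    return any(
        prob >= p
        for parse, prob in parse_probabilities(grammar, start).items()
        if yield_of(parse) == target
    )
```

With this toy grammar, the parse S(A(a) b) has two derivations, with probabilities 1/2 and 1/2 x 3/5 = 3/10, so its parse probability is 4/5. Calling `mpp_decision(GRAMMAR, "S", ["a", "b"], Fraction(3, 4))` therefore answers yes, while the same call with threshold `Fraction(9, 10)` answers no. Exact `Fraction` arithmetic is used so that threshold comparisons are not affected by floating-point rounding.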