<?xml version="1.0" standalone="yes"?> <Paper uid="E91-1027"> <Title>THE RECOGNITION CAPACITY OF LOCAL SYNTACTIC CONSTRAINTS</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 2. Background: Local Syntactic Constraints </SectionTitle> <Paragraph position="0"> Let S = W1,...,WN be a sentence of length N, {Wi} being the words composing the sentence.</Paragraph> <Paragraph position="1"> And let t1,...,tM be a tag image corresponding to the sentence S, {ti} belonging to the tag set T, the set of word-class tags used as terminal symbols in a given grammar G. Typically M = N, but in a more general setting we allow M > N. This is useful when dealing with languages whose morphology allows cliticization, concatenation of conjunctions, prepositions, or determiners to a verb or a noun, etc.; in grammars for Hebrew, for example, it is convenient to assume that a preliminary morphological phase separates word-forms into basic sequences of tags, and then to state syntactic rules in terms of standard word classes.</Paragraph> <Paragraph position="2"> In any case, it is reasonable to assume that the tag image t1,...,tM cannot be uniquely assigned. Even with a coarse tag set (e.g. parts of speech with no features) many words have more than one interpretation, thus giving rise to exponentially many tag images for a sentence. 3 Following \[Karlsson 90\], we use the term cohort to refer to the set of lexically valid readings of a given word. We use the term path to refer to a sequence of M tags (M >= N) which is a tag image corresponding to the words W1,...,WN of a given sentence S. 
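The cohort/path terminology can be made concrete with a minimal sketch; the words and their cohorts below are illustrative stand-ins, not entries from the paper's lexicon:

```python
from itertools import product

# Hypothetical cohorts (lexically valid readings) for a short word sequence.
cohorts = {
    "All":    ["det", "n"],
    "old":    ["adj", "n"],
    "people": ["n", "v"],
}

# A path is one choice of tag per word, so the number of tentative
# tag images is the product of the cohort sizes.
words = list(cohorts)
paths = [list(p) for p in product(*(cohorts[w] for w in words))]

print(len(paths))  # 2 * 2 * 2 = 8 tentative tag images
```

Since every additional two-way ambiguous word doubles the count, the number of paths grows exponentially with sentence length.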
This is motivated by a view of lexical ambiguity as a graph problem: we try to reduce the number of tentative paths in ambiguous cases by removing arcs from the Sentence Graph (SG) - a directed graph with vertices for all tags in all cohorts of the words in the given sentence, and arcs connecting each tag to all tags in the cohort which follows it.</Paragraph> <Paragraph position="3"> The removal of arcs and the testing of paths for validity as complete sentence interpretations are done using local constraints. A local constraint of length k on a given tag t is a rule allowing or disallowing a sequence of k tags from being in its right (or left) neighborhood in any tag image of a sentence. In our approach, the local constraints are extracted from the grammar (and this is the major aspect distinguishing it from some other short context methods such as \[Beale 88\], \[DeRose 88\], \[Karlsson 90\], \[Katz 85\], \[Marcus 80\], \[Marshall 83\], and \[Milne 86\]).</Paragraph> <Paragraph position="4"> For technical convenience we add the symbol &quot;$<&quot; at the beginning of tag images and &quot;>$&quot; at the end. Given a grammar G (which for the time being we assume to be an unrestricted context-free phrase structure grammar), with a set T of terminal symbols (tag set), a set V of variables (non-terminals, among which S is the root variable for derivations), and a set P of production rules of the form A -> a, where A is in V and a is in (V U T)*, we define the Right Short Context of length k of a terminal t (tag): SCr(t,k), for t in T and for k = 0,1,2,3,...</Paragraph> <Paragraph position="5"> SCr(t,k) = { z | z in T*, |z| = k (or |z| < k if &quot;>$&quot; is the last tag in z), and there exists a derivation S =>* atzb (a, b in (V U T)*) } The Left Short Context of length k of a tag t relative to the grammar G is denoted by SCl(t,k) and defined in a similar way.</Paragraph> <Paragraph position="6"> It is sometimes useful to define Positional Short Contexts. 
The definition is similar to the above, with the restriction that t may start only in a given position in a tag image of a sentence.</Paragraph> <Paragraph position="7"> The basis for the automaton which checks a tag stream (path) for validity as a tag image relative to the local constraints is the function next(t), which for any t in T defines a set, as follows: next(t) = { z | tz in SCr(t,1) } In \[Herz/Rimon 91\] we gave a procedure for computing next(t) from a given context-free grammar, using standard practices of parsing of formal languages (see \[Aho/Ullman 72\]).</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 3. Local Constraints Automata </SectionTitle> <Paragraph position="0"> We denote by LCA(1) the simple finite state automaton which uses the pre-processed {next(t)} sets to check whether a given tag stream (path) satisfies the SCr(t,1) constraints.</Paragraph> <Paragraph position="1"> In a similar manner it is possible to define LCA(k), relative to the short context of length k. We denote by L the language generated by the</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Our studies of modern written Hebrew suggest that about 60% of the word-forms in running texts are ambiguous </SectionTitle> <Paragraph position="0"> with respect to a basic tag set, and the average number of possible readings of such word-forms is 2.4. Even when counting only &quot;natural readings&quot;, i.e. interpretations which are likely to occur in typical corpora, this number is quite large, around 1.8 (it is somewhat larger for the small subset of the most common words).</Paragraph> <Paragraph position="1"> underlying grammar, and by L(k) the language accepted by the automaton LCA(k). The following relations hold for the family of automata {LCA(i)}: L(1) ⊇ L(2) ⊇ ... 
⊇ L. This guarantees a safety property: if for some i, LCA(i) does not recognize (accept) a string of tags, then this string is sure to be illegal (i.e. not in L). On the other hand, any LCA(k) may recognize sentences not in L (or, from a dual point of view, will reject only part of the illegal tag images). The important question is how tight the inclusion relations above are - i.e. how well LCA(k) approximates the language L. In particular we are interested in LCA(1).</Paragraph> <Paragraph position="2"> There is no simple analytic answer to this question. Contradictory forces play here: the nature of the language -- e.g. a rigid word order and constituent order yield stronger constraints; the grain of the tag set -- more refined tags (different languages may require different tag sets) help express refined syntactic claims, hence more specific constraints, but they also create a greater level of tagging ambiguity; the size of the grammar -- a larger grammar offers more information, but, covering a richer set of structures, it allows more tag pairs to co-occur; etc.</Paragraph> <Paragraph position="3"> It is interesting to note that for Hebrew, short context methods are most needed because of the considerable ambiguity at the lexical level, but their effectiveness suffers from the rather free word/constituent order.</Paragraph> <Paragraph position="4"> Finally, a comment about the computational efficiency of the LCA(k) automaton. The time complexity of checking a tag string of length n using LCA(k) is at most O(n x k x log|T|), while a non-deterministic parser for a context-free grammar may require O(n^3 x |G|^2). (|T| is the size of the tag set, |G| is the size of the grammar.) 
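The two steps involved - pre-computing {next(t)} from the grammar, and the linear-time LCA(1) check - can be sketched as follows. This is a simplified stand-in for the procedure of \[Herz/Rimon 91\], which is not reproduced in this paper: for an epsilon-free CFG (with all nonterminals reachable and productive), a terminal pair (a,b) is a valid neighbor pair exactly when some rule body places symbols X, Y side by side with a in LAST(X) and b in FIRST(Y). The grammar encoding and helper names are our own; the {a^m b^m} test grammar anticipates the example of section 5.

```python
# Sketch: compute next(t) (valid right neighbors of length 1) from an
# epsilon-free CFG, then check a tag stream with the resulting LCA(1).

def first_last(grammar, terminals):
    """FIRST(X)/LAST(X): terminals that can begin/end a string derived from X."""
    first = {t: {t} for t in terminals}
    last = {t: {t} for t in terminals}
    for nt in grammar:
        first[nt], last[nt] = set(), set()
    changed = True
    while changed:  # fixpoint over the (epsilon-free) rules
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                for table, sym in ((first, body[0]), (last, body[-1])):
                    new = table[sym] - table[nt]
                    if new:
                        table[nt] |= new
                        changed = True
    return first, last

def next_sets(grammar, terminals, start):
    first, last = first_last(grammar, terminals)
    nxt = {t: set() for t in terminals}
    nxt["$<"] = set(first[start])      # tags that may open a sentence
    for t in last[start]:
        nxt[t].add(">$")               # tags that may close a sentence
    for bodies in grammar.values():
        for body in bodies:
            for x, y in zip(body, body[1:]):   # adjacent symbols in a rule body
                for a in last[x]:
                    nxt[a] |= first[y]
    return nxt

def lca1_accepts(path, nxt):
    """Linear-time check of a tag stream (with '$<' ... '>$' sentinels)."""
    return all(b in nxt.get(a, set()) for a, b in zip(path, path[1:]))

# Toy grammar for {a^m b^m}: S -> a S b | a b
G = {"S": [["a", "S", "b"], ["a", "b"]]}
nxt = next_sets(G, {"a", "b"}, "S")
print(lca1_accepts(["$<", "a", "a", "b", "b", ">$"], nxt))  # in L: accepted
print(lca1_accepts(["$<", "a", "b", "b", "b", ">$"], nxt))  # not in L, yet accepted by LCA(1)
print(lca1_accepts(["$<", "b", "a", ">$"], nxt))            # rejected
```

The second call shows over-recognition: LCA(1) only sees local pairs, so it cannot count matching a's and b's.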
The space complexity of LCA(k) is proportional to |T|^(k+1); this is why only truly short contexts should be used.</Paragraph> <Paragraph position="5"> Note that for a sentence of length k, the power of LCA(k) is identical to the weak generative capacity of the full underlying grammar. But since the size of sentences (tag sequences) in L is unbounded, there is no fixed k which suffices.</Paragraph> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> 4. A Sample Grammar </SectionTitle> <Paragraph position="0"> To illustrate claims made in the sections below, we will use the following toy grammar of a small fragment of English. Statements about the correctness of sentences etc. are of course relative to this toy grammar.</Paragraph> <Paragraph position="1"> The tag set T includes: n (noun), v (verb), det (determiner), adj (adjective) and prep (preposition). The context-free grammar G is:</Paragraph> <Paragraph position="3"> To extract the local constraints from this grammar, we first compute the function next(t) for every tag t in T, and from the resulting sets we obtain a graph showing valid pairs in the short context of length 1 (again, validity is relative to the given toy grammar). This graph, or more conveniently the table of &quot;valid neighbors&quot; below, defines the LCA(1) automaton. The table is actually the union of the SCr(t,1) sets for all t in T, and it is derived directly from the graph: $< : det, adj, n ; det : adj, n ; adj : n ; n : v, prep, >$ ; v : det, adj, n ; prep : det, adj, n. 5. A &quot;Lucky Bag&quot; Experiment Consider the following sentence, which is in the language generated by grammar G of section 4: (1) The charming princess kissed a frog. The unique tag image corresponding to this sentence is: \[$<, det, adj, n, v, det, n, >$\]. Now let us look at the 720 &quot;random inputs&quot; generated by permutations of the six words in (1), and the set of corresponding tag images. 
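The lucky-bag experiment can be sketched directly over tag sequences. The valid-neighbor table below is our reading of the typographically damaged table of section 4, so treat it as an assumption; counting the distinct tag permutations of sentence (1) that LCA(1) accepts then reproduces the result reported next in the text.

```python
from itertools import permutations

# Valid right-neighbor table for the toy grammar of section 4
# (our reconstruction of the garbled table in the text).
NEXT = {
    "$<":   {"det", "adj", "n"},
    "det":  {"adj", "n"},
    "adj":  {"n"},
    "n":    {"v", "prep", ">$"},
    "v":    {"det", "adj", "n"},
    "prep": {"det", "adj", "n"},
}

def lca1_accepts(tags):
    """Wrap a tag sequence in sentinels and check every adjacent pair."""
    path = ["$<"] + list(tags) + [">$"]
    return all(b in NEXT.get(a, set()) for a, b in zip(path, path[1:]))

# Tag multiset of sentence (1): "The charming princess kissed a frog"
tags = ["det", "adj", "n", "v", "det", "n"]
accepted = sorted({p for p in permutations(tags) if lca1_accepts(p)})
for p in accepted:
    print(p)
# Exactly the two tag images reported in the text survive.
```

Note that the 720 permutations of the words collapse to far fewer distinct tag sequences, since both determiners and both nouns share a tag.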
Applying LCA(1), only two tag images are recognized as valid: \[$<, det, adj, n, v, det, n, >$\] and \[$<, det, n, v, det, adj, n, >$\]. These are exactly the images corresponding to the eight syntactically correct sentences (relative to G): (1a-b) The/a charming princess kissed a/the frog. (1c-d) The/a charming frog kissed a/the princess. (1e-f) The/a princess kissed a/the charming frog. (1g-h) The/a frog kissed a/the charming princess. This result is not surprising, given the simple sentence and toy grammar. (In general, a grammar with a small number of rules relative to the size of the tag set cannot produce too many valid short contexts.) It is therefore interesting to examine another example, where each word is associated with a cohort of several interpretations. We borrow from \[Herz/Rimon 91\]: (2) All old people like books about fish.</Paragraph> <Paragraph position="4"> Assuming the word tagging shown in section 6, there are 256 (2 x 2 x 2 x 4 x 2 x 2 x 2) tentative tag images (paths) for this sentence and for each of its 5040 permutations. This generates a very large number of rather random tag images.</Paragraph> <Paragraph position="5"> Applying LCA(1), only a small number of images are recognized as potentially valid. Among them are syntactically correct sentences such as: (2a) Fish like old books about all people. And only less than 0.1% are sentences which are locally valid but globally incorrect, such as: (2b) * Old fish all about books like people. (tagged as \[$<, n, v, n, prep, n, v, n, >$\]). These two examples do not suggest any kind of proof, but they illustrate well the recognition power of even the least powerful automaton in the {LCA(i)} family. To get another point of view, one may consider the simple formal language L consisting of the strings {a^m b^m} for 1 <= m, which can be generated by a context-free grammar G over T = {a, b}. 
LCA(1) based on G will recognize all strings of the form {a^j b^k} for 1 <= j,k, but none of the very many other strings over T. It can be shown that, given arbitrary strings of length n over T, the probability that LCA(1) will not reject strings not belonging to L is proportional to n/2^n, a term which tends rapidly to 0. This is the over-recognition margin. 6. Use of LCA in Conjunction with a</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Parser </SectionTitle> <Paragraph position="0"> The number of potentially valid tag images (paths) for a given sentence can be exponential in the length of the sentence if all words are ambiguous. It is therefore desirable to filter out invalid tag images before (or during) parsing.</Paragraph> <Paragraph position="1"> To examine the power of LCAs as a pre-parsing filter, we use example (2) again, demonstrating lexical ambiguities as shown in the chart below.</Paragraph> <Paragraph position="2"> The chart shows the Reduced Sentence Graph (RSG) - the original SG from which invalid arcs (relative to the SCr(t,1) table) were removed.</Paragraph> </Section> </Section> </Paper>
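The arc-removal step can be sketched as follows. The NEXT table is our reconstruction of the section 4 valid-neighbor table, and the cohorts are illustrative (the paper's actual chart for example (2) is not reproduced here):

```python
# Sketch: build the Sentence Graph (SG) from cohorts and drop arcs that
# violate the SCr(t,1) constraints, yielding the Reduced Sentence Graph (RSG).

NEXT = {
    "$<":   {"det", "adj", "n"},
    "det":  {"adj", "n"},
    "adj":  {"n"},
    "n":    {"v", "prep", ">$"},
    "v":    {"det", "adj", "n"},
    "prep": {"det", "adj", "n"},
}

def reduced_sentence_graph(cohorts):
    """Arcs between adjacent cohorts (with sentinels), keeping only valid pairs."""
    levels = [{"$<"}] + [set(c) for c in cohorts] + [{">$"}]
    arcs = []
    for left, right in zip(levels, levels[1:]):
        arcs.append({(a, b) for a in left for b in right
                     if b in NEXT.get(a, set())})
    return arcs

# Two ambiguous words, e.g. "old books" read as adj/n and n/v (illustrative).
for level in reduced_sentence_graph([["adj", "n"], ["n", "v"]]):
    print(sorted(level))
```

Every surviving path through the RSG is a tag image that LCA(1) accepts, so a parser only needs to consider those; vertices left with no incoming or outgoing arcs can be pruned in a further pass.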