<?xml version="1.0" standalone="yes"?>
<Paper uid="H92-1027">
  <Title>Stochastic Lexicalized Tree-Adjoining Grammars</Title>
  <Section position="3" start_page="0" end_page="140" type="intro">
    <SectionTitle>
1. MOTIVATIONS
</SectionTitle>
    <Paragraph position="0"> Although stochastic techniques applied to syntax modeling have recently regained popularity, current language models suffer from obvious inherent inadequacies. Early proposals such as Markov Models, N-gram models \[1, 2, 3\] and Hidden Markov Models were very quickly shown to be linguistically not appropriate for natural language (e.g. \[4\]) since they are unable to capture long distance dependencies or to describe hierarchically the syntax of natural languages.</Paragraph>
    <Paragraph position="1"> Stochastic context-free grammar \[5\] is a hierarchical model more appropriate for natural languages, however none of such proposals \[6, 7\] perform as well as the simpler Markov Models because of the difficulty of capturing lexical information. The parameters of a stochastic context-free grammar do not correspond directly to a distribution over words since distributional phenomena over words that are embodied by the application of more than one context-free rule cannot be captured under the context-freeness assumption.</Paragraph>
    <Paragraph position="2"> This leads to the difficulty of maintaining a standard hierarchical model while capturing lexical dependencies.</Paragraph>
    <Paragraph position="3"> This fact prompted researchers in natural language processing to give up hierarchical language models in the favor of non-hierarchical statistical models over words (such as word N-grams models). Probably for lack of a better language model, it has also been argued that the phenomena that such devices cannot capture occur relatively infrequently.</Paragraph>
    <Paragraph position="4"> *This work was partially supported by DARPA Grant N0014-9031863, ARO Grant DAAL03-89-C-0031 and NSF Grant IRI90-16592. We thank Aravind Joshi for suggesting the use of TAGs for statistical analysis during a private discussion that followed a presentation by Fred Jelinek du.ring the June 1990 meeting of the DARPA Speech and Natural Language Workshop. We are also grateful to Peter Braun, Fred 3elinek, Mark Liberman, Mitch Marcus, Robert Mercer, Fernando Pereira and Stuart Shieber for providing valuable comments. Such argumentation is linguistically not sound.</Paragraph>
    <Paragraph position="5"> Lexicalized tree-adjoining grammars (LTAG) 1 combine hierarchical structures while being lexically sensitive and are therefore more appropriate for statistical analysis of language. In fact, LTAGs are the simplest hierarchical formalism which can serve as the basis for lexicalizing context-free grammar \[10, 11\].</Paragraph>
    <Paragraph position="6"> LTAG is a tree-rewriting system that combines trees of large domain with adjoining and substitulion. The trees found in a TAG take advantage of the available extended domain of locality by localizing syntactic dependencies (such as filler-gap, subject-verb, verb-object) and most semantic dependencies (such as predicate-argument relationship). For example, the following trees can be found in a</Paragraph>
    <Paragraph position="8"> eats John peanuts hungrily Since the elementary trees of a LTAG are minimal syntactic and semantic units, distributional analysis of the combination of these elementary trees based on a training corpus will inform us about relevant statistical aspects of the language such as the classes of words appearing as arguments of a predicative element, the distribution of the adverbs licensed by a specific verb, or the adjectives licensed by a specific noun.</Paragraph>
    <Paragraph position="9"> This kind of statistical analysis as independently suggested in \[12\] can be made with LTAGs because of their extended domain of locality but also because of their lexicalized property. null In this paper, this intuition is made formally precise by defining the notion of a stochastic lexicalized tree-adjoining grammar (SLTAG). We present an algorithm for computing the probability of a sentence generated by a SLTAG, and finally we introduce an iterative algorithm for estimating the parameters of a SLTAG given a training corpus of text. This algorithm can either be used for refining the parame1 We assume familiarity throughout the paper with TAGs and its lexicalized variant. See, for instance, \[S\], \[9\], \[10\] or \[111.  ters of a SLTAG or for inferring a tree-adjoining grammar from a training corpus.</Paragraph>
    <Paragraph position="10"> Due to the lack of space, in this paper the algorithms are described succinctly without proofs of correctness and more attention is given to the concepts and techniques used for SLTAG.</Paragraph>
  </Section>
class="xml-element"></Paper>