<?xml version="1.0" standalone="yes"?>
<Paper uid="H93-1047">
<Title>Automatic Grammar Induction and Parsing Free Text: A Transformation-Based Approach</Title>
<Section position="2" start_page="0" end_page="0" type="intro">
<SectionTitle>1. INTRODUCTION</SectionTitle>
<Paragraph position="0">There has been a great deal of interest of late in the automatic induction of natural language grammar. Given the difficulty inherent in manually building a robust parser, along with the availability of large amounts of training material, automatic grammar induction seems like a path worth pursuing. A number of systems have been built which can be trained automatically to bracket text into syntactic constituents. In [10], mutual information statistics are extracted from a corpus of text, and this information is then used to parse new text. [13] defines a function to score the quality of parse trees, and then uses simulated annealing to heuristically explore the entire space of possible parses for a given sentence. In [3], distributional analysis techniques are applied to a large corpus to learn a context-free grammar.</Paragraph>
<Paragraph position="1">The most promising results to date have been based on the inside-outside algorithm (i-o algorithm), which can be used to train stochastic context-free grammars. The i-o algorithm is an extension (by [1]) of the finite-state Hidden Markov Model, which has been applied successfully in many areas, including speech recognition and part of speech tagging. A number of recent papers have explored the potential of using the i-o algorithm to automatically learn a grammar [9, 15, 12, 6, 7, 14].</Paragraph>
<Paragraph position="2">Below, we describe a new technique for grammar induction.2 The algorithm works by beginning in a very naive state of knowledge about phrase structure. By repeatedly comparing the results of parsing in the current state to the proper phrase structure for each sentence in the training corpus, the system learns a set of ordered transformations which can be applied to reduce parsing error. We believe this technique has advantages over other methods of phrase structure induction. Some of the advantages include: the system is very simple; it requires only a very small set of transformations; learning proceeds quickly and achieves a high degree of accuracy; and only a very small training corpus is necessary. In addition, since some tokens in a sentence are not even considered in parsing, the method could prove to be considerably more resistant to noise than a CFG-based approach. After describing the algorithm, we present results and compare them to other recent results in automatic phrase structure induction.</Paragraph>
<Paragraph position="3">*The author would like to thank Mark Liberman, Meiting Lu, David Magerman, Mitch Marcus, Rich Pito, Giorgio Satta, Yves Schabes and Tom Veatch. This work was supported by DARPA and AFOSR jointly under grant No. AFOSR-90-0066, and by ARO grant No. DAAL 03-89-C0031 PRI.</Paragraph>
<Paragraph position="4">1. Not in the traditional sense of the term.</Paragraph>
<Paragraph position="5">2. A similar method has been applied effectively in part of speech tagging.</Paragraph>
</Section>
</Paper>
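The error-driven learning loop described in paragraph 2 above can be made concrete. What follows is a minimal Python sketch, not the paper's implementation: the representation of a parse as a set of bracket spans, the symmetric-difference error measure, the greedy selection over a fixed candidate pool, and all function names are assumptions introduced here for illustration; the paper defines its own transformation templates and scoring.

```python
from typing import Callable, List, Tuple

# Hypothetical types: a "parse" here is just a set of (start, end)
# constituent spans, and a transformation rewrites a parse somehow.
Parse = frozenset
Transformation = Callable[[Parse], Parse]

def bracket_error(predicted: Parse, gold: Parse) -> int:
    """Count bracket mismatches between a predicted and a gold parse."""
    return len(predicted ^ gold)  # symmetric difference of span sets

def learn_transformations(
    corpus: List[Tuple[Parse, Parse]],   # (naive initial parse, gold parse) pairs
    candidates: List[Transformation],
    max_rules: int = 20,
) -> List[Transformation]:
    """Greedily learn an ordered list of transformations.

    Starting from the naive initial-state parses, repeatedly pick the
    candidate transformation that most reduces total bracketing error
    on the training corpus, commit it everywhere, and stop when no
    candidate helps (or a rule limit is reached).
    """
    current = [pred for pred, _ in corpus]
    gold = [g for _, g in corpus]
    learned: List[Transformation] = []

    for _ in range(max_rules):
        base_error = sum(bracket_error(p, g) for p, g in zip(current, gold))
        best, best_error = None, base_error
        for t in candidates:
            err = sum(bracket_error(t(p), g) for p, g in zip(current, gold))
            if err < best_error:
                best, best_error = t, err
        if best is None:  # no transformation reduces error: stop
            break
        learned.append(best)
        current = [best(p) for p in current]  # apply the winning rule

    return learned
```

At parse time, the learned rules would be applied to a new sentence's naive initial parse in the order they were learned, which is what makes the transformation list ordered rather than a set of independent rules.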