File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/00/a00-2033_intro.xml
Size: 3,837 bytes
Last Modified: 2025-10-06 14:00:41
<?xml version="1.0" standalone="yes"?> <Paper uid="A00-2033"> <Title>Removing Left Recursion from Context-Free Grammars</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> A long-standing issue regarding algorithms that manipulate context-free grammars (CFGs) in a &quot;top-down&quot; left-to-right fashion is that left recursion can lead to nontermination. This is most familiar in the case of top-down recursive-descent parsing (Aho et al., 1986, pp. 181-182). A more recent motivation is that off-the-shelf speech recognition systems are now available (e.g., from Nuance Communications and Microsoft) that accept CFGs as language models for constraining recognition; but as these recognizers process CFGs top-down, they also require that the CFGs used be non-left-recursive.</Paragraph> <Paragraph position="1"> The source of the problem can be seen by considering a directly left-recursive grammar production such as A → Aα. Suppose we are trying to parse, or recognize using a speech recognizer, an A at a given position in the input. If we apply this production top-down and left-to-right, our first subgoal will be to parse or recognize an A at the same input position. This immediately puts us into an infinite recursion. The same thing will happen with an indirectly left-recursive grammar, via a chain of subgoals that will lead us from the goal of parsing or recognizing an A at a given position to a descendant subgoal of parsing or recognizing an A at that position.</Paragraph> <Paragraph position="2"> In theory, the restriction to non-left-recursive CFGs puts no additional constraints on the languages that can be described, because any CFG can in principle be transformed into an equivalent non-left-recursive CFG. However, the standard algorithm for carrying out this transformation (Aho et al., 1986, pp. 176-178) (Hopcroft and Ullman, 1979, p. 96)--attributed to M. C. 
Paull by Hopcroft and Ullman (1979, p. 106)--can produce transformed grammars that are orders of magnitude larger than the original grammars. In this paper we develop a number of improvements to Paull's algorithm, which help somewhat but do not completely solve the problem. We then go on to develop an alternative approach based on the left-corner grammar transform, which makes it possible to remove left recursion with no significant increase in size for several grammars for which Paull's original algorithm is impractical.</Paragraph> <Paragraph position="3"> 2 Notation and Terminology Grammar nonterminals will be designated by &quot;low order&quot; upper-case letters (A, B, etc.); and terminals will be designated by lower-case letters. We will use &quot;high order&quot; upper-case letters (X, Y, Z) to denote single symbols that could be either terminals or nonterminals, and Greek letters to denote (possibly empty) sequences of terminals and/or nonterminals. Any production of the form A → α will be said to be an A-production, and α will be said to be an expansion of A.</Paragraph> <Paragraph position="4"> We will say that a symbol X is a direct left corner of a nonterminal A, if there is an A-production with X as the left-most symbol on the right-hand side.</Paragraph> <Paragraph position="5"> We define the left-corner relation to be the reflexive transitive closure of the direct-left-corner relation, and we define the proper-left-corner relation to be the transitive closure of the direct-left-corner relation. A nonterminal is left recursive if it is a proper left corner of itself; a nonterminal is directly left recursive if it is a direct left corner of itself; and a nonterminal is indirectly left recursive if it is left recursive, but not directly left recursive.</Paragraph> </Section> </Paper>