<?xml version="1.0" standalone="yes"?> <Paper uid="C88-1005"> <Title>Efficiency Considerations for LFG-Parsers - Incremental and Table-Lookup Techniques</Title> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> 1 General Considerations of Parsing Efficiency </SectionTitle> <Paragraph position="0"> Basic parsing techniques (both shift-reduce and recursive descent) seem to be inherently inefficient inasmuch as they proceed strictly according to the sequence of the rules in the grammar and are not able to exploit the surrounding (preceding and following) syntactic information. Their scope is limited to a single rule, and they jump mechanically to the sequentially next rule, even if such a move is obviously abortive and must be immediately abandoned (Winograd 1983, 108-115; Phillips 1984; Hellwig 1988).</Paragraph> <Paragraph position="1"> Parsing tables, as they are conceived in current compiler construction devices for LR(k) and LL(k) languages, have two advantages: 1. they make the information provided by the grammar accessible throughout the entire processing and not just at the point where it happens to occur, and 2. they can be constructed algorithmically (Aho/Ullman 1979).</Paragraph> <Paragraph position="3"/> </Section> <Section position="3" start_page="0" end_page="26" type="metho"> <SectionTitle> 2 The LFG-Model of the EWH: General </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Design </SectionTitle> <Paragraph position="0"> The Koblenzer LFG-Parser-Generator is an interactive system, designed to create and to test grammars for natural languages according to the linguistic philosophy of the LFG as conceived in Bresnan and Kaplan (1982). Both lexicon and syntax follow closely the original format specifications. The system can be divided into two main phases: preprocessing and actual execution.</Paragraph> <Paragraph position="1"> 1. 
Preprocessing of the input grammar (including lexicon) generates the executable code, which in turn involves two logically distinct steps: * Generating the PROLOG code and * Optimizing the PROLOG code, and 2. the actual execution phase analyses the input string and produces the f-structures.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Code-Generation </SectionTitle> <Paragraph position="0"> In the preprocessing phase the grammar rules are entered into the system and translated into executable PROLOG code. This part of the system is written in PASCAL. The implementation includes facilities for the treatment of the metavariables ⇑ and ⇓ needed for the treatment of long-distance dependencies (Weisweber 1986). The grammar may contain both optional categories and multiply reoccurring categories (marked by the Kleene star operator *).</Paragraph> <Paragraph position="1"> In order to facilitate the generation of the tables with the reach relations, the phrase-structure portion of the rules of the grammar (c-structure rules) is extracted and stored as an additional, separate data set.</Paragraph> </Section> <Section position="3" start_page="0" end_page="26" type="sub_section"> <SectionTitle> 2.2 Code Optimization </SectionTitle> <Paragraph position="0"> The second task of the preprocessor is to produce a more efficient PROLOG code. 
Optimization covers the construction of the parsing table and code revision.</Paragraph> <Paragraph position="1"> In order to speed up the actual analysis in the execution phase, the preprocessor constructs a table of reach relations on the basis of first and follow sets, connecting nonterminal and preterminal nodes with a lookahead of 1.</Paragraph> <Paragraph position="2"> The definition of the first and follow sets is based on context-free grammar (Aho/Ullman 1979, 186-192, 429-30):</Paragraph> <Paragraph position="4"> $\alpha, \beta \in (N \cup \Sigma)^*$ and $A \in N$.</Paragraph> <Paragraph position="5"> The first sets are defined for a nonterminal symbol A over a string $\alpha$ of preterminals as the potential preterminal symbols which can occur in the leftmost position of the string: $\mathrm{FIRST}(\alpha) = \{a \in \Sigma \mid \alpha \Rightarrow^* a\beta\} \cup \{\varepsilon \mid \alpha \Rightarrow^* \varepsilon\}$ The follow sets of a nonterminal A are defined as the first sets of the preterminals which may occur after the nonterminal A: $\mathrm{FOLLOW}(A) := \{a \in \Sigma \mid S \Rightarrow^* \alpha A \beta \wedge a \in \mathrm{FIRST}(\beta)\} \cup \{\$ \mid S \Rightarrow^* \alpha A\}$.</Paragraph> <Paragraph position="6"> Contrary to the standard definition of the terms (op. cit.), the Koblenzer system does not exclude the application to left-recursive constructions. The reach relations are built up uniformly both for left-recursive and for all other constructions.</Paragraph> <Paragraph position="7"> The first and follow sets make it possible to define the reach relations, which provide the information for a nonterminal A (in the stack) and for a preterminal symbol (located in the input string a) by which production rule(s) the preterminal can be accessed:</Paragraph> <Paragraph position="9"> The reach relations are valid for all context-free languages and extend the applicability of LL(1) tables to them in general. They are calculated over the first and follow sets and stored in tables for the execution phase. 
The practical construction of the table of reach relations is based on the systematic separation of dictionary and grammar rules, without which the construction of the table would not be feasible.</Paragraph> <Paragraph position="10"> There are a number of grammatically predefined f-descriptions, which can be preprocessed in advance independently of the actual input, reducing the number of unifications at run time. Preliminary unification of f-structures can be carried out in the following configurations: * If an f-description subsumes another f-description, the subsumed f-structure can be regarded as already unified and dropped.</Paragraph> <Paragraph position="11"> In the execution phase the system will use only the subsuming (i.e. larger) f-description. E.g. if a dictionary entry in the PROLOG code produced in the preprocessing phase has the specification (↑SUBJ NUM) = SG and simultaneously !(↑SUBJ), the latter can be safely dropped in order to avoid the vacuous ratification of the explicit subject in the execution phase.</Paragraph> <Paragraph position="12"> * If an f-description is unified with new attributes, hitherto not used in the grammar, the operation will always succeed, regardless of the actual value of the attributes. Unifications of this type can be carried out safely in advance, regardless of possible later changes of the attribute value.</Paragraph> <Paragraph position="13"> * There are further minor possible f-structure configurations which can be simplified before the actual unification in the execution phase. The current optimization will recognize some of these special cases and replace the general unification procedures by specialized and hence more restricted procedures already at the time of code generation. 
The general broad unification procedures (merge functions) will be substituted here by more specific and computationally less expensive procedures.</Paragraph> </Section> </Section> <Section position="4" start_page="26" end_page="26" type="metho"> <SectionTitle> 3 The Run Time System </SectionTitle> <Paragraph position="0"> Firstly, the run time system can be characterized by the basic separation of lexicon lookup and actual parsing. The separation of lexicon rules and syntactic rules is based on the linguistic insight that the two components (lexicon and grammar) reflect entirely different language properties. The division can also be supported by considerations of processing efficiency.</Paragraph> <Paragraph position="1"> The lexicon lookup is carried out at the beginning of the processing, and it immediately allows the rejection of input in case of missing entries in the lexicon. The user can enter another word on the spot and proceed with the processing of the same sentence.</Paragraph> <Paragraph position="2"> The next step is the inspection of the LL(1) tables, by means of which the reach relations are established. The table of reach relations provides the optimal subset of grammatical symbols and connects them to the lexical entries occurring in the actual input sentence.</Paragraph> <Paragraph position="3"> Secondly, the run time system is characterized by the single-pass strategy of processing, i.e. the input is read in only once, merging two fundamental tasks of the LFG: 1. the construction of the c-structures and 2. the unification of the f-structures in a single step.</Paragraph> <Paragraph position="4"> A special treatment is necessary for the left-recursive constructions. The entries in the LL(1) table for potential left recursions may be used only as long as the repetition is not spurious; otherwise their further application is suspended. 
At the time of the processing of phrase structure rules, the associated functional description is processed immediately. At this point the nodes relevant to the functional assignments are easily accessible as the left hand side symbol (for the metavariable ↑) and the right hand side symbols (for the metavariables ↓) in the rules.</Paragraph> <Paragraph position="5"> As the input is processed, the f-structure is constructed incrementally, step by step. All available attributes and values are merged together as soon as they emerge, which is efficient for at least two reasons: 1. There is no need to store and reprocess the cumulated f-equations in an additional step and 2. merging the f-descriptions incrementally operates with smaller chunks, which implies faster unification.</Paragraph> <Paragraph position="6"> The incremental processing means that at the end of the input sentence the analysis is complete and solved and does not need to be scanned again in order to solve a series of f-equations. There is only one single control operation at the end of the sentence, checking the wellformedness (completeness and exhaustiveness) of the output.</Paragraph> <Paragraph position="7"> The single-pass model therefore differs from the Kaplan-Bresnan model by lacking a separate processing phase for the cumulated f-structures following the generation of c-structures. In fact there is no explicit need for retaining the c-structures, except for their possible display in tutorials and in tracing erroneous productions while testing the rules of the input grammar. The current implementation delivers both the c-structure and the f-structure of the input sentence. In case of multiple interpretations all c-structures and all valid f-structures are displayed in succession.</Paragraph> </Section> </Paper>