<?xml version="1.0" standalone="yes"?>
<Paper uid="C92-1027">
  <Title>Compiling and Using Finite-State Syntactic Rules</Title>
  <Section position="9" start_page="88" end_page="88" type="concl">
    <SectionTitle>
4. Implementation
</SectionTitle>
    <Paragraph position="0"> We need a compiler for transforming the rules written in the finite-state formalism into finite-state automata, and a parser which first transforms sentences into finite-state networks, and then computes the logical intersection of the rule-automata and the sentence automaton.</Paragraph>
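    <Paragraph>As a rough illustration (this Python sketch is my own, not the authors' Common Lisp implementation; the DFA encoding and symbol names are assumptions), both rules and sentences can be represented as deterministic automata, and parsing reduces to the standard product construction for intersection:

```python
# Illustrative sketch: a minimal DFA and the product construction that
# implements logical intersection of two automata.
class DFA:
    def __init__(self, trans, start, finals):
        self.trans = trans            # {(state, symbol): next_state}
        self.start = start
        self.finals = set(finals)

    def accepts(self, word):
        state = self.start
        for sym in word:
            if (state, sym) not in self.trans:
                return False          # no transition: the automaton fails
            state = self.trans[(state, sym)]
        return state in self.finals

def intersect(a, b):
    """Product construction: the result accepts a word iff both a and b do.
    The product can have up to |A| * |B| states, which is the source of the
    intermediate blow-up discussed in section 4.2."""
    trans = {}
    for (p, x), p2 in a.trans.items():
        for (q, y), q2 in b.trans.items():
            if x == y:
                trans[((p, q), x)] = (p2, q2)
    finals = {(p, q) for p in a.finals for q in b.finals}
    return DFA(trans, (a.start, b.start), finals)
```

Intersecting the sentence automaton with every rule automaton in this way, and reading off the accepted strings, yields exactly the readings that satisfy all rules.</Paragraph>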
    <Section position="1" start_page="88" end_page="88" type="sub_section">
      <SectionTitle>
4.1 Compilation of the rules
</SectionTitle>
      <Paragraph position="0"> The grammar consisting of rules is first parsed and checked for formal errors using the GNU flex and bison parser generator programs.</Paragraph>
      <Paragraph position="1"> The rest of the compilation is done in Common Lisp by transforming the rules written in the regular expression formalism into finite-state automata.</Paragraph>
      <Paragraph position="2"> Full-scale grammars tend to be large, containing perhaps a few hundred finite-state rules. In order to facilitate the parsing of sentences, the compiler tries to reduce the number of rule automata after each rule has been compiled.</Paragraph>
      <Paragraph position="3"> Methods were developed for determining which of the automata should be merged together by intersecting them (Tapanainen 1991). The key idea behind this is the concept of an activation alphabet. Some rule-automata turn out to be irrelevant for certain sentences, simply because the sentences do not contain any symbols (or combinations of symbols) necessary to cause the automaton to fail. Such rule-automata can be ignored when parsing those sentences. Furthermore, it turned out to be a good strategy to merge automata with similar activation alphabets (rather than arbitrary ones, or those resulting in the smallest intersections).</Paragraph>
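    <Paragraph>The idea can be sketched as follows (the rule records and the Jaccard similarity measure are illustrative assumptions of mine; the actual merging criteria are developed in Tapanainen 1991):

```python
# Sketch of the activation-alphabet idea: skip rules that cannot fail on
# this sentence, and merge rules whose activation alphabets are similar.
def active_rules(rules, sentence_symbols):
    """A rule can only fail on a sentence containing symbols from its
    activation alphabet, so all other rules are ignored for that sentence."""
    return [r for r in rules if r["activation"] & sentence_symbols]

def best_merge_pair(rules):
    """Choose two rules with the most similar activation alphabets to be
    merged (intersected) at compile time.  Jaccard similarity is an
    assumption for illustration, not the measure from Tapanainen 1991."""
    def sim(x, y):
        return len(x & y) / len(x | y)
    pairs = [(a, b) for i, a in enumerate(rules) for b in rules[i + 1:]]
    return max(pairs, key=lambda p: sim(p[0]["activation"], p[1]["activation"]))
```

The hypothetical tag names (NOM, GEN, and so on) merely stand in for whatever symbols the rule formalism uses.</Paragraph>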
    </Section>
    <Section position="2" start_page="88" end_page="88" type="sub_section">
      <SectionTitle>
4.2 Parsing sentences
</SectionTitle>
      <Paragraph position="0"> The implementation of the parsing process is open to many choices which do not change the results of the parsing, but which may have a significant effect on the time and space requirements of the parsing. As a theoretical starting point one could take the following setup.</Paragraph>
      <Paragraph position="1"> Parser A: Assume that we first enumerate all readings of a sentence automaton. Each reading is, in turn, fed to each of the rule-automata. Those readings that are accepted by all rule-automata form the set of parses.</Paragraph>
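    <Paragraph>A sketch of Parser A, with rules simplified to boolean predicates over a reading (an illustrative assumption; the real rules are automata):

```python
from itertools import product

def parser_a(readings_per_word, rules):
    """Parser A sketch: enumerate every reading of the sentence and keep
    those accepted by all rules.  The loop runs over the full Cartesian
    product of per-word readings, hence the exponential cost."""
    return [reading
            for reading in product(*readings_per_word)
            if all(rule(reading) for rule in rules)]
```

With k readings per word and n words there are k**n candidate readings, which is exactly why this setup is only a theoretical starting point.</Paragraph>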
      <Paragraph position="2"> Parser A is clearly infeasible in practice because of the immense number of readings represented by the sentence automaton (millions even in relatively simple sentences, and the number grows exponentially with sentence length).</Paragraph>
      <Paragraph position="3"> A second elementary and theoretical approach: Parser B: Take the sentence automaton and intersect it with each rule-automaton in turn.</Paragraph>
      <Paragraph position="4"> This is more feasible, but experiments have shown that the number of states in the intermediate results tends to grow prohibitively large when we work with full-scale grammars and complex sentences (Tapanainen 1991).</Paragraph>
      <Paragraph position="5"> This is an important property of finite-state automata. All automata involved are reasonably small, and even the end result is very small, but the intermediate results can be extremely large (more than 100,000 states and beyond the capacity of the machines and algorithms we have).</Paragraph>
      <Paragraph position="6"> A further refinement of the above strategy would be to carefully choose the order in which the intersecting is done: Parser C: Intersect the rule-automata with the sentence automaton in the order where you first evaluate each of the remaining automata according to how much they reduce the number of readings remaining. The one which makes the greatest reduction is chosen at each step.</Paragraph>
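    <Paragraph>Parser C might be sketched like this (modelling the sentence automaton as an explicit set of readings and rules as predicates is an illustrative simplification of the automata actually involved):

```python
def parser_c(readings, rules):
    """Parser C sketch: repeatedly apply the rule that removes the most of
    the readings still remaining.  Note that the pruning power of every
    remaining rule is re-evaluated at every step."""
    current, remaining = set(readings), list(rules)
    while remaining:
        best = max(remaining,
                   key=lambda rule: sum(1 for r in current if not rule(r)))
        remaining.remove(best)
        current = {r for r in current if best(r)}
    return current
```

The repeated re-evaluation inside the loop is precisely the cost that Parser D below avoids.</Paragraph>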
      <Paragraph position="7"> This strategy seems to be feasible, but much effort is spent on the repeated evaluation. It turns out that one may even use a one-time estimation for the order: Parser D: Perform a tentative intersection of the sentence automaton and each of the rules first. Then intersect the rules with the sentence automaton one by one in decreasing order of their capacity to reduce the number of readings from the original sentence automaton.</Paragraph>
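    <Paragraph>In the same simplified model (readings as an explicit set, rules as predicates, both illustrative assumptions), Parser D ranks the rules once against the original sentence automaton:

```python
def parser_d(readings, rules):
    """Parser D sketch: estimate each rule's pruning power once against the
    original sentence automaton, then apply the rules in decreasing order of
    that estimate (a one-time version of Parser C's repeated re-evaluation)."""
    def removed_from_original(rule):
        return sum(1 for r in readings if not rule(r))
    current = set(readings)
    for rule in sorted(rules, key=removed_from_original, reverse=True):
        current = {r for r in current if rule(r)}
    return current
```

The estimate is computed against the original automaton only, so it may be less accurate than Parser C's, but it is computed exactly once.</Paragraph>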
      <Paragraph position="8"> We may also choose to operate in parallel instead of rule by rule: Parser E: Simulate the intersection of all rules and the sentence automaton by trying to enumerate readings in the sentence automaton while constraining the process by the rule-automata.</Paragraph>
      <Paragraph position="9"> Each time a rule rejects the next token proposed, the corresponding branch in the search process is abandoned.</Paragraph>
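    <Paragraph>Parser E can be sketched as a depth-first search over the per-word readings that advances all rule automata in lockstep (the (trans, start, finals) encoding of a rule automaton is an illustrative assumption):

```python
def parser_e(readings_per_word, rules):
    """Parser E sketch: enumerate readings of the sentence depth-first while
    running every rule automaton in parallel; a branch is abandoned as soon
    as any rule blocks.  Each rule is a (trans, start, finals) triple with
    trans a {(state, symbol): next_state} dict."""
    parses = []

    def step(states, sym):
        nxt = []
        for (trans, _, _), st in zip(rules, states):
            if (st, sym) not in trans:
                return None                # this rule rejects the token
            nxt.append(trans[(st, sym)])
        return nxt

    def dfs(pos, states, prefix):
        if pos == len(readings_per_word):
            if all(st in finals
                   for (_, _, finals), st in zip(rules, states)):
                parses.append(tuple(prefix))
            return
        for sym in readings_per_word[pos]:
            nxt = step(states, sym)
            if nxt is not None:            # otherwise prune the branch
                dfs(pos + 1, nxt, prefix + [sym])

    dfs(0, [start for (_, start, _) in rules], [])
    return parses
```

Because no product automaton is ever built, the large intermediate results of Parser B never materialize; only the current path and one state per rule are kept.</Paragraph>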
      <Paragraph position="10"> This strategy seems to work fairly satisfactorily. It was used in the initial stages of the grammar development and testing together with two other principles:
* merging of automata into a smaller set of automata during the compilation phase, using the activation alphabet of each automaton as a guideline
* excluding some automata before the parsing of each sentence, according to the presence of tokens in the sentence and the activation alphabets of the merged automata.</Paragraph>
      <Paragraph position="11"> Some further improvements were achieved by the following: Parser F: Manually separate a set of rules defining the coarse clause structure into a phase to be first intersected with the sentence automaton. Then use strategy E with the remaining rules. The initial step establishes a fairly good approximation of feasible clause boundaries.</Paragraph>
      <Paragraph position="12"> This helps the parsing of the rest of the rules by rejecting many incorrect readings earlier.</Paragraph>
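    <Paragraph>The two-phase idea reduces, in the same simplified model as above (readings as an explicit set, rules as predicates, both illustrative assumptions), to filtering in two stages:

```python
def parser_f(readings, clause_rules, other_rules):
    """Parser F sketch: first apply the manually separated coarse
    clause-structure rules to get an approximation of feasible clause
    boundaries, then run the remaining rules (standing in here for
    strategy E) on the surviving readings only."""
    approx = {r for r in readings if all(rule(r) for rule in clause_rules)}
    return {r for r in approx if all(rule(r) for rule in other_rules)}
```

The payoff is that the expensive second phase starts from the smaller approximation rather than from the full sentence automaton.</Paragraph>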
      <Paragraph position="13"> Parsing simple sentences like &amp;quot;time flies like an arrow&amp;quot; takes some 1.5 seconds, whereas the following fairly complex sentence takes some 10 seconds to parse on a SUN SPARCstation2: Nevertheless the number of cases in which assessment could not be related to factual rental evidence has so far not been so great as to render the whole system suspect.</Paragraph>
      <Paragraph position="14"> The sentence automaton is small in terms of the number of states, but it represents some 10^35 distinct readings.</Paragraph>
    </Section>
  </Section>
</Paper>