<?xml version="1.0" standalone="yes"?>
<Paper uid="C92-1027">
  <Title>Compiling and Using Finite-State Syntactic Rules</Title>
  <Section position="4" start_page="88" end_page="88" type="metho">
    <SectionTitle>
@@ the DEF ART @/
</SectionTitle>
    <Paragraph position="0"> program V PRES NON-SG3 @FINV @MAINV @ run N NOM PL @PREDC @@ This one is very ungrammatical, though. It will be the task of the rule component to exclude such readings and leave only the grammatical one(s) intact:</Paragraph>
  </Section>
  <Section position="5" start_page="88" end_page="88" type="metho">
    <SectionTitle>
@@ the DEF ART @
</SectionTitle>
    <Paragraph position="0"> program N NOM SG @SUBJ @ run V PRES SG3 @FINV @MAINV @@ Note that in this framework, the parsing does not build any new structures. The grammatical reading is already present in the input representation.</Paragraph>
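    <Paragraph> The point that every reading is already present in the input can be illustrated with a minimal Python sketch. It assumes that the input is a list of positions, each holding a few alternatives, and that a reading is one choice per position. The alternatives below are only those visible in the two readings shown above, and offering both boundary types at each word boundary is an assumption made for illustration; real morphological output would contain more alternatives.
<![CDATA[
```python
from itertools import product

# Alternatives per position, copied from the two readings shown above.
# Assumption: both "@" (word boundary) and "@/" (clause boundary) are
# offered at each boundary position; this is a reduced toy input.
positions = [
    ["@@"],                                   # sentence boundary
    ["the DEF ART"],
    ["@", "@/"],
    ["program N NOM SG @SUBJ", "program V PRES NON-SG3 @FINV @MAINV"],
    ["@", "@/"],
    ["run V PRES SG3 @FINV @MAINV", "run N NOM PL @PREDC"],
    ["@@"],
]

# A reading is one choice per position; parsing only selects among these.
readings = [" ".join(choice) for choice in product(*positions)]

grammatical = ("@@ the DEF ART @ program N NOM SG @SUBJ @ "
               "run V PRES SG3 @FINV @MAINV @@")

print(len(readings))
print(grammatical in readings)
```
]]>
    In this toy input the number of readings is the product of the number of alternatives at each position, which is why even short sentences can have very many readings before the rules are applied.</Paragraph>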
    <Section position="1" start_page="88" end_page="88" type="sub_section">
      <SectionTitle>
1.2 The role of rules
</SectionTitle>
      <Paragraph position="0"> The task for the rules here is (as is the case with the CG approach by Karlsson) to: * exclude those interpretations of ambiguous words which are not possible in the current sentence, * choose the correct type of boundary between each two words, and * determine which syntactic tags are the appropriate ones.</Paragraph>
      <Paragraph position="1"> Rules should preferably express meaningful constraints which result in the exclusion of all ungrammatical alternatives. Each rule should thus be a grammatical statement which effectively forbids certain tag combinations. Rules in the CG formalism are typically dedicated to one of the above tasks, and they are executed as successive groups.</Paragraph>
      <Paragraph position="2"> In finite-state syntax, rules are logically unordered. Furthermore, in order to achieve word level disambiguation, one typically uses rules which describe the occurrences of boundaries and syntactic tags in grammatically correct structures rather than indicating how the incorrect interpretations can be identified.</Paragraph>
      <Paragraph position="3"> Thus, the three effects are achieved, even if individual finite-state rules cannot be classified into three corresponding groups.</Paragraph>
    </Section>
    <Section position="2" start_page="88" end_page="88" type="sub_section">
      <SectionTitle>
1.3 Rule automata
</SectionTitle>
      <Paragraph position="0"> Finite-state rules are represented using regular expressions and they are transformed into finite-state automata by a rule compiler.</Paragraph>
      <Paragraph position="1"> The whole finite-state grammar consists of a set of rules which constrain the possible choices of word interpretations, tags and boundaries to</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="88" end_page="88" type="metho">
    <SectionTitle>
ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992
</SectionTitle>
    <Paragraph position="0"> only those which are considered grammatical.</Paragraph>
    <Paragraph position="1"> The entire grammar is effectively equivalent to the (theoretical) intersection of all individual rule automata. However, such an intersection would be impractical to compute due to its huge size.</Paragraph>
    <Paragraph position="2"> The logical task for any finite-state parser in the current approach is to compute the intersection of the unanalyzed sentence automaton and each rule automaton. Actual parsing can be done in several alternative ways which are guaranteed to yield the same result, but which vary in terms of efficiency.</Paragraph>
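    <Paragraph> The equivalence between intersecting the rule automata and applying the rules one at a time can be sketched in Python as follows. This is not the Common Lisp implementation described in this paper; it is a toy under stated assumptions: the alphabet is reduced to three hypothetical part-of-speech symbols, the two rule automata are invented for illustration, and the sentence is represented as an explicit list of readings rather than as an automaton.
<![CDATA[
```python
from itertools import product

ALPHABET = ("DET", "N", "V")  # hypothetical toy alphabet

class DFA:
    """A complete deterministic finite-state automaton."""
    def __init__(self, states, delta, start, finals):
        self.states, self.delta = states, delta
        self.start, self.finals = start, finals

    def accepts(self, tokens):
        state = self.start
        for tok in tokens:
            state = self.delta[(state, tok)]
        return state in self.finals

def table(rows):
    """rows maps a state to its successors on DET, N and V, in that order."""
    return {(state, sym): nxt
            for state, nexts in rows.items()
            for sym, nxt in zip(ALPHABET, nexts)}

def intersect(a, b):
    """Product construction: accepts exactly what both a and b accept."""
    states = set(product(a.states, b.states))
    delta = {((p, q), sym): (a.delta[(p, sym)], b.delta[(q, sym)])
             for (p, q) in states for sym in ALPHABET}
    finals = {(p, q) for (p, q) in states
              if p in a.finals and q in b.finals}
    return DFA(states, delta, (a.start, b.start), finals)

# Invented rule 1: at most one V. State 2 is a rejecting sink.
one_verb = DFA({0, 1, 2},
               table({0: (0, 0, 1), 1: (1, 1, 2), 2: (2, 2, 2)}), 0, {0, 1})
# Invented rule 2: no V immediately after DET. State 1 means "just saw DET".
no_det_verb = DFA({0, 1, 2},
                  table({0: (1, 0, 0), 1: (1, 0, 2), 2: (2, 2, 2)}), 0, {0, 1})

readings = [("DET", "N", "V"), ("DET", "V", "N"),
            ("DET", "N", "N"), ("DET", "V", "V")]

grammar = intersect(one_verb, no_det_verb)
by_intersection = [r for r in readings if grammar.accepts(r)]
one_by_one = [r for r in readings
              if one_verb.accepts(r) and no_det_verb.accepts(r)]
```
]]>
    Both strategies keep the same readings. The product automaton already has nine states for two three-state rules, which hints at why intersecting a whole grammar in advance quickly becomes impractical.</Paragraph>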
    <Paragraph position="3"> 2. The finite-state rule formalism Tapanainen (1991) has implemented a compiler and a parser for finite-state grammars. The compilation and the parsing are based on a Common Lisp finite-state program package written by him. In his Master's thesis (1991), Tapanainen also reports new methods for optimizing the result of the compilation and improving the speed of parsing.</Paragraph>
    <Paragraph position="4"> The current rule compiler has only a few built-in rules or definitions. Instead, it has a formalism for defining relevant expressions and new rule types. There are two types of definitions for this purpose. The first one defines a constant regular expression which can later on be referred to by its name:</Paragraph>
    <Paragraph position="6"> Some basic notations are defined in this way such as the dot which stands for a sequence of tokens within a single word:</Paragraph>
    <Paragraph position="8"> The backslash '\' denotes any sequence of tokens not containing occurrences of its argument (which here lists all types of word and clause boundaries). A variation of the dot is the dot-dot '..', which represents a sequence of tokens within the same clause:</Paragraph>
    <Paragraph position="10"> The second type of definition has parameters, and it can be used for expressions which vary according to their values: name(param_1, .., param_n) = expression; The expression is a regular expression formulated using constant terms and the parameter symbols param_i. An example of this type of definition is the following, which requires every clause to be of a given form X:</Paragraph>
    <Paragraph position="12"> The formula forbids subsequences which are clauses but not of form X (the middle term is easier to understand as [ -X &amp; .. ]).</Paragraph>
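    <Paragraph> The dot and dot-dot notations described above can be approximated in Python as predicates over token lists. The sketch below is an illustration only, not the compiler's regular-expression definitions (which are not reproduced in this text). The boundary symbols are taken from the examples in this paper; treating exactly these four symbols as clause-level boundaries is an assumption, and the helper name without is hypothetical.
<![CDATA[
```python
WORD_BOUNDARY = {"@"}
# Assumption: sentence boundary plus the three clause boundary types.
CLAUSE_BOUNDARIES = {"@@", "@/", "@<", "@>"}

def without(forbidden):
    """Analogue of the backslash operator: accept any token sequence
    that contains no token from `forbidden`."""
    def matches(tokens):
        return all(tok not in forbidden for tok in tokens)
    return matches

# The dot: a token sequence that stays inside a single word.
dot = without(WORD_BOUNDARY | CLAUSE_BOUNDARIES)
# The dot-dot: a token sequence that stays inside a single clause.
dot_dot = without(CLAUSE_BOUNDARIES)

one_word = "program N NOM SG @SUBJ".split()
two_words = "the DEF ART @ program N NOM SG @SUBJ".split()
two_clauses = "cat N NOM SG @/ which".split()
```
]]>
    Here dot accepts one_word only, while dot_dot also accepts two_words because a plain word boundary does not end the clause.</Paragraph>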
    <Paragraph position="13"> Experience with writing actual large scale grammars within the finite-state framework has indicated that we need more flexibility in defining rules than was first expected.</Paragraph>
    <Paragraph position="14"> This flexibility is achieved by having one very general rule format: expression; The expression simply defines a constraint for all sentences, i.e. it is already as such equivalent to a rule automaton. Unwanted combinations or sequences, such as two finite verbs within the same clause, can be excluded e.g. by a rule:</Paragraph>
  </Section>
  <Section position="7" start_page="88" end_page="88" type="metho">
    <SectionTitle>
UNIQUE (FINV)
</SectionTitle>
    <Paragraph position="0"> Here, UNIQUE is a definition which has been made using the formalisms above, and is available for the grammar writer. Using the UNIQUE definition, one can express general principles, such as that there is at most one main verb, at most one subject etc. in each clause.</Paragraph>
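    <Paragraph> The effect of a uniqueness constraint can be sketched procedurally in Python. This is not how UNIQUE is defined in the grammar, where it is a regular-expression definition compiled into an automaton; the function names clauses and unique are hypothetical, and the handling of embedded clauses with a stack is an assumption based on the clause boundary examples in this paper.
<![CDATA[
```python
def clauses(tokens):
    """Group the tokens of a reading by clause.

    Assumption: "@@" and "@/" close the current clause, "@<" opens an
    embedded clause and "@>" returns to the surrounding clause.
    """
    stack, done = [[]], []
    for tok in tokens:
        if tok == "@<":
            stack.append([])
        elif tok == "@>":
            done.append(stack.pop())
        elif tok in ("@@", "@/"):
            done.append(stack.pop())
            stack.append([])
        else:
            stack[-1].append(tok)
    done.extend(stack)
    return [clause for clause in done if clause]

def unique(tag, tokens):
    """True if `tag` occurs at most once in every clause."""
    return all(clause.count(tag) <= 1 for clause in clauses(tokens))

ok = "@@ the DEF ART @ program N NOM SG @SUBJ @ run V PRES SG3 @FINV @MAINV @@".split()
bad = "@@ program V PRES NON-SG3 @FINV @MAINV @ run V PRES SG3 @FINV @MAINV @@".split()
split = "@@ program V PRES NON-SG3 @FINV @MAINV @/ run V PRES SG3 @FINV @MAINV @@".split()
embedded = "@@ the man @< who came V @FINV first @> got V @FINV the job @@".split()
```
]]>
    With these inputs, unique("@FINV", bad) is false because both finite verbs fall in one clause, whereas the readings with a clause boundary or an embedded clause between the two verbs pass.</Paragraph>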
    <Paragraph position="1"> Most of the actual rules still use the right arrow format: expression -&gt; left-context _ right-context; All three parts of the rules are regular expressions. The rule requires that any occurrence of expression must be surrounded by the given context.</Paragraph>
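    <Paragraph> The meaning of the right arrow format can be approximated with Python's re module. This is a sketch under assumptions, not the rule compiler: readings are flattened to strings, the three rule parts are written as Python regular expressions rather than in the rule formalism, only non-overlapping occurrences of the expression are checked, and the function name satisfies as well as the simplified example readings are hypothetical.
<![CDATA[
```python
import re

def satisfies(text, expression, left, right):
    """True if every occurrence of `expression` in `text` is directly
    preceded by `left` and directly followed by `right`."""
    left_re = re.compile(r"(?:%s)\Z" % left)   # must match a suffix
    right_re = re.compile(right)
    for m in re.finditer(expression, text):
        if not left_re.search(text[:m.start()]):
            return False
        if not right_re.match(text, m.end()):
            return False
    return True

# Toy rule: an infinitive main verb of a non-finite construction must be
# preceded by an infinitive marker followed by a word boundary.
EXPRESSION = r"INF @-FMAINV/-F"
LEFT = r"@INFMARK> @ \S+ "
RIGHT = r""                                    # any right context

good = "@@ he @ wants @ to @INFMARK> @ go INF @-FMAINV/-F @@"
bad = "@@ he @ go INF @-FMAINV/-F @@"
```
]]>
    The reading good satisfies the toy rule, while bad is rejected because the infinitive is not preceded by the required left context. A reading that contains no occurrence of the expression satisfies the rule trivially.</Paragraph>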
    <Paragraph position="2"> 3. English finite-state grammar The English finite-state grammar discussed here was written by Voutilainen. The grammar itself is much more comprehensive than what can be described in this paper. Although the grammar already covers most of the areas of English grammar that it is intended to cover, it is still far from complete in details. The grammar, when complete, will be part of Voutilainen's PhD dissertation (forthcoming). This section presents some general principles from that grammar, and a few examples from more complex phenomena.</Paragraph>
    <Section position="1" start_page="88" end_page="88" type="sub_section">
      <SectionTitle>
3.1 Goals of the grammar
</SectionTitle>
      <Paragraph position="0"> The present grammar has many goals and characteristics similar to those of the SIMPR Constraint Grammar: * the ability to parse unrestricted running texts with a large dictionary, * concrete, surface-oriented description in terms of dependency syntax.</Paragraph>
      <Paragraph position="1"> The current finite-state syntax uses, indeed, the same ENGTWOL lexicon as the SIMPR CG syntax (Karlsson et al. 1991). The set of syntactic features is adopted from the CG description almost as such, with a few additions. In the present finite-state approach, however, we aim at: * more general and linguistically motivated rules (fewer, more powerful and general rules in the grammar), * more accurate treatment of intrasentential structure (three types of clause boundaries instead of one), and * a satisfactory description of certain complex constructions and sentence structures.</Paragraph>
      <Paragraph position="2"> The present formalism can achieve somewhat more general and powerful rules than the current CG formalism through the use of full regular expression notation.</Paragraph>
    </Section>
    <Section position="2" start_page="88" end_page="88" type="sub_section">
      <SectionTitle>
3.2 Clause boundaries
</SectionTitle>
      <Paragraph position="0"> Some power and accuracy are gained through a commitment to use a notation for clause boundaries which is exact in defining when words belong to the same or a different clause. The two formalisms are equivalent in many cases: @@ The dog chased a cat @/ which ate the mouse @@ The more elaborate clause boundary marking makes a difference in the case of center-embedding: @@ The man @&lt; who came first @&gt; got the job @@ This convention indicates that there are two clauses: The man .. got the job .. who came first ..</Paragraph>
    </Section>
    <Section position="3" start_page="88" end_page="88" type="sub_section">
      <SectionTitle>
3.3 Constituent structure
</SectionTitle>
      <Paragraph position="0"> Head-modifier relations are expressed (here and in the CG) with tags, e.g.:  The head of an NP is tagged as a major constituent, here as a subject. If the constituent is a coordinated one, each of the coordinated heads gets the same tag: John's N GEN @GN&gt; brother N NOM SG @SUBJ and COORD @CC aunts N NOM PL @SUBJ The genitival attribute @GN&gt; modifies at least the next noun (brother) but possibly also some further ones at the same level of coordination (aunts).</Paragraph>
    </Section>
    <Section position="4" start_page="88" end_page="88" type="sub_section">
      <SectionTitle>
3.4 An example
</SectionTitle>
      <Paragraph position="0"> Let us consider the following (classical) sentence: Time flies like an arrow.</Paragraph>
      <Paragraph position="1"> The input to the finite-state syntax comes from the ENGTWOL morphological analyzer with some modifications and extensions in the sets of features associated with words:</Paragraph>
      <Paragraph position="3"/>
      <Paragraph position="5"> This small sample sentence representation contains some 21 million readings.</Paragraph>
      <Paragraph position="6"> Each syntactic-function label starts with @. Many of the common labels like @SUBJ have been replaced by the combination of @SUBJ/-F and @SUBJ to reflect the distinction between subjects of non-finite constructions and those of the main verb. A similar distinction is made in the verbal entries.</Paragraph>
      <Paragraph position="7"> The grammar is committed to exclude only those readings which are ungrammatical.</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="88" end_page="88" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> Thus, several readings may pass the rules; in this case, the following six: 1. @@ time N NOM SG 8t~&gt;</Paragraph>
    <Section position="1" start_page="88" end_page="88" type="sub_section">
      <SectionTitle>
3.5 Overview of rules
</SectionTitle>
      <Paragraph position="0"> The finite-state grammar for English consists of some 200 rules dedicated to several areas of the grammar: * Internal structure of nominal and non-finite verbal phrases. The structure is described as head-modifier relations, including determiners, premodifiers and postmodifiers.</Paragraph>
      <Paragraph position="1">  * Coordination at various levels of the grammar. * Surface-syntactic functions of nominal phrases.</Paragraph>
      <Paragraph position="2"> The structure of noun phrases is described using two approaches together. A coarse structure is fixed with the mechanism of definitions. It would not be feasible to use that mechanism alone (because it would lead to a context-free description). The definitions are supplemented with ordinary finite-state rules which enforce further restrictions.</Paragraph>
    </Section>
    <Section position="2" start_page="88" end_page="88" type="sub_section">
      <SectionTitle>
3.6 Non-finite Constructions
</SectionTitle>
      <Paragraph position="0"> Between the level of the nominal phrase and the finite clause, there is an intermediary level, that of non-finite constructions (see Quirk et al. 1985).</Paragraph>
      <Paragraph position="1"> These constructions resemble noun phrases when seen as parts of the surrounding clause because they act e.g. as subjects, objects, preposition complements, postmodifiers, or adverbials, e.g.:  She was fond of [singing in the dark].</Paragraph>
      <Paragraph position="2"> The dog [barking in the corridor] was irritable. [Tired by her journey], she fell asleep.</Paragraph>
      <Paragraph position="3"> Internally, non-finite constructions are like finite clauses because the main verb of a non-finite construction can have subjects, objects, adverbials etc. of its own.</Paragraph>
      <Paragraph position="4"> Both finite and non-finite constructions have a verbal skeleton, which in a finite construction starts with a finite verb and ends with the first main verb. The finite verbal skeletons in the following examples are underlined: She sings.</Paragraph>
      <Paragraph position="5"> Will she ~? She would not have been singing unless ..</Paragraph>
      <Paragraph position="6"> A non-finite verbal skeleton starts with certain kinds of non-finite verb (to+infinitive, present participle, past participle, non-finite auxiliary) and ends with the first main verb to the right: It is easy to do it.</Paragraph>
      <Paragraph position="7"> Tired by her journey, she went into her room. They knew it all, ~ there before.</Paragraph>
      <Paragraph position="8"> Non-finite verb chains do not contain center-embedded verbs, whereas a non-finite construction can be center-embedded within a finite verb chain only if it is (a part of) a nominal phrase: Can [shooting hunters] be dangerous? Can men [shooting hunters] be dangerous? The use of syntactic tags instead of a hierarchical tree-structure forces us to a very flat description of sentences. This might result in problems when describing clauses with non-finite constructions with a small set of tags, e.g.: The boy [kicking @MAINV] the [ball @OBJ] [saw @MAINV] the [cow @OBJ].</Paragraph>
      <Paragraph position="9"> A useful concept in clause-level syntax is the uniqueness principle. We wish to say, for instance, that in a clause, there is at most one (possibly co-ordinated) subject, object, or predicate complement. (Footnote 1: There is another way to interpret this sentence without any non-finite constructions, by including 'to come' in the finite verb chain. We have adopted the current interpretation in order to achieve certain linguistic generalizations.)</Paragraph>
      <Paragraph position="10"> Uniqueness holds for the finite clause and for each non-finite construction separately, and this will be very difficult to formulate if we use the same tags for both domains (as in the above example).</Paragraph>
      <Paragraph position="11"> The syntactic tags as given in the finite-state version of ENGTWOL capitalize heavily on non-finite constructions in order to overcome this problem: The boy [kicking @MAINV/-F] the [ball @OBJ/-F] [saw @MAINV] the [cow @OBJ].</Paragraph>
      <Paragraph position="12"> Here, the object in the non-finite construction is furnished with a label different from the corresponding label used in the finite construction, so there is no risk of confusion between the two levels.</Paragraph>
      <Paragraph position="13"> The duplication of certain labels for certain categories increases the amount of ambiguity, but, on the other hand, the new ambiguity seems to be of a fairly controllable type. The description of non-finite constructions boils down to two subtasks. One is to express constraints on the internal structure of non-finite constructions; the other is to control their distribution.</Paragraph>
      <Paragraph position="14"> In terms of verb chain and constituent structure, non-finite constructions resemble finite constructions. Their main difference is that word order in non-finite constructions is much more rigid.</Paragraph>
      <Paragraph position="15"> We proceed with some examples of rules describing non-finite constructions. An infinitive acting as main verb in a non-finite construction is preceded by to acting as an infinitive marker or by a subject of a non-finite phrase or by a co-ordinated infinitive.</Paragraph>
      <Paragraph position="16"> So we wish, for instance, the following utterances to be accepted: He wants [to @INFMARK&gt;] [go INF @-FMAINV/-F]. She saw [her @SUBJ/-F] [go INF @-FMAINV/-F]. She saw [her @SUBJ/-F] [come INF @-FMAINV/-F] and [go INF @-FMAINV/-F]. The constraint is expressed as a rule:</Paragraph>
      <Paragraph position="18"> Items preceded by an exclamation mark are constant definitions. t /-f signals any constituent that can occur in a postverbal position in a non-finite construction.</Paragraph>
      <Paragraph position="19"> A past participle as a main verb in a non-finite construction must always be preceded by an appropriate kind of auxiliary or clause boundary. For example: [Having @-FAUX/-F] [gone PCP2 @-FMAINV/-F] home, they rested.</Paragraph>
      <Paragraph position="20"> This constraint corresponds to a rule:</Paragraph>
      <Paragraph position="22"> \[lI~r:l.m-aux/-f I lclb\] taffy1* -_ I There are further rules for the distribution of non-finite constructions with present participles, etc. Further rules have been written for the description of the internal structure of non-finite constructions, which, in turn, is fairly straightforward. The overall experience is that a fairly adequate description of these types of phenomena can be achieved by the set of syntactic tags proposed above accompanied by a manageable set of finite-state rules.</Paragraph>
    </Section>
  </Section>
</Paper>