<?xml version="1.0" standalone="yes"?>
<Paper uid="C80-1052">
  <Title>PARSING AGAINST LEXICAL AMBIGUITY</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
STATE OF THE ART
</SectionTitle>
    <Paragraph position="0"> Marcus \[1977\] showed that a wide range of English grammar could be parsed determinsitca!ly, that is without every making a mistake and having to backtrack. But in Marcus' parser, almost every word was defined as only one part of speech. For example in his parser, &amp;quot;block&amp;quot; could only be a noun, making the following sentence unacceptable to the parser.</Paragraph>
    <Paragraph position="1"> \[I\] Block the road.</Paragraph>
    <Paragraph position="2"> With so little ambiguity, it is not surprising that Marcus's parser could work deterministically. For determinstic parsing to be a serious claim, it must be shown that it is possible to parse determinstically sentences which contain part-of-speech ambiguity. Is deterministic parsing still possible when part of speech ambiguity is included? The answer to this question can be thought of as the first major test for determinsitic parsing. If it is able to handle part-of-speech ambiguity easily, this will be a major reinforcement of the deterministic parsing strategy. If it cannnot handle LA, the theory will collapse.</Paragraph>
    <Paragraph position="3"> The first approach to LA for a deterministic parser was \[Milne 78\]. This work dealt solely with noun/verb ambiguity. When a noun/verb word was discovered, a special packet of rules was activated to decide which part-of-speech the word should be. For example, a typical rule stated that &amp;quot;to&amp;quot; followed by a noun/verb word meant that the noun/verb word was being used as a verb, and would disambiguate it as such. The rest of the grammar dealt with the disambiguated word.</Paragraph>
    <Paragraph position="4"> Although this approach was very effective, the rules were very special case, and many rules would be needed to handle all the possibilities.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
THE DEFAULT CASE
</SectionTitle>
    <Paragraph position="0"> I have implemented a deterministic parser in Prolog \[Pereira 78\] similar to Marcus' but extended it to allow words to be defined as multiple parts of speech. The parser has appoximately 80% of Marcus' original grammar, but the grammar has been extended to cover the domain of mechanics problems. (MECHO) \[Bundy 79a,79b\].</Paragraph>
    <Paragraph position="1"> To extend the Prolog parser, each word in the dictionary was syntactically defined as all parts-of-speech it could function as, given the grammar. The only other initial modification necessary was to alter the attach function to disambigute the word to the part-of-speech it is being attached as. For example if &amp;quot;block&amp;quot; is attached as a noun, it will be disambiguated to a noun. Because of the expectations of the parser, represented by the packets, and the constraints of neighboring items, represented by the buffer pattern matching, a large number of cases were handled without further modification. For example in the sentence: \[2\] The block is red.</Paragraph>
    <Paragraph position="2"> The parser will be expecting a noun after 350 the determiner, and hence only the rules for nouns in nounphrases will be active. &amp;quot;Block&amp;quot; will be used as a noun, and the verb usage never considered.</Paragraph>
    <Paragraph position="3"> Similary in the case: \[3\] Block the road.</Paragraph>
    <Paragraph position="4"> The rule for Imperative at the sentence start will match off the verb features of &amp;quot;block&amp;quot;, and the noun usage will not be considered.</Paragraph>
    <Paragraph position="5"> The current parser can handle the following examples with no special rules:  Marcus allowed several &amp;quot;function&amp;quot; words to be more than one part of speech. For example &amp;quot;have&amp;quot; could be an auxverb or a main verb, &amp;quot;that&amp;quot; could be a comp, determiner, or pronoun, and &amp;quot;to&amp;quot; could be a preposition or a auxverb. To handle these ambiguities, Marcus had a &amp;quot;Diagnostic rule&amp;quot; for each word. The diagnostic rules matched when the word it was to &amp;quot;diagnose&amp;quot; arrived in the first buffer, and used the 3 buffer look ahead to resolve the ambiguity.</Paragraph>
    <Paragraph position="6"> Each Diagnostic rule could ask questions concerning the grammatical features of the contents of the 3 buffers, as well as the partial item being built. As a result these rules were very complex and cumberson compared with the rest of the rules. But these rules seemed necessary to preserve the generality of the other rules.</Paragraph>
    <Paragraph position="7"> For example, the &amp;quot;HAVE-DIAG&amp;quot; decided if the sentence was a Yes-No-Question(YNQ) or an Imperative, and hence &amp;quot;have&amp;quot; a main verb or auxverb. The rule was as follows: \[have\]\[np\]\[verb\] -&gt; If 2nd is noun singular,n3p or 3rd is not +en then run Imperative.</Paragraph>
    <Paragraph position="8"> else run Yes-No-Question.</Paragraph>
    <Paragraph position="9"> and decided between: \[auxverb\]\[np\] -&gt; Yes-No-Question \[tnsless verb\] -&gt; Imperative at the start of the sentence.</Paragraph>
    <Paragraph position="10"> To alter the YNQ rule for the special case of &amp;quot;have&amp;quot;, would ruin the simple generality of the rule, and lose the linguistic generalization it captures.</Paragraph>
    <Paragraph position="11"> But the Marcus Parser assumed it would only be given grammatical sentences. If the Marcus parser was given an ungrammatical sentence, it might pass it as legal. For example the parser would pass as legal: \[11\] *Is the boys running \[12\] *Is the boy run? Notice they both match the YNQ pattern.</Paragraph>
    <Paragraph position="12"> Clearly for the rule YNQ to run, the auxverb must agree in number with the subject, and in affix with the verb. If we modify the YNQ rule to enforce this agreement, then only \[13\] will match the YNQ rule: \[13\] Have the boys taken the exam? \[14\] Have the boy taken the exam.</Paragraph>
    <Paragraph position="13"> \[15\] Have the boys take the exam.</Paragraph>
    <Paragraph position="14"> \[16\] ?Have the boy takenthe exam.</Paragraph>
    <Paragraph position="15"> In fact, if we enforce agreement on the YNQ rule, it will perform exactly the same as the old HAVE-DIAGNOSTIC, and the diagnostic is made redundant.</Paragraph>
    <Paragraph position="16"> Closer inspection of the diagnostics and the grammar rules they decide between, reveals that the grammar rules will in general pass ungrammatical sentences as legal. If these rules are then corrected, using agreement and grammaticallity, then all the diagnostics are made redundant and no longer needed.</Paragraph>
    <Paragraph position="17"> In order to handle part-of-speech ambiguity in a determinsitic way, the parser does not need special &amp;quot;Diagnostice rules&amp;quot;. If the grammar enforces agreement, and rejects ungrammatical strings then ambiguity handling happens automatically.</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
THE THAT-DIAGNOSTIC
</SectionTitle>
    <Paragraph position="0"> The most complicated of all the diagnostics, was the THAT-DIAGNOSTIC. This rule decided if &amp;quot;that&amp;quot; was a determiner, pronoun, or a comp. In Marcus' parser, 3 rules were needed for this decision. Also, if Marcus' diagnostic decided that &amp;quot;that&amp;quot; was to be a determiner, then it would be attached after the nounphrase it would be a determiner for, was built! In Church \[1980\], the THAT-DIAGNOSTIC is only one rule, but extremely complicated. His deterministic parser can handle the widest range of &amp;quot;that&amp;quot; examples, but the diagnostic is seemingly the most complicated in the grammar.</Paragraph>
    <Paragraph position="1"> Following the above methodology though, the diagnostic can be made redundant. &amp;quot;that&amp;quot; can only be a determiner if the word following it  --351-will take a determiner. In Marcus' original parser, the rule DETERMINER made no check for grammaticallity, and would attempt to parse the following fragements:  If the rule DETERMINER is fixed to reject these examples, then the determiner usages will all work properly. Similary, the rule PRONOUN would pass ungrammatical strings, so this was altered. Finally, only the comp use of &amp;quot;that&amp;quot; are left, and the parser's normal rules can handle this case. By simply altering the above rules to reject ungrammatical strings, the following sentences can be parsed with no special diagnostic additions to the parser.:  that.</Paragraph>
    <Paragraph position="2"> that boy.</Paragraph>
    <Paragraph position="3"> that boy hit mary.</Paragraph>
    <Paragraph position="4"> that was nice.</Paragraph>
    <Paragraph position="5"> that that was nice.</Paragraph>
    <Paragraph position="6"> that he hit mary.</Paragraph>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
GARDEN PATHS
</SectionTitle>
    <Paragraph position="0"> After altering the grammar, so there were no special rules for ambiguity, the following sentences were still a problem: \[27\] What little fish eat is worms.</Paragraph>
    <Paragraph position="1"> \[28\] That deer ate everything in my garden surprised me.</Paragraph>
    <Paragraph position="2"> \[29\] The horse raced past the barn fell. &amp;quot; \[30\] The building blocks the sun faded were red.</Paragraph>
    <Paragraph position="3"> But for each of these, there is a partner sentences, showing these ae potential garden paths \[r~ilne 1980b\].</Paragraph>
    <Paragraph position="4"> \[31\] What little fish eat worms.</Paragraph>
    <Paragraph position="5"> \[32\] That deer ate everything in my garden.</Paragraph>
    <Paragraph position="6"> \[33\] The horse raced past the barn. \[34\] The buildin~ocks the sun.</Paragraph>
    <Paragraph position="7"> As Marcus stated in his thesis, a deterministic parser cannot handle correctly a garden path sentence. But people also fail on garden path sentences. Since deterministic parsing should model human performance, and not exceed it, it is acceptable for the parser to fail. Instead these potential garden path situations are resolved using semantic information \[Milne 1980b\].</Paragraph>
    <Paragraph position="8"> Enforcing number agreement fails when a word is morphologically ambiguous. This problem has not been examined yet.</Paragraph>
  </Section>
  <Section position="8" start_page="0" end_page="0" type="metho">
    <SectionTitle>
FREE TEXT
</SectionTitle>
    <Paragraph position="0"> A simulation of these rules was conducted by hand on an article in TIME \[1978\] and the front page of the NEW YORK TIMES \[1978\]. The parser's rules disambigution was correct for 99% of the occurances that the grammar could cover.</Paragraph>
    <Paragraph position="1"> (some ambiguities are not yet handled).</Paragraph>
  </Section>
  <Section position="9" start_page="0" end_page="0" type="metho">
    <SectionTitle>
A POSSIBLE EXPLANATION
</SectionTitle>
    <Paragraph position="0"> At first glance, English looks extremely ambigous and the ambiguity very difficult to handle. But given the constraints of grammaticallity, most of the ambiguity disappears. For only one of the possible multiple choices will generally be grammatical. People do not seem aware of all the ambiguity in the sentences they process (excluding global ambiguity examples). This and the paper suggests that handling ambiguity causes no additional load on a parser, a very desirable and intuitively acceptable result. In other words, grammaticallity and LA handling are directly related.</Paragraph>
  </Section>
class="xml-element"></Paper>