<?xml version="1.0" standalone="yes"?> <Paper uid="P95-1006"> <Title>Robust Parsing Based on Discourse Information: Completing partial parses of ill-formed sentences on the basis of discourse information</Title> <Section position="4" start_page="40" end_page="44" type="intro"> <SectionTitle> 3 Implementation </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="40" end_page="44" type="sub_section"> <SectionTitle> 3.1 Algorithm </SectionTitle> <Paragraph position="0"> As we showed in the previous section, when a parser cannot construct a parse tree for an ill-formed sentence by using its grammar rules, complete parses of other sentences in the same discourse provide information that is very useful for obtaining a correct parse. In this section, we describe an algorithm that uses this information to complete incomplete parses.</Paragraph> <Paragraph position="1"> The first step of the procedure is to extract from the input text the discourse information that the system refers to in the next step, when it completes incomplete parses. Discourse information is extracted as follows: 1. Each sentence in the whole text given as a discourse is processed by a syntactic parser. Then the results of every parse are stored as discourse information, except for sentences with incomplete parses or multiple parses. To be precise, the position and the part of speech of each instance of every lemma are stored, along with the lemma's modifiee-modifier relationships with other content words, extracted from the parse data.</Paragraph> <Paragraph position="2"> Table 3 shows an example of such information. In this table, each CFRAME entry indicates an instance of &quot;cursor&quot; in the discourse; information on the position and on the whole sentence can be extracted from each occurrence of CFRAME.
In accumulating discourse information, a score of 1.0 is awarded for each definite modifiee-modifier relationship, and a lower score of 0.1 for each ambiguous one, since ambiguous relationships are less reliable.</Paragraph> <Paragraph position="3"> 2. When all the sentences have been parsed, the discourse information is used to select the most preferable candidate for each sentence with multiple possible parses, and the data of the selected parse are added to the discourse information.</Paragraph> <Paragraph position="4"> Once every sentence except the ill-formed ones that caused incomplete parses has contributed data to the discourse information, the parse completion procedure begins.</Paragraph> <Paragraph position="5"> The input to the completion procedure is the set of partial parses that a bottom-up parser generates as an incomplete parse tree. For example, the PEG parser generated three partial parses for sentence (2.1), consisting of &quot;As you can see,&quot; &quot;you can choose from many topics,&quot; and &quot;to find out what information is available about the AS/400 system,&quot; as shown in Figure 2. Since partial parses are already built by means of the parser's grammar rules, we decided to restructure each partial parse and unify the partial parses according to the discourse information, rather than construct the whole parse tree from the discourse information.</Paragraph> <Paragraph position="6"> The completion procedure consists of two steps. Step 1: Inspecting each partial parse and restructuring it on the basis of the discourse information. For each word in a partial parse, the part of speech and the modifiee-modifier relationships with other words are inspected. If they differ from those recorded in the discourse information, the partial parse is restructured accordingly.
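The extraction procedure and the Step 1 inspection above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the implementation used in Shalt2: the names DiscourseInfo, add_parse, parse_score, restructure_pos and best_modifiee are invented here, links are keyed only by lemma pairs (the paper also records parts of speech and the relationship), and both the use of summed link scores as the preference measure among multiple parses and the rule for overriding a part of speech are assumptions. Only the scores 1.0 and 0.1 come from the text.

```python
from collections import Counter, defaultdict

# Scores taken from the text: definite modifiee-modifier links get 1.0,
# ambiguous ones only 0.1 because they are less reliable.
DEFINITE, AMBIGUOUS = 1.0, 0.1


class DiscourseInfo:
    """Hypothetical store for the discourse information of one text."""

    def __init__(self):
        self.pos = defaultdict(Counter)   # lemma -> Counter of POS tags seen
        self.where = defaultdict(list)    # lemma -> [(sentence no., position)]
        self.links = defaultdict(float)   # (modifier, modifiee) -> accumulated score

    def add_parse(self, sent_no, words, deps):
        """Record one complete (or selected) parse.

        words: list of (lemma, pos) pairs in sentence order.
        deps:  list of (modifier index, modifiee index, is_ambiguous).
        """
        for position, (lemma, pos) in enumerate(words):
            self.pos[lemma][pos] += 1
            self.where[lemma].append((sent_no, position))
        for mod_i, head_i, is_ambiguous in deps:
            key = (words[mod_i][0], words[head_i][0])
            self.links[key] += AMBIGUOUS if is_ambiguous else DEFINITE

    def parse_score(self, words, deps):
        """Sum of stored link scores for a candidate parse (assumed measure
        of preference among multiple parses)."""
        return sum(self.links.get((words[m][0], words[h][0]), 0.0)
                   for m, h, _ in deps)


def restructure_pos(partial, info):
    """Step 1 sketch: replace a word's POS when the discourse records other
    tags for that lemma but never the one the parser assigned."""
    fixed = []
    for lemma, pos in partial:
        seen = info.pos.get(lemma)           # None if the lemma never occurred
        if seen and seen[pos] == 0:
            pos = seen.most_common(1)[0][0]  # most frequent tag in the discourse
        fixed.append((lemma, pos))
    return fixed


def best_modifiee(modifier, candidates, info):
    """Return the candidate modifiee most strongly attested as linked to
    `modifier` in the discourse, or None if no candidate is attested."""
    best, best_score = None, 0.0
    for cand in candidates:
        score = info.links.get((modifier, cand), 0.0)
        if score > best_score:
            best, best_score = cand, score
    return best
```

For a sentence with multiple parses, one could take max(candidates, key=lambda c: info.parse_score(*c)) and pass the winning parse to add_parse, mirroring step 2 of the extraction procedure. The override rule in restructure_pos is deliberately conservative: a tag is changed only when the discourse has never assigned it to that lemma.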
</Paragraph> <Paragraph position="8"> [Figure: discourse information for the partial parses of sentence (2.1), showing the part of speech of each word and the phrases repeated within the discourse, together with the sentences in which they appear.]</Paragraph> <Paragraph position="9"/> <Paragraph position="11"> Restructuring is performed by combining dependencies existing within phrases that occur in other sentences of the same chapter, as recorded in the discourse information.</Paragraph> <Paragraph position="12"> For example, Figure 5 shows an incomplete parse of the following sentence, which is the 43rd sentence of a technical text consisting of 175 sentences (see Footnote 3). (3.1) Fig. 3 is an isometric view of the magazine taken from the operator's side with one cartridge shown in an unprocessed position and two cartridges shown in a processed position.</Paragraph> <Paragraph position="13"> In the second partial parse, the word &quot;side&quot; is analyzed as a verb. However, the same word appears fifteen times in the discourse information extracted from well-formed sentences, and it is analyzed as a noun every time it appears in a complete parse. Furthermore, there are no data on the noun &quot;operator&quot; modifying the verb &quot;take&quot; through the preposition &quot;from,&quot; whereas there is information on the noun &quot;operator's&quot; modifying the noun &quot;side,&quot; as in sentence (3.2), and on the noun &quot;side&quot; modifying the verb &quot;take,&quot; as in sentence (3.3).</Paragraph> <Paragraph position="14"> (3.2) In the operation of the invention, an operator loads cartridges into the magazine from the operator's side as seen in Figs. 3 and 12. (151st sentence)</Paragraph> <Paragraph position="15"> Footnote 3: This structure resulting from an incomplete parse does not indicate that the grammar of the parser lacks a rule for handling the possessive case marked by an apostrophe and an s. When the parser fails to generate a unified parse, it outputs partial parses in such a way that as few partial parses as possible cover all the words in the input sentence.</Paragraph> <Paragraph position="16"> (3.3) Fig. 4 is an isometric view of the magazine taken from the machine side with one cartridge shown in the unprocessed position and two cartridges shown in the processed position. (44th sentence) Therefore, these two partial parses are restructured by changing the part of speech of the word &quot;side&quot; to noun, changing the modifiee of the noun &quot;operator&quot; to the noun &quot;side,&quot; and changing the modifiee of the noun &quot;side&quot; to the verb &quot;take.&quot; As a result, a unified parse is obtained. Step 2: Joining partial parses on the basis of the discourse information. If the partial parses have not been unified into a single structure in Step 1, they are joined together on the basis of the discourse information until a unified parse is obtained.</Paragraph> <Paragraph position="17"> Partial parses are joined as follows. First, the possibility of joining the first two partial parses is examined. Then either the unified structure of the first two parses (if they were joined) or the second parse (if they were not) is examined to determine whether it can be joined to the third parse; the examination then moves on to the next parse, and so on.</Paragraph> <Paragraph position="18"> Two partial parses are joined if the root (head node) of either parse tree can modify a node in the other parse without crossing the modifications of other nodes.</Paragraph> <Paragraph position="19"> To examine the possibility of modification, discourse information is applied at three different levels. First, for a candidate modifier and modifiee, the discourse information is searched for an identical pattern, that is, one containing the same modifier word and modifiee word, with the same parts of speech and in the same relationship.
Next, if there is no identical pattern, the discourse information is searched for a modification pattern containing a synonym (Collins, 1984) of the word on one side (modifier or modifiee). Then, if this also fails, it is searched for a modification pattern containing a word with the same part of speech as the word on one side.</Paragraph> <Paragraph position="20"> Since the discourse information consists of modification patterns extracted from complete parses, it reflects the grammar rules of the parser. A matching pattern that has a part of speech rather than an actual word on one side can therefore be regarded as a relaxation rule, in the sense that its syntactic and semantic constraints are less restrictive than those of the corresponding grammar rule in the parser.</Paragraph> <Paragraph position="21"> These matching conditions at different levels are applied in such a way that partial parses are joined through the most preferable nodes.</Paragraph> </Section> <Section position="2" start_page="44" end_page="44" type="sub_section"> <SectionTitle> 3.2 Results </SectionTitle> <Paragraph position="0"> We have implemented this method in an English-to-Japanese machine translation system called Shalt2 (Takeda et al., 1992), and conducted experiments to evaluate its effectiveness. Table 4 gives the results of our experiments on two technical documents of different kinds: a patent document (text 1) and a computer manual (text 2). Since text 1 contained longer and more complex sentences than text 2, our ESG parser failed to generate unified parses more often in text 1; on the other hand, morphologically identical words and collocation patterns occurred more frequently in text 1, and our method was more effective there. In both texts, the discourse information provided enough information to unify the partial parses of an incomplete parse in more than half of the cases. However, the resulting unified parses were not always correct.
Since sentences with incomplete parses are usually quite long and contain complicated structures, it is hard to obtain a perfect analysis of them. Thus, in order to evaluate the improvement in the output translations rather than the improvement in the success rate of syntactic analysis, in which only perfect analyses are counted, we compared output translations generated with and without the application of our method. When our method was not applied, the partial parses of an incomplete parse were joined by means of heuristic rules, such as one that joins a partial parse with &quot;NP&quot; in its root node to a partial parse with &quot;VP&quot; in its root node; by default, the root node of the second partial parse was joined to the last node of the first partial parse. When the discourse information did not provide enough information to unify the partial parses with our method, the same heuristic rules were applied. In such cases, the default rule of joining the root node of the second partial parse to the last node of the first partial parse was mostly applied, since the least restrictive matching patterns in our method were similar to the heuristic rules. The system thus generated a unified parse for every sentence regardless of the discourse information, which made the two sets of output translations directly comparable. The results are shown in …, where each translation was evaluated according to how well the output Japanese sentence conveyed the meaning of the input English sentence. Since most unified parses contained various errors, such as incorrect modification patterns and incorrect parts of speech assigned to some words, fewer errors generally resulted in better translations, but incorrect parts of speech in particular resulted in worse translations.</Paragraph> </Section> </Section> </Paper>