<?xml version="1.0" standalone="yes"?> <Paper uid="A94-1007"> <Title>Symmetric Pattern Matching Analysis for English Coordinate Structures</Title> <Section position="3" start_page="41" end_page="42" type="metho"> <SectionTitle> 3 Parallelism of the conjunctions </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="41" end_page="42" type="sub_section"> <SectionTitle> 3.1 Symmetric patterns of parallelism </SectionTitle> <Paragraph position="0"> English coordinate conjunctions tend to conjoin the same kinds of syntactic patterns. We identify three levels of symmetric pattern. * Phrase (clause) symmetric patterns: Phrase- or clause-level symmetric patterns, such as [Verb-Phrase AND Verb-Phrase] in the conjunct scope of &quot;as well as&quot; in (2) and in the comma-separated series in (3).</Paragraph> <Paragraph position="1"> (2) Such coupling is desirable because it enables a development engineer to move easily within this hierarchy as well as to exploit the distinctive features of each system.</Paragraph> <Paragraph position="2"> (3) The add operators cause POSTSCRIPT to pick up the top two numbers from the stack, remove them, add them, and leave the sum on the stack.</Paragraph> <Paragraph position="3"> * Word symmetric patterns: Word-level symmetric patterns, such as [Quantifier Preposition Abstract-Noun AND Quantifier Preposition Abstract-Noun] in the conjunct scope of the &quot;and&quot; in (4). Some patterns are represented by semantic features, such as [Instrument AND Instrument] for &quot;and1&quot; in (5). 
(4) The container need not be large; if it is 10cm in diameter and 12cm in depth, that is enough.</Paragraph> <Paragraph position="4"> (5) Inspect the cockpit indicators and1 levers for cracked glass and missing control knobs.</Paragraph> <Paragraph position="5"> * Morphological symmetric patterns: Morphological symmetric patterns are recognized by character type (for example, uppercase or lowercase letters), as in (6) and (7), as well as by exactly identical morphological patterns such as [CIC ... hatches AND CIC ... hatches] in (8).</Paragraph> <Paragraph position="6"> (6) An atomic bomb is a device for producing an explosively rapid neutron chain reaction in uranium-235 or plutonium-239 which is called a fissile material.</Paragraph> <Paragraph position="7"> (7) Technical orders described in AFR 8-2 and PFR 7-2 are registered in the on-line file in the form of inspection workcards.</Paragraph> <Paragraph position="8"> (8) There are CIC1 ditching2 hatches3 and CIC4 escape5 hatches6 in the compartment.</Paragraph> <Paragraph position="9"> Some symmetric patterns may appear in combined form, such as [Preposition Gerund Nominal-Phrase AND Preposition Gerund Nominal-Phrase] in (9). (9) Radioisotopes have played an important part in1 developing effective insecticides in2 the country and in3 finding the best ways of applying them.</Paragraph> </Section> <Section position="2" start_page="42" end_page="42" type="sub_section"> <SectionTitle> 3.2 Analysis by the symmetric patterns </SectionTitle> <Paragraph position="0"> The symmetric patterns provide effective information for top-down analysis of the conjunct scope. For example, although (9) allows another counterpart (&quot;in2 the country&quot;) for the conjoined phrase (&quot;in3 finding ...&quot;), the symmetric pattern information makes it easy to select the correct counterpart (&quot;in1 developing ...&quot;). 
Likewise, in the other examples, where the syntax allows several counterparts for the conjoined noun phrases, the symmetric pattern information makes the selection easy. Often, the scope of each conjunct is explicitly demarcated with commas and morphological patterns, as in (3) and (8). The symmetric patterns can also be effective for word disambiguation. For example, (8) contains verb/noun ambiguities for escape5 and hatches6. The symmetric patterning of ditching2 hatches3 facilitates their disambiguation.</Paragraph> <Paragraph position="1"> In the above example sentences, the words immediately following the conjunction play important roles in detecting the structures, because there is usually strong similarity between the starting words of each conjunct scope and the words following the conjunctions. However, the following examples also contain kinds of symmetric patterns, even though the words following the conjunctions are not similar to the starting words of the conjunct scopes. (10) In 1985 the government offered offshore registration to the companies, and, in consequence, in 1985 incorporation fees generated about two million dollars.</Paragraph> <Paragraph position="2"> (11) The damage of the landing gear selector valve caused the leakage of the hydraulic fluid, and completely blockaded the return path.</Paragraph> <Paragraph position="3"> (12) Close the cockpit ditching hatches, and the cabin pressure will be dumped to relieve the air loads on the hatches.</Paragraph> <Paragraph position="4"> In both (10) and (11), an adverbial modifier is inserted at the start of the second conjunct; this is a common pattern extension. 
In (12), there is no real parallelism: the first conjunct clause is an imperative, and the second is its result.</Paragraph> </Section> </Section> <Section position="4" start_page="42" end_page="44" type="metho"> <SectionTitle> 4 Balance matching analysis model </SectionTitle> <Paragraph position="0"> The balance matching analysis model determines the correct structure by taking advantage of the symmetric patterns. In this section, the representation of symmetric patterns is presented first. Then the balance matching operation is presented. Finally, analysis by balance matching is described.</Paragraph> <Section position="1" start_page="42" end_page="42" type="sub_section"> <SectionTitle> 4.1 The pattern representation </SectionTitle> <Paragraph position="0"> The symmetric patterns are represented by a list of three feature sets: Phrase features, Word features, and Morphological features, corresponding to the symmetric pattern levels.</Paragraph> <Paragraph position="1"> * Phrase feature [$\phi$]: Values are Predicative, Nominal, Nominal-Premodifier, Nominal-Postmodifier, Predicate-modifier. These values are assigned to all the constituents in the phrase. For example, all the words in &quot;the effective insecticides&quot; have $\phi$(Nominal). * Word feature [$\gamma$]: This feature includes 120 values, which are subclasses of the general parts of speech, divided according to their grammatical and semantic function. For example, some values are {NounInstrument}, {NounHuman}, {NounAction}, {PredicateStatic}, etc.</Paragraph> <Paragraph position="2"> * Morphological feature [$\delta$]: The values show the morphological attributes of the words, each given as a pair of the word and its morphological type. For example, &quot;uranium-235&quot; is represented by $\delta$(uranium-235, alphabet_hyphen_arabic-numbers).</Paragraph> <Paragraph position="3"> Each word in the sentence is represented by the set of the three features. $\phi$ and $\gamma$ can include ambiguous values. 
The $i$-th word-feature set $\mathcal{W}_i$ and the $n$-word sentence $S_1^n$ are respectively represented as follows. $\Phi_i = \{\phi_{i1}, \ldots, \phi_{im}\}$</Paragraph> <Paragraph position="5"> When the conjunction is the $m$-th word of the $n$-word sentence, the left-side list $S_1^{m-1}$ and the right-side list $S_{m+1}^n$ are respectively represented as follows.</Paragraph> <Paragraph position="7"> The goal of the balance matching is to find the most symmetric pair of $S_x^{m-1}$ ($1 \le x &lt; m$) and $S_{m+1}^y$ ($m &lt; y \le n$), i.e., to find the values of $x$ and $y$.</Paragraph> </Section> <Section position="2" start_page="42" end_page="43" type="sub_section"> <SectionTitle> 4.2 The balance matching operation </SectionTitle> <Paragraph position="0"> By definition, the most symmetric pair shares the maximum number of word-feature sets in the lists. The pair is detected by three operations: the intersection operation for two features, the matching operation for two word-feature sets, and the balancing operation for two lists.</Paragraph> <Paragraph position="1"> The intersection operation is one of the normal set operations for the features:</Paragraph> <Paragraph position="3"> The mutual dependency information among the $\phi_{ij}$ and $\gamma_{ik}$ is managed by bi-directional lists in the background. If all $\phi_{ij}$ dependent on $\gamma_{ik}$ are disambiguated by the operation, $\gamma_{ik}$ is removed.</Paragraph> <Paragraph position="4"> The matching operation $\cap$ for the word-feature sets $\mathcal{V}$, $\mathcal{W}$ is defined as follows:</Paragraph> <Paragraph position="6"> of the conjoined sentence, the word-feature set of $\mathcal{W}_{m+1}$, which immediately follows the conjunction, plays an important role in detecting the structure, because there is strong similarity between the starting word-feature set of the conjunct scope and $\mathcal{W}_{m+1}$.</Paragraph> <Paragraph position="7"> The balancing operation (r) for the lists $\mathcal{L}$, $\mathcal{R}$ is defined as follows:</Paragraph> <Paragraph position="9"> Not every word-feature set in one list matches a word-feature set in the other list. 
Some word-feature sets in $\mathcal{L}$ can find matching counterparts in $\mathcal{R}$ but others cannot. In (18), The1 and operations3 respectively match the1 and results2.</Paragraph> <Paragraph position="10"> (18) The1 arithmetic2 operations3 and the1 results2.</Paragraph> <Paragraph position="11"> Therefore, the balancing operation creates a set of lists that exhausts all the possible combinations. The lists consist of the matching word-feature set pairs $(\mathcal{V}_i, \mathcal{W}_j)$, which are selected so that no lines cross when $\mathcal{V}_i$ and $\mathcal{W}_j$ are connected by a line, as in the following:</Paragraph> <Paragraph position="13"> The balancing degree for a list is defined as follows:</Paragraph> <Paragraph position="15"> $\mathcal{N}(\phi)$ and $\mathcal{N}(\Gamma)$ respectively represent the total number of each feature in the list. $\mathcal{N}(\Delta)$ is the total number of the values of $\Delta$ in the list. Herein, $w_k$ is defined as a binary value (1.0 or 0.0) to simplify the model. Through the analysis of 10,000 English conjunctive sentences in technical manuals, the structures could be divided into about 300 coordinate patterns, which are represented in the form of word-feature sets. We manually assigned the weights to the features $\Delta$, $\Gamma$ and $\Phi$ according to the patterns, in order to select the correct structure.</Paragraph> </Section> <Section position="3" start_page="43" end_page="44" type="sub_section"> <SectionTitle> 4.3 Analysis by balance matching </SectionTitle> <Paragraph position="0"> The $n$-word sentence including the conjunction as the $m$-th word is analyzed according to the following steps.</Paragraph> <Paragraph position="1"> I. Collect the word-feature set $G(\mathcal{W})$: $G(\mathcal{W}) = \{\mathcal{W}_x \mid \mathcal{W}_x \cap \mathcal{W}_{m+1} \neq \mathrm{NULL}\}$ This step collects the set of words similar to $\mathcal{W}_{m+1}$. The collection considers some definite concord markers and boundary markers, such as &quot;and&quot;, &quot;both&quot;, &quot;either&quot;, commas, periods, and colons. 
In order to deal with the cases of (10), (11) and (12), the starting words of the clause $\mathcal{W}_k$ are added to $G(\mathcal{W})$ when a comma is preceded by the conjunction.</Paragraph> <Paragraph position="2"> II. Create the list set $H(\mathcal{L})$: $H(\mathcal{L}) = \{\mathcal{W}_j \mid i \le j \le m-1, \mathcal{W}_i \in G(\mathcal{W})\}$</Paragraph> <Paragraph position="3"> III. Create the list $\mathcal{R}$: $\mathcal{R} = \{\mathcal{W}_{m+1}, \mathcal{W}_{m+2}, \ldots, \mathcal{W}_n\}$ IV. Create the list set $F(\mathcal{L}\mathcal{R})$ by the balancing operation.</Paragraph> <Paragraph position="5"> V. Select the list $F(\mathcal{L}\mathcal{R})_{max}$, which has the highest balancing degree among all the possible lists in $F(\mathcal{L}\mathcal{R})$.</Paragraph> <Paragraph position="6"> For example, (9) is analyzed by the following steps. Here, for ease of understanding, the word-feature sets and the lists are written in simplified form.</Paragraph> <Paragraph position="7"> (9) Radioisotopes have played an important part in1 developing effective insecticides in2 the country and in3 finding the best ways of applying them. I. $G(\mathcal{W})$ = {in1, in2} II. $H(\mathcal{L})$ = {(in1, developing, effective, insecticides, in, the, country), (in2, the, country)} III. $\mathcal{R}$ = (in3, finding, the, best, ways, of, applying, them) IV. $F(\mathcal{L}\mathcal{R})$ = {..., (Preposition, Nominal), (Preposition, Gerund, Nominal)}</Paragraph> <Paragraph position="8"> V. $F(\mathcal{L}\mathcal{R})_{max}$ = (Preposition, Gerund, Nominal), with $\mathcal{L}$ = (in1, developing, effective, ..., in, the, country) and $\mathcal{R}$ = (in3, finding, the, best, ways, of, applying, them). The and5 in (21) can be analyzed in the same way. 
(21) Use extreme care when using cleaning solvents like1 acetone and methyl ethyl ketone which are highly flammable2 and shall be used3 in areas with4 adequate fire extinguishing devices and5 free6 of ignition sources.</Paragraph> <Paragraph position="10"/> </Section> </Section> <Section position="5" start_page="44" end_page="44" type="metho"> <SectionTitle> 5 Empirical results on the MT system </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="44" end_page="44" type="sub_section"> <SectionTitle> 5.1 MT configuration </SectionTitle> <Paragraph position="0"> The model was incorporated as a balance matching module into the PIVOT English-Japanese MT system, as shown in Figure 1 (Muraki, 1986; Okumura et al., 1987; Okumura et al., 1991). The system is in practical use with a dictionary of 100,000 words.</Paragraph> <Paragraph position="1"> The input sentences, represented by the list of the feature sets based on the results of the morphological analysis, are transferred to the balance matching module. The module produces the conjunct scope information as well as the results of the balance matching. The syntactic-semantic analysis module analyzes the sentences according to this top-down information. After the analysis, the conceptual analysis module creates an interlingua representation. From the interlingua, output sentences are generated (Muraki, 1986; Okumura et al., 1991).</Paragraph> </Section> <Section position="2" start_page="44" end_page="44" type="sub_section"> <SectionTitle> 5.2 Effects of the model </SectionTitle> <Paragraph position="0"> When 15,000 conjunctive sentences from technical reports and manuals, separate from the 10,000 sentences analyzed earlier, were translated, the following effects of the model were confirmed.</Paragraph> <Paragraph position="1"> 1. 
Reduction of the analysis cost: This model provides the correct top-down information about the conjunct scopes for about 75% of the sentences, which results in accurate and effective syntactic analysis. Without the model, most of the sentences required backtracking during analysis. With the model, almost all of the backtracking is suppressed.</Paragraph> <Paragraph position="2"> 2. Improvement of the word disambiguation: The results of the balance matching improve word disambiguation and the inference of unknown words, because the ambiguities of each word are intersected with those of its counterpart in the symmetric list. The results provide top-down information for the analysis of ambiguous words and unknown words, as in (8). Most of the sentences contained some word ambiguities and one sixth contained unknown words. By using the top-down information, the accuracy improved by a factor of two.</Paragraph> <Paragraph position="5"> 3. Interpretation of the ellipses: The model makes it easy to interpret elided elements, because the balance matching results can suggest the missing elements. In sentences such as (1), $\mathcal{R}$ is completely included in $\mathcal{L}$. The differences between $\mathcal{L}$ and $\mathcal{R}$ supply the missing elements.</Paragraph> <Paragraph position="6"> 4. Robust analysis: The model helps make the system robust because the balance matching operation is based on three different kinds of features. In the MT domain of technical reports and manuals, there are some unknown words as well as some ambiguous words, as in (6), (7) and (8). 
Robustness is achieved because the morphological features are considered as well as the other features.</Paragraph> </Section> <Section position="3" start_page="44" end_page="44" type="sub_section"> <SectionTitle> 5.3 Discussions </SectionTitle> <Paragraph position="0"> To increase the accuracy, the model is being improved in three respects: * Lexical disambiguation: The model is based on lexical information. Therefore, when many words carry too many feature ambiguities, the model cannot always determine a correct structure. To solve this problem, filtering rules are applied to the sentence before the balance matching operation. The rules are local constraint rules, which check the two or three words before a focused word to remove some of its ambiguities. The filtering rules improve the model.</Paragraph> <Paragraph position="1"> * Weight optimization: The weights for each feature set are manually assigned based on 300 patterns. They should instead be assigned as real values, rather than binary values, according to the domain and text style. We have developed a learning method for the feature structures (Okumura et al., 1992). The method is applicable to determining the weights according to the input patterns.</Paragraph> <Paragraph position="2"> * Semantic calculation: Some conjunctive structures should be analyzed using more finely subdivided semantic features and a semantic similarity calculation. We are introducing a semantic taxonomy and a semantic distance measurement algorithm (Knight, 1993; Okumura and Hovy, 1994; Resnik, 1993).</Paragraph> </Section> </Section> </Paper>