File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/94/j94-4001_intro.xml
Size: 18,214 bytes
Last Modified: 2025-10-06 14:05:45
<?xml version="1.0" standalone="yes"?> <Paper uid="J94-4001"> <Title>A Syntactic Analysis Method of Long Japanese Sentences Based on the Detection of Conjunctive Structures</Title> <Section position="4" start_page="519" end_page="528" type="intro"> <SectionTitle> 5. Dependency Analysis of a Sentence and Supplementing for Ellipses </SectionTitle> <Paragraph position="0"> As described in the preceding sections, information about CSs can be used to reduce a sentence into a simpler form. Consequently, a dependency structure of an entire sentence can be obtained by applying relatively simple head-dependent rules to CSs and the sentence. Another serious problem regarding CSs, in addition to the ambiguity of scope, is the ellipses that may occur in the components of CSs. We recover the omitted components in the stage of dependency analysis. We will explain this process in the following.</Paragraph> <Section position="1" start_page="519" end_page="522" type="sub_section"> <SectionTitle> 5.1 Dependency Analysis </SectionTitle> <Paragraph position="0"> In this paper, the goal of the syntactic analysis is to transform a sentence into a dependency tree structure in which a dependent bunsetsu node is placed as a child node of its head bunsetsu node. In a Japanese sentence, because each bunsetsu depends on one of the bunsetsus to the right of it, a sentence can be transformed into a tree whose root node is the last bunsetsu in the sentence. This left-to-right head-dependent relation is characteristic of the sentential structure of Japanese, and the dependency analysis fits this very well.</Paragraph> <Paragraph position="1"> Computational Linguistics Volume 20, Number 4 First, each conjunct of the CSs is analyzed. If there are two or more CSs in a nested structure in a sentence (i.e., having parent-child relations), each CS is analyzed from the innermost CS in the order of nesting level. Then finally, the main sentential component is analyzed. 
Because the pre- and post-conjuncts have their own consistent structures and meanings, they are parsed independently into dependency trees. The root nodes of these trees are the KB and the EB (the last bunsetsu of each conjunct). 4 After analyzing a CS, a new node, called the CS node, is created that has two child nodes, KB and EB. The CS node inherits the property of the EB when it depends on a bunsetsu to the right of it, and it inherits the property of the KB and the EB when it governs a bunsetsu to the left of it. In the next level analysis (the term we give to the analysis of its parent CS or of the whole sentence if no parent CS exists), the CS node is handled as a symbol. This means that bunsetsus outside a CS can no longer depend on bunsetsus in it, except the KB and the EB. Even in the case of a CS that consists of more than two conjuncts, the same analysis takes place, except that the dependency tree of the CS is composed of more than two sub-trees into which each conjunct is parsed.</Paragraph> <Paragraph position="2"> Parsing a series of bunsetsus in a certain range (conjuncts of CSs, or a whole sentence after merging all the CSs into CS nodes) is performed in the following way. The head bunsetsu is determined from right to left for each bunsetsu in the range of bunsetsus to the right of it with a no-cross condition. 5 The type of bunsetsu as a head is classified into two types, NB and PB. 6 Whether a bunsetsu depends on NB or PB is determined by the conjugation of its IW or by the type of its AW. For example, an NB with a postposition &quot;NO&quot; can depend on an NB, and a conditional form of a PB (ending with &quot;BA&quot;) can depend on a PB.
When a bunsetsu can depend on two or more bunsetsus in the range, its head is determined by the following heuristics: * In most cases a bunsetsu depends on its nearest head in Japanese.</Paragraph> <Paragraph position="3"> Therefore, a bunsetsu is regarded as depending on its nearest head except in the following two cases.</Paragraph> <Paragraph position="4"> * Because the postposition &quot;WA&quot; marks the topic of a sentence, a bunsetsu accompanying it usually depends on the last predicate that is the main predicate in a sentence. Thus, such a bunsetsu is regarded as depending on the last head in the analysis range.</Paragraph> <Paragraph position="5"> * A comma in a sentence shows a separation of meaning, and the bunsetsu accompanying a comma usually depends on a bunsetsu farther away than the nearest one. Based on our observation we consider such a bunsetsu to depend on the second nearest head.</Paragraph> <Paragraph position="6"> These rules are rather simple, but they are still useful when applied to the reduced form of a sentence, as shown in the discussion of the experiments.</Paragraph> <Paragraph position="7"> We illustrate this process for the sentence in Figure 12. First, the CS [HYOUDAI(a title),]-[CHOSHA(an author),]-[SHUDAI-NADO-NO(such as a theme)] is analyzed; because each conjunct consists of only one bunsetsu, the analysis results only in creating a CS node and assigning each bunsetsu to it (Figure 12a: 'PARA' is the CS node, and the nodes accompanying '&lt;P&gt;' are the root nodes of the dependency trees for conjuncts). Next, the pre- and post-conjuncts [HYOUDAI(a title),...SAI-HENSEI-SHI(be reorganized),]-[SAKUIN-NO(of an index)...KIROKU-SHITE-OKU(be recorded).] are analyzed and transformed into dependency trees, and another CS node is created (Figure 12b). Finally, the whole sentence is analyzed, and its dependency tree is obtained.</Paragraph> <Paragraph position="8"> 4 In the case of incomplete conjunctive structures, such as in Table 1-vi, neither conjunct can be parsed into a dependency tree, as it contains no predicate that should become the root node of a dependency tree. A way of dealing with this problem is described in Section 5.3.</Paragraph> <Paragraph position="9"> 5 In Japanese, head-dependent relations do not cross each other, that is, when Bi depends on Bj, Bk (k &lt; i) cannot depend on bunsetsus from Bi+1 to Bj-1.</Paragraph> <Paragraph position="10"> 6 NBs and PBs can govern other bunsetsus, but other types of bunsetsus, like &quot;HIJOUNI(very)&quot; and &quot;SUBETENO(all),&quot; cannot.</Paragraph> <Paragraph position="11"> An example of analyzing a long sentence into a dependency structure.</Paragraph> </Section> <Section position="2" start_page="522" end_page="524" type="sub_section"> <SectionTitle> 5.2 Extension of Conjunctive Structures and Recovering Omitted Modifiers </SectionTitle> <Paragraph position="0"> Our method of detecting a CS cannot find where the pre-conjunct begins with complete certainty. For this reason, it is necessary to check whether some modifiers 7 (bunsetsus) to the left of the detected SB can be included in the CS in the stage of dependency analysis. This left-side extension is performed only on CSs containing PBs. This is because modifiers to the left of a CS containing no PB rarely depend on the pre-conjunct alone; usually they depend on the entire CS (this head-dependent relation is handled as the relation to the CS node in the next level analysis) or on a bunsetsu after the CS.
When a CS contains PBs, the analysis of its pre-conjunct does not stop at the detected SB, but continues to the bunsetsus to the left of the SB as follows: * If the bunsetsu depends on a certain bunsetsu apart from the KB in the pre-conjunct, the bunsetsu is regarded as a part of the CS, and the extension operation is continued (Figure 13). Otherwise the extension operation is stopped. The KB is excluded from the candidates for a head, because the head-dependent relation to the KB is handled as the relation to the CS node in the next level analysis.</Paragraph> <Paragraph position="1"> 7 &quot;Modifiers&quot; here also means case components of verbs.</Paragraph> <Paragraph position="2"> A modifier ellipsis.</Paragraph> <Paragraph position="3"> * However, if the bunsetsu accompanies the postposition &quot;WA&quot; or a comma, the bunsetsu is not included in the CS and the extension operation is stopped. This is because a bunsetsu of this kind causes a separation in a sentence and usually depends on the entire CS (that is, KB and EB) or another bunsetsu to the right of the CS, not a bunsetsu in the pre-conjunct.</Paragraph> <Paragraph position="4"> In the sentence in Figure 7, the bunsetsu &quot;SONO(the),&quot; which can depend on &quot;KAN-OUSEI-WO(possibility),&quot; is regarded as contained in the CS, but the bunsetsu &quot;KAI-SHOU-SURU-TAME-NI-WA(in order to solve),&quot; which accompanies &quot;WA&quot; and a comma, is not contained in the CS, and the extension of the CS thus ends here.</Paragraph> <Paragraph position="5"> Through this extension of the CS, the issue of omitted modifiers in a CS can be addressed. When the same modifiers exist in both conjuncts, the modifiers in the post-conjunct are often omitted (Figures 14a and 14b).
Among these omitted modifiers, the ones that depend on the EB do not have to be recovered, because a remaining modifier that depends on the KB is treated as depending on the CS node, which means that the remaining modifier also depends on the EB (Figure 14c). [Figure 15: An example of analyzing a long sentence into a dependency structure. Example sentence: &quot;Of course, a major part of the problem is to ascertain accurately what algorithm is necessary to check a certain phenomenon, but the architecture of a computer is sometimes a help and sometimes an obstacle to its development.&quot;]</Paragraph> <Paragraph position="6"> The problem is to recover the omitted modifiers that depend on a bunsetsu in the post-conjunct other than the EB. The key point is that Y and Y' in Figure 14b have a great similarity because they contain not only similar bunsetsus, KB and EB, but also very similar bunsetsus that originally governed the same modifier X. Therefore, we can detect the possibility of modifier ellipsis by checking the similarity score of the CS obtained when detecting its scope. When the extension operation is performed on the pre-conjunct of a CS that is a strong CS, we recover the omitted modifiers by interpreting a bunsetsu that depends on a bunsetsu (Bi) in its pre-conjunct as also depending on the bunsetsu (Bj) in its post-conjunct corresponding to Bi (Figure 14d) (we consider Bi to correspond to Bj when the path specifying these conjuncts contains an element a(i, j)). A CS that satisfies the following two conditions is called a strong CS: * The number of bunsetsus in its pre-conjunct (n1) and the number of bunsetsus in its post-conjunct (n2) are about the same, satisfying the inequality (n1/1.3) &lt; n2 &lt; (n1 × 1.3).</Paragraph> <Paragraph position="7"> * The score of the path specifying the CS is greater than (n1 + n2) × 4.</Paragraph> <Paragraph position="8"> For example, in the sentence in Figure 15, the detected CS [TASUKE-NI(a help)...
ARE-BA(sometimes be),]-[SAMATAGE-NI(an obstacle)...ARU(sometimes be).] satisfies the above two conditions. Thus, by checking the relation between the CS and the outside modifier phrase &quot;SONO KAIHATSU-NO(to its development),&quot; the phrase is considered to depend on both of the bunsetsus &quot;TASUKE-NI(a help)&quot; and &quot;SAMATAGE-NI(an obstacle).&quot; In the same way, &quot;COMPUTER-NO ARCHITECTURE-GA(the architecture of a computer)&quot; is again thought to depend on both the bunsetsu &quot;NARU(be)&quot; in the pre-conjunct and the bunsetsu &quot;NARU(be)&quot; in the post-conjunct. The dependency tree of this sentence that is supplemented correctly with the omitted modifiers is shown in Figure 15.</Paragraph> </Section> <Section position="3" start_page="524" end_page="528" type="sub_section"> <SectionTitle> 5.3 Handling of Analysis Failure and Recovering Omitted Predicates </SectionTitle> <Paragraph position="0"> Another type of ellipsis in CSs that is a serious problem is the omission of predicates in incomplete conjunctive structures. This type of ellipsis can be found by examining the failures of dependency analysis. The failure of dependency analysis here means that a head bunsetsu cannot be found for a certain bunsetsu in a certain range of analysis.</Paragraph> <Paragraph position="1"> When two predicates in a conjunctive predicative clause are the same, the first predicate is sometimes omitted and the remaining part constitutes the incomplete conjunctive structure (Figures 16a and 16b). In these structures, neither conjunct can be parsed into a dependency tree, because there is no predicate in it that should become the root node of a dependency tree. For this reason, by checking dependency analysis failures, we find incomplete conjunctive structures and start the process of supplementing the CSs with omitted predicates.
The conditions for incomplete conjunctive structures are the following (Figure 16c): * Dependency analysis failure occurs both in the pre- and post-conjuncts.</Paragraph> <Paragraph position="2"> * The bunsetsus whose heads cannot be found (called FB) contain identical AWs.</Paragraph> <Paragraph position="3"> The key point is that it is important for successful analysis of CSs containing predicate ellipses to detect the correct scope of the incomplete conjunctive structures. In most cases their scopes can be detected correctly from a significant similarity between the pre- and post-conjuncts that contain the case components of the same predicate. That is, the detection of a CS based on the similarity measure smoothly leads to the omitted predicate being recovered. A method that merely searches for the EB as the most similar bunsetsu for the KB might detect an incorrect scope, and in this case the predicate ellipsis cannot be detected, as shown in Figure 16d.</Paragraph> <Paragraph position="4"> (c) Conditions for an incomplete conjunctive structure.</Paragraph> <Paragraph position="5"> (d) An incorrect conjunctive noun phrase.</Paragraph> <Paragraph position="6"> (e) A dependency tree of an incomplete conjunctive structure.</Paragraph> <Paragraph position="7"> (f) Recovering the omitted predicate.</Paragraph> <Paragraph position="8"> A predicate ellipsis.</Paragraph> <Paragraph position="9"> When a CS is regarded as an incomplete conjunctive structure, each series of bunsetsus to the left of an FB is analyzed into a dependency tree, and its root node (FB) is connected to a CS node in addition to the KB and the EB (Figure 16e). When the head of the CS node is found in the next level analysis, the head
is considered to be the omitted predicate and the dependency tree is transformed by supplementing it with this predicate in the pre-conjunct, as shown in Figure 16f. When the postposition of the KB is also omitted (in Figure 16b, p2 is omitted in the KB), the KB is supplemented with the postposition of the EB.</Paragraph> <Paragraph position="10"> Figure 17: An example of analyzing a long sentence into a dependency structure. Example sentence: &quot;As is shown in the figure, the pnp transistor is used as current source and the npn transistor as switching, and the collector of the pnp transistor and the base of the npn transistor are connected to common p layer.&quot;</Paragraph> <Paragraph position="11"> For example, in the sentence in Figure 17, the CS [DENRYU-GEN-NI(as current source) PNP-TRANSISTOR(the pnp transistor),]-[SWITCHING-NI(as switching) NPN-TRANSISTOR-WO(the npn transistor)] is recognized as an incomplete conjunctive structure, since the heads of the bunsetsu &quot;DENRYU-GEN-NI(as current source)&quot; in the pre-conjunct and the bunsetsu &quot;SWITCHING-NI(as switching)&quot; in the post-conjunct are not found, and both of them have the same postposition &quot;NI.&quot; As a result, FB &quot;DENRYU-GEN-NI(as current source)&quot; and FB &quot;SWITCHING-NI(as switching)&quot; are connected to the CS node in addition to the KB and EB.
In the analysis of the parent CS, it is made clear that this CS node depends on bunsetsu &quot;SHIYOU-SHI(be used),&quot; and the dependency tree is transformed by supplementing it with the omitted predicate and the omitted postposition, as shown in Figure 17 (this sentence also contains a conjunctive noun phrase and a conjunctive predicative clause, and all of them are analyzed correctly).</Paragraph> <Paragraph position="12"> An example of redetecting a conjunctive structure under a failure of analyzing a dependency structure.</Paragraph> <Paragraph position="13"> On the other hand, if the dependency analysis of a CS fails and the conditions for incomplete conjunctive structures are not satisfied, we postulate that the detected scope of a CS is incorrect and start the detection of a new CS for the KB. To find a new CS whose pre- and post-conjuncts can be analyzed successfully, the positions of the SB and EB are restricted as follows: SB: We examine head-dependent relations in a series of bunsetsus from the first bunsetsu in a sentence to the KB. If there exists a bunsetsu in that range whose head is not found, the analysis must fail for a CS whose pre-conjunct contains this bunsetsu. Therefore, the SB is restricted to be to the right of this bunsetsu.</Paragraph> <Paragraph position="14"> EB: We examine head-dependent relations in all series of bunsetsus that can be a post-conjunct. If the analysis of a certain series of bunsetsus fails, the last bunsetsu of this series cannot become an EB of a new CS.</Paragraph> <Paragraph position="15"> After reanalysis of the CS, the analysis returns to the reduction of a sentence by checking the relations between all pairs of CSs. An example of redetecting a CS is shown in Figure 18.</Paragraph> </Section> </Section> </Paper>
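Illustrative sketch (not part of the paper, and not the authors' implementation). The head-selection heuristics of Section 5.1 and the strong-CS test of Section 5.2 can be sketched in Python roughly as follows. The `Bunsetsu` class, its fields, and both function names are hypothetical; the rule that "WA" takes priority when a bunsetsu carries both "WA" and a comma, and the fallback to the nearest head when no second candidate exists, are assumptions the text does not state. The no-cross condition is enforced by considering only the bunsetsu immediately to the right and the chain of heads above it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bunsetsu:
    label: str               # hypothetical identifier, for display only
    kind: str                # "NB", "PB", or "other" (cannot govern anything)
    wants: str               # head type this bunsetsu can depend on: "NB" or "PB"
    has_wa: bool = False     # accompanied by the postposition "WA"
    has_comma: bool = False  # accompanied by a comma


def select_heads(seq: List[Bunsetsu]) -> List[Optional[int]]:
    """Return the head index of each bunsetsu (None for the root or on failure)."""
    heads: List[Optional[int]] = [None] * len(seq)
    # Heads are fixed from right to left; the last bunsetsu is the root.
    for i in range(len(seq) - 2, -1, -1):
        # No-cross condition: only i+1 and the chain of its heads are reachable.
        candidates = []
        j: Optional[int] = i + 1
        while j is not None:
            if seq[j].kind == seq[i].wants:
                candidates.append(j)
            j = heads[j]
        if not candidates:
            continue  # dependency analysis failure for bunsetsu i
        if seq[i].has_wa:
            heads[i] = candidates[-1]   # topic marker: last head in the range
        elif seq[i].has_comma and len(candidates) >= 2:
            heads[i] = candidates[1]    # comma: second nearest head
        else:
            heads[i] = candidates[0]    # default: nearest head
    return heads


def is_strong_cs(n1: int, n2: int, path_score: float) -> bool:
    """Both strong-CS conditions: balanced conjunct lengths and a high path score."""
    return (n1 / 1.3) < n2 < (n1 * 1.3) and path_score > (n1 + n2) * 4


# Made-up four-bunsetsu range: a topic, an object, and two predicates.
example = [
    Bunsetsu("B0", "NB", "PB", has_wa=True),
    Bunsetsu("B1", "NB", "PB"),
    Bunsetsu("B2", "PB", "PB"),
    Bunsetsu("B3", "PB", "PB"),
]
print(select_heads(example))  # [3, 2, 3, None]
```

In the example, B1 attaches to its nearest predicate B2, while the "WA"-marked B0 skips B2 and attaches to the last predicate B3. A `None` at any position other than the last corresponds to the analysis failure that Section 5.3 uses to trigger predicate recovery or CS redetection.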