<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-1053">
  <Title>Lexical Information for Determining Japanese Unbounded Dependency</Title>
  <Section position="4" start_page="310" end_page="312" type="metho">
    <SectionTitle>
2 Lexical Discourse Grammar
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="310" end_page="310" type="sub_section">
      <SectionTitle>
2.1 Levels of Conjunctive Particles in Japanese
</SectionTitle>
      <Paragraph position="0"> In Japanese complex or compound sentences, subordinate clauses have several dependency levels relative to the main clause. Conjunctive particles, which are located at the end of clauses and which link them, are classified according to the elements that the clause can contain, or to the correlation between clauses.</Paragraph>
      <Paragraph position="1"> See the following examples with the conjunctive particles &quot;node&quot; (ので) and &quot;nagara&quot; (ながら) (* is added to meaningless sentences).</Paragraph>
      <Paragraph position="3"> [Japanese example sentence not recoverable from the extracted text.]</Paragraph>
      <Paragraph position="4"> He returned while she was talking.</Paragraph>
      <Paragraph position="6"> I answered while I was smiling as he asked.</Paragraph>
      <Paragraph position="7"> A clause with the conjunctive particle expressing 'reason', &quot;node&quot; (ので), can contain a subject noun phrase and the past-tense auxiliary verb &quot;ta&quot; (た), while a clause with the particle indicating attendant action, &quot;nagara&quot; (ながら), cannot, as shown in 1)-3).</Paragraph>
      <Paragraph position="8"> Sentence 4) in comparison with 5) shows that a clause with &quot;node&quot; (ので) can subordinate a clause with &quot;nagara&quot; (ながら), but the reverse is impossible. In these two sentences, brackets { } show subordinate clauses.</Paragraph>
      <Paragraph position="9"> Consequently, &quot;nagara&quot; (ながら) is ranked at a lower level than &quot;node&quot; (ので).</Paragraph>
      <Paragraph position="10"> In LDG, conjunction levels of clauses are divided into six classifications according to the elements the clause can contain, as listed in Table 1. These levels form a hierarchy, i.e., a lower level clause cannot subordinate a higher level one. The levels also represent the encapsulating power of each Japanese function word located at the end of a clause. Besides conjunctive particles, Japanese conjunction nouns and relative nouns are also classified and assigned a level.</Paragraph>
      <Paragraph position="11"> Here, Japanese conjunction nouns, such as &quot;toki&quot; (時: when), are nouns that can often be used just like conjunctive particles when they are attached at the end of a clause. Japanese relative nouns, such as &quot;mae&quot; (前: before), are another type of noun that plays a role similar to that of conjunctions in English when modified by predicative phrases or clauses.</Paragraph>
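      <Paragraph> The hierarchy constraint described in this section can be sketched as a simple rank comparison. This is a minimal illustration, not the authors' implementation: the numeric levels assigned below and the function name can_subordinate are hypothetical, and only two facts are taken from the text, namely that LEV.0 is the highest rank and that &quot;node&quot; outranks &quot;nagara&quot;. Allowing equal ranks to nest is a further assumption.

```python
# Illustrative level numbers only. The text fixes just the relative order:
# LEV.0 is the highest rank, and "node" outranks "nagara".
LEVEL = {"to_quote": 0, "node": 2, "nagara": 5}


def can_subordinate(outer, inner):
    """Return True if a clause ending in `outer` may contain one ending in `inner`.

    A smaller number means a higher rank. Equal ranks are allowed here,
    which is an assumption; the text only rules out lower-over-higher.
    """
    return LEVEL[inner] >= LEVEL[outer]


print(can_subordinate("node", "nagara"))   # a node clause containing a nagara clause
print(can_subordinate("nagara", "node"))   # the reverse nesting
```
      </Paragraph>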
    </Section>
    <Section position="2" start_page="310" end_page="312" type="sub_section">
      <SectionTitle>
2.2 Modality in Conjunctive Particles and Modification Preferences
</SectionTitle>
      <Paragraph position="0"> The conjunction levels we introduced above reduce the syntactic ambiguities of long sentences. However, in order to select the most reliable sentence structure, we use another important discourse feature that the conjunctive particles have, i.e., modality.</Paragraph>
      <Paragraph position="1"> LDG assumes that Japanese function words have modality, or 'propositional attitudes', and that they suggest the global structure of long Japanese sentences in cooperation with the modality within auxiliary verbs. We assume that the same kind of modality in a conjunctive particle and in a predicate (or an auxiliary verb) correspond to each other. From the parsing viewpoint, this suggests that each conjunctive particle has a modification preference for certain predicates or auxiliary verbs.</Paragraph>
      <Paragraph position="3"> Table 1 (row and column layout lost in extraction; entries in extracted order): &quot;kara&quot; (から = because), &quot;node&quot; (ので = because), &quot;keredomo&quot; (けれども = but), &quot;nara&quot; (なら = if), &quot;hoka&quot; (ほか = besides), &quot;to&quot; (と = when), &quot;baai&quot; (場合 = in case of), &quot;toki&quot; (時 = when), &quot;mae&quot; (前 = before), &quot;ato&quot; (あと = after), [cannot contain] &quot;totomoni&quot; (とともに = as), &quot;tame&quot; (ため = because), [tense expressions] &quot;todoujini&quot; (と同時に = at the same time), [cannot contain] &quot;nagara&quot; (ながら = while), &quot;tsutsu&quot; (つつ = while), [particle &quot;ga&quot; (が)] &quot;kotonaku&quot; (ことなく = without).</Paragraph>
      <Paragraph position="4"> From the viewpoint of modality, there are four predicate types in Japanese: (1) auxiliary verbs of the first-type modality (conjecture etc.), (2) auxiliary verbs of the second-type modality (necessity etc.), (3) the copula, and (4) plain (present and past tense) forms of verbs.</Paragraph>
      <Paragraph position="5"> Here, first-type modality includes conjecture, such as &quot;darou&quot; (だろう), which corresponds to 'may,' 'can,' 'maybe,' and 'possibly' in English auxiliary verbs, adverbs, and adjectives. Second-type modality includes necessity, such as &quot;nakereba-naranai&quot; (なければならない) and &quot;ta-hou-ga-yoi&quot; (たほうがよい), which correspond to 'have to' or 'must,' and 'had better,' 'should,' or 'preferably' in English. The Japanese copula &quot;da&quot; (だ) or &quot;desu&quot; (です) expresses a definition or the speaker's confident judgment. Plain forms of verbs are the present or past tense forms of verbs without any modal auxiliary verbs. These forms do not have any modal morpheme, but when they appear at the end of the sentence and are followed by a period, they can convey modality, that is, attitudes or intentions of the subject or speaker. Plain forms of verbs in a relative clause that modifies a nominal phrase do not have such modality.</Paragraph>
      <Paragraph position="6"> LDG assumes that each conjunctive particle has a preference for modifying predicates or auxiliary verbs with consistent modality. There are six levels of modality in conjunctive particles, and there are four types of modality in predicates, as mentioned above. A subordinate clause with modality modifies a predicate type with consistent modality. The following figure illustrates the modality consistency between particles and predicates. Take the conjunction noun &quot;toki&quot; (時: when, if), for example. This word corresponds to either the English conjunction 'when' with a neutral reading or 'if' with conjecture modality. When the word &quot;toki&quot; is used in the 'if' reading, it modifies a clause in which the modality is expressed. In most cases, auxiliary verbs such as &quot;darou&quot; (だろう = may, maybe) or &quot;ta-hou-ga-yoi&quot; (たほうがよい = had better, should, preferably) express the modality of the modified clause. The Japanese language has some words that indicate or emphasize the fact that the word &quot;toki&quot; is being used in the 'if' reading. One of them is the adverb &quot;moshi&quot; (もし), which indicates that a supposition reading is applicable. This adverb is never used by itself and always modifies conjunctive forms such as &quot;toki,&quot; &quot;nara,&quot; &quot;to,&quot; and so on, and selects or emphasizes the supposition reading of the conjunctive forms. Another such word is the particle &quot;wa&quot; (は), which is usually used as a topic marker for a sentence. When &quot;wa&quot; is attached to &quot;toki,&quot; that is, in the form &quot;toki-wa,&quot; the supposition reading is enhanced. This tendency is strengthened by the use of a comma after the phrase &quot;toki-wa.&quot; The phrase &quot;toki-wa&quot; tends to be used to modify phrases with auxiliary verbs of modality.</Paragraph>
      <Paragraph position="7"> When this phrase with modality modifies a plain form of a verb with a period at the end of the sentence, the readers recognize that the plain form of the verb contains a kind of modality, such as the subject's or speaker's intention. In other words, modality information of the subordinate clauses is attached to the plain form of the main verb. The following figure illustrates this interpreting mechanism.</Paragraph>
      <Paragraph position="8"> In contrast, when a subordinate clause does not have explicit modality and modifies a clause with modality, the readers interpret the subordinate clause as having a kind of modality such as conjecture. The following figure illustrates this situation.</Paragraph>
      <Paragraph position="9"> The modality coincidence described in this section is the basis for analyzing long Japanese sentences. The Japanese language has few syntactic indicators to show the segments of sentences, but is rich in semantic indicators that suggest sentence structure. These semantic indicators are the modalities carried by a wide range of parts of speech. Conjunctive particles, adverbs, and even plain forms of verbs can have modality in the Japanese language. The modality structure is the key to comprehending long Japanese sentences.</Paragraph>
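      <Paragraph> The surface cues for the supposition reading of &quot;toki&quot; discussed in this section can be collected with a small rule set. The sketch below is only an illustration under a strong simplifying assumption: any single cue selects the 'if' reading, whereas the text describes tendencies rather than hard rules. The function and parameter names are hypothetical.

```python
def toki_reading_cues(has_moshi, has_wa, has_comma, head_is_modal):
    """Collect the surface cues for the 'if' reading of "toki"."""
    cues = []
    if has_moshi:
        cues.append("moshi")                  # adverb marking supposition
    if has_wa:
        cues.append("toki-wa")                # topic particle attached to toki
    if has_wa and has_comma:
        cues.append("comma after toki-wa")    # comma strengthens the tendency
    if head_is_modal:
        cues.append("modal head")             # modified clause expresses modality
    return cues


def toki_reading(has_moshi, has_wa, has_comma, head_is_modal):
    """Return 'if' when at least one cue is present, else the neutral 'when'."""
    cues = toki_reading_cues(has_moshi, has_wa, has_comma, head_is_modal)
    return "if" if cues else "when"


print(toki_reading(False, True, True, False))
```
      </Paragraph>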
    </Section>
    <Section position="3" start_page="312" end_page="312" type="sub_section">
      <SectionTitle>
2.3 Japanese Sentence Structure Presumption
</SectionTitle>
      <Paragraph position="0"> We assume that the modality structure can mainly be detected by lexical information. Based on this assumption, LDG presumes the sentence structure before syntactic and semantic analyses on the basis of previously collected lexical information that characterizes the lexical discourse.</Paragraph>
      <Paragraph position="1"> Figure 1 shows the configuration of our Japanese long sentence analyzer based on LDG. Input sentences are first analyzed morphologically. The part 'Discourse Structure Analysis' in Fig. 1 then presumes the sentence structure, before syntactic and semantic analysis. Here, 'Discourse' means an inner-sentence congruence in long Japanese sentences that contain two or more predicates.</Paragraph>
      <Paragraph position="2"> In order to reduce the huge number of syntactic structures of long Japanese sentences and give priorities to each possible structure, the analyzing method based on LDG uses global modality structure focusing on lexical information.</Paragraph>
      <Paragraph position="3"> First, the Discourse Structure Reference module reduces the number of possible syntactic structures, using the level of conjunctive particles described in the previous section. After that, the Discourse Structure Assumption module gives priorities to each possible syntactic structure, using the modification preference based on modality.</Paragraph>
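      <Paragraph> The two stages described above, filtering by conjunction level and then ranking by modification preference, can be sketched as follows. This is a minimal sketch under stated assumptions, not the analyzer of Fig. 1: the data representation (a candidate structure as a list of edges), the level numbers, the single preference entry, and all function names are hypothetical.

```python
# Illustrative values only; smaller number = higher rank.
LEVEL = {"node": 2, "toki-wa": 3, "nagara": 5}
# Hypothetical preference table: (particle, predicate type it prefers to modify).
PREFERRED = {("toki-wa", "modal_aux")}


def respects_levels(structure):
    """Stage 1 check: no lower-level clause subordinates a higher-level one.

    Each edge is (inner particle, enclosing particle or None, predicate type).
    """
    return all(outer is None or LEVEL[inner] >= LEVEL[outer]
               for inner, outer, _ in structure)


def preference_score(structure):
    """Stage 2 score: number of edges matching a modality preference."""
    return sum((inner, pred) in PREFERRED for inner, _, pred in structure)


def presume(candidates):
    """Drop structures that violate the hierarchy, then rank the survivors."""
    survivors = [s for s in candidates if respects_levels(s)]
    return sorted(survivors, key=preference_score, reverse=True)


a = [("nagara", "node", "plain"), ("toki-wa", None, "modal_aux")]
b = [("node", "nagara", "plain"), ("toki-wa", None, "modal_aux")]  # violates levels
c = [("nagara", "node", "plain"), ("toki-wa", None, "plain")]
print(presume([c, b, a]))
```

      Keeping the filter and the scorer separate mirrors the division between the two modules named in the text; how the actual modules represent structures is not stated in the source.
      </Paragraph>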
    </Section>
  </Section>
  <Section position="5" start_page="312" end_page="314" type="metho">
    <SectionTitle>
3 An Application of LDG
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="312" end_page="313" type="sub_section">
      <SectionTitle>
3.1 Pause Control with LDG
</SectionTitle>
      <Paragraph position="0"> The level of conjunctive particles, which indicates the structure of long Japanese sentences, is the most important feature of LDG. In this section we apply the level to another linguistic phenomenon in order to confirm the validity of this model.</Paragraph>
      <Paragraph position="1"> The sentence structure influences a wide range of linguistic phenomena. One example is prosodic information (Dorffner et al., 1990; Iwata et al., 1990; Kaiki et al., 1990; Sakai et al., 1990). If the correct sentence structure is acquired for each input sentence, prosodic information can be accurately calculated. As yet, even the most up-to-date, advanced systems have not achieved analysis in the deep structure; therefore, sentence structure presumption in the surface structure is essential for a robust prosodic control system. LDG meets this requirement since it presumes the sentence structure by means of function words occurring on the surface (Doi, Kamei, et al., 1993). Hereafter, we propose a prosodic control system based on LDG.</Paragraph>
      <Paragraph position="2"> The presumption function for sentence structure (lexical discourse) by LDG is applied to pre-processing ahead of speech synthesis in a text-to-speech system.</Paragraph>
      <Paragraph position="3"> It can presume the global sentence structure through lexical information without any analysis in the deep structure. It is also possible to consider the pause length inserted after each clause in relation to the lexical information in LDG. In other words, pauses are more frequently inserted after clauses of the higher conjunction levels than after those of the lower levels. Consequently, the pause length and location can be more efficiently controlled with the LDG conjunction levels.</Paragraph>
      <Paragraph position="4"> To develop a text-to-speech conversion system with LDG, it is necessary to prepare the LDG conjunction level information for a large number of conjunct equivalents such as conjunctive particles. Statistical data should also be collected from human speech and reading, in regard to the correlations between pause length and the LDG conjunction levels. This data is added to the lexical information to be used for speech synthesis in cooperation with pronunciation and accent.</Paragraph>
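      <Paragraph> Pause control by conjunction level can be pictured as a lookup from level to a default duration. The sketch below is purely illustrative: every duration is a placeholder rather than a measured value, the names are hypothetical, and treating the comma as a fixed additive lengthening is an assumption. The special handling of LEV.0 follows the observation in Section 3.2 that the pause tends to fall before a quotation particle.

```python
# Placeholder durations in milliseconds; these are NOT measured values.
# Smaller level number = higher rank = longer default pause.
DEFAULT_PAUSE_MS = {1: 500, 2: 400, 3: 300, 4: 200, 5: 100}
QUOTE_PAUSE_MS = 300   # placeholder for LEV.0 quotation particles
COMMA_EXTRA_MS = 100   # placeholder lengthening when a comma follows


def plan_pause(level, has_comma):
    """Return (position, milliseconds) for the pause around a conjunct equivalent."""
    if level == 0:
        # For LEV.0 the pause tends to fall before the particle, not after it.
        return ("before", QUOTE_PAUSE_MS)
    extra = COMMA_EXTRA_MS if has_comma else 0
    return ("after", DEFAULT_PAUSE_MS[level] + extra)


print(plan_pause(2, False))
print(plan_pause(4, True))
```
      </Paragraph>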
    </Section>
    <Section position="2" start_page="313" end_page="314" type="sub_section">
      <SectionTitle>
3.2 Data Analysis
</SectionTitle>
      <Paragraph position="0"> To confirm the correlation between the conjunction level and the pause length, we have analyzed speech data spoken by a professional news announcer (male), reading newspapers and magazines at a regular speed.</Paragraph>
      <Paragraph position="1"> We extracted conjunctive particles and verbs, auxiliary verbs and adjectives in adverbial form from the speech data, and classified these words by the LDG conjunction level. The average pause length for each level was calculated for two separate cases: words preceding a comma and words without a comma. For words without a comma (marked with white bars in Fig. 2), the result shows that the higher the conjunction level is, the longer the average pause length is (except for LEV.0, which is a particle for quotation).</Paragraph>
      <Paragraph position="2"> This tendency basically does not depend on whether or not a comma exists after the words. However, for words with a comma (marked with black bars in Fig. 2), the pause length of LEV.3 is shorter than that of LEV.4.</Paragraph>
      <Paragraph position="3"> We suppose that the reason for this phenomenon is that a comma adds modality to the words and lengthens the pauses, as described in the previous section. Taking the comma effect into consideration, we can conclude that there is a solid correlation between the LDG conjunction level and the pause length.</Paragraph>
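      <Paragraph> The averaging described in this section, per conjunction level and separately for words with and without a following comma, can be sketched as a grouped mean. The observations below are invented solely to make the example runnable and do not reproduce the data behind Fig. 2; the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# (level, followed_by_comma, pause_ms): invented observations for illustration.
observations = [
    (1, False, 420), (1, False, 380), (1, True, 600),
    (4, False, 100), (4, True, 250), (4, True, 350),
]


def average_pause_by_level(rows):
    """Group pause lengths by (level, comma flag) and average each group."""
    groups = defaultdict(list)
    for level, has_comma, pause_ms in rows:
        groups[(level, has_comma)].append(pause_ms)
    return {key: mean(values) for key, values in groups.items()}


averages = average_pause_by_level(observations)
for (level, has_comma), value in sorted(averages.items()):
    print(level, "comma" if has_comma else "no comma", value)
```
      </Paragraph>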
      <Paragraph position="4"> LEV.0 (&quot;to&quot; (と) and &quot;tte&quot; (って), which function in a similar way to quotation marks), however, requires careful observation. This conjunction level, the highest rank, can contain every element, even an independent sentence. In this case, the relation between the conjunctive particle and its preceding clause is so weak that a pause tends to be inserted before the conjunctive particle, not after it. Therefore, in the present data, a pause was inserted after the particle in only one case. In Table 1, no level is assigned to two of the most frequent groups: verbs and auxiliary verbs in adverbial form, and verbs and auxiliary verbs in the same form with the conjunctive particle &quot;te&quot; (て). These groups are difficult to allocate to a single level, as they are used to express many factors such as parataxis, cause, means, and attendant circumstances, and because they vary semantically and syntactically. However, in reference to the pause length data, the adverbial verbs in the former group might fall into LEV.1 or LEV.2, while those in the latter group with &quot;te&quot; (て) might fall into LEV.4 or LEV.5. Conventionally, these two groups are often treated as one &quot;adverbial form&quot;, although many functional differences have been pointed out between them. Our data supports the difference with respect to the pause length. There are identical tendencies between the two adverbial forms of adjectives (&quot;-ku&quot; (く) and &quot;-kute&quot; (くて)) and the two adverbial forms of pseudo adjectives (&quot;-ni&quot; (に) and &quot;-de&quot; (で)).</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="314" end_page="314" type="metho">
    <SectionTitle>
4 Concluding Remarks
</SectionTitle>
    <Paragraph position="0"> We have proposed a practical method for analyzing the global structure of long Japanese sentences with lexical information (Lexical Discourse Grammar: LDG). This model assumes that Japanese conjunctive particles convey modality, and that modality structure can basically be detected by lexical information. We assign a 'conjunction level' to each conjunctive particle and reduce the number of possible syntactic structures of long Japanese sentences. In addition, we assume that all conjunctive particles have a modification preference according to their modality. This preference assigns priorities to the possible structures of the sentences.</Paragraph>
    <Paragraph position="1"> We applied LDG to a prosodic information control method in a Japanese text-to-speech conversion system to confirm the conjunction level experimentally. This method controls pause location and length in speech synthesis with the conjunction level in LDG, using only lexical information with no need for syntactic analysis. Even so, it can tune the pause length more finely than methods without sentence structure presumption.</Paragraph>
    <Paragraph position="2"> Analyzing speech data, we confirmed a correlation between the level of a function word and the length of a pause inserted after that word. We are now in the process of developing a speech synthesis system with this method, by defining the default pause length for each conjunction level. In future research, LDG will also be applied to other prosodic information (rhythm and intonation).</Paragraph>
    <Paragraph position="3"> There can be little doubt that LDG will be more effective when two or more conjunct equivalents of different levels appear in one sentence, since the LDG conjunction levels are closely related to the inter-clausal dependency. Unfortunately, there were few such cases in the data used in this paper. In future work, we will collect such data to prove this hypothesis, and in so doing will refine our method to improve its ability to analyze long Japanese sentences.</Paragraph>
  </Section>
</Paper>