<?xml version="1.0" standalone="yes"?> <Paper uid="I05-5006"> <Title>Transforming a Sentence End into News Headline Style</Title> <Section position="4" start_page="41" end_page="41" type="metho"> <SectionTitle> 3 News Headlines and Their Sentence Ends </SectionTitle> <Paragraph position="0"> Nikkei news mail(1), an email service provided by NIKKEI-goo, delivers Japanese news headlines three times a day on weekdays. We have been collecting these headlines since December 1999. Table 1 shows the statistics we have obtained.</Paragraph> <Paragraph position="1"> Table 1: number of mails 3,365; number of stories 21,127; number of sentences 40,374. News headlines differ from news stories most distinctively at the sentence end. We therefore investigated the parts of speech at sentence ends in both news headlines and a newspaper (Nihon Keizai Shimbun(2)). Table 2 shows the comparison.</Paragraph> <Paragraph position="2"> In the newspaper, declinable words account for the majority of sentence ends. In news headlines, by contrast, many sentence ends are verbal nouns.</Paragraph> <Paragraph position="3"> Japanese words are broadly classified into two types: words derived from Chinese and words that originated in Japan. News headlines contain more of the former than the latter, because words of Chinese origin carry more information in fewer characters. We investigated news headlines and newspaper text that contained words of both Chinese and Japanese origin. The result is shown in Table 3. News headlines use words of Chinese origin about three times as often as words of Japanese origin.</Paragraph> <Paragraph position="4"> We assume that, when two expressions carry the same information, the shorter one is preferred. 
We therefore regard news headlines as a denser style of expression than newspaper text.</Paragraph> </Section> <Section position="5" start_page="41" end_page="44" type="metho"> <SectionTitle> 4 Method of Summarization </SectionTitle> <Paragraph position="0"> To transform a sentence end into a shorter one, we use three kinds of procedures: (1) deletion of target words at the sentence end; (2) deletion with minor transformation after the target words; (3) transformation of the sentence end. More precisely, we propose the following 10 procedures for transforming Japanese sentence ends into news headline style; the number in parentheses after each procedure gives its kind: 1. Cut off dictum and honorific phraseology (1) 2. Cut off "wo shimesu" (show) (1) 3. Change verbal noun (2) 4. Cut off "naru" (2) 5. Cut off the part which follows "akirakani" (2) 6. Change words of Japanese origin (2) 7. Cut off "teshimau" (1) 8. Cut off "tatsu" (2) 9. Transform phraseology indicating an action in the future</Paragraph> <Paragraph position="2"> 10. Change to compound noun (3) We summarize in this order, although processes 3. 9. can be switched.</Paragraph> <Section position="1" start_page="42" end_page="42" type="sub_section"> <SectionTitle> 4.1 Cut Off Dictum and Honorific Phraseology </SectionTitle> <Paragraph position="0"> The phraseologies shown below are dictum or honorific phraseology. At the sentence end they are cut off because they are not necessary for understanding the meaning.</Paragraph> <Paragraph position="2"/> </Section> <Section position="2" start_page="42" end_page="42" type="sub_section"> <SectionTitle> 4.2 Cut Off "wo shimesu" (show) </SectionTitle> <Paragraph position="0"> When a sentence ends with "wo shimesu" or "wo shimeshita", this phraseology is cut off because "shimesu" carries little meaning in such a sentence. 
The main verb of the sentence is the verbal noun before "wo shimesu".</Paragraph> </Section> <Section position="3" start_page="42" end_page="43" type="sub_section"> <SectionTitle> 4.3 Change Verbal Nouns </SectionTitle> <Paragraph position="0"> The expression after the verbal noun closest to the main verb of the sentence is deleted. In Japanese, the word "suru" is placed after a verbal noun to make a verb, but in a summary it can be deleted because the usage can still be understood.</Paragraph> <Paragraph position="1"> When a self-sufficient word follows the verbal noun, we do not apply this procedure.</Paragraph> <Paragraph position="2"> Step 1 The part following "suru" is cut.</Paragraph> <Paragraph position="3"> The verbal noun nominalized by cutting "suru" is the verbal noun referred to in this procedure.</Paragraph> <Paragraph position="4"> Step 2 When the cut part contains a phraseology of estimation, "mirareru" or "darou", append "ka" and finish.</Paragraph> <Paragraph position="6"> (He seems to have surrendered, being in trouble over his escape funds.) Step 3 When the cut part contains a negation phraseology, "nai" or "nu", append "sezu" at the sentence end and finish. When this part also contains the passive phraseology "reru", append "sarezu" and finish.</Paragraph> <Paragraph position="7"> Step 4 When a sentence ends with [noun] "wo" [verbal noun], "wo" is cut to form the compound noun [noun][verbal noun].</Paragraph> <Paragraph position="9"> (Starting this month, Japanese chess problems are seen in ads at each station and in trains.) Step 5 When a sentence ends with [particle1] [noun] "surukoto" [particle2] [noun], "surukoto" is cut. If particle1 is "wo" or "ka", this particle is changed to "no".</Paragraph> <Paragraph position="10"> If the cut part contains "hajimete" (first), the procedures from Step 2 onward differ as follows.</Paragraph> <Paragraph position="11"> Step 2 When the cut part contains "surunoha" or "shitanoha", "hajimete" (first) is inserted before the verbal noun. 
When the cut part contains "mirareru", append "ka" at the sentence end and finish.</Paragraph> <Paragraph position="12"> Step 3 When the cut part contains "shite", "go hatsu" is appended at the sentence end. When the term just before the noun is the particle "ka", this particle is changed to the particle "no".</Paragraph> <Paragraph position="13"> Step 5 When the cut part contains "mirareru", "ka" is appended at the sentence end.</Paragraph> </Section> <Section position="4" start_page="43" end_page="43" type="sub_section"> <SectionTitle> 4.4 Cut Off "naru" </SectionTitle> <Paragraph position="0"> When [particle] "naru" exists in a sentence, this part and the following part are cut off.</Paragraph> <Paragraph position="1"> When a self-sufficient word exists in the cut part, the meaning changes or can no longer be understood.</Paragraph> <Paragraph position="2"> Therefore, when a self-sufficient word follows [particle] "naru", this procedure is not applied to the sentence.</Paragraph> <Paragraph position="3"> [particle] "naru" and the following part are cut off. When the particle is "ni" or "to", "ni" is appended at the sentence end. [...] When "akirakani" exists in a sentence, the part which follows "akirakani" is cut off. When the cut part contains a self-sufficient word, the meaning changes or can no longer be understood.</Paragraph> <Paragraph position="4"> Therefore, when a self-sufficient word exists in the cut part, this procedure is not applied to the sentence. Step 1 The part which follows "akirakani" is cut off.</Paragraph> <Paragraph position="5"> Step 2 Examine the cut part and process it as follows.</Paragraph> <Paragraph position="6"> (a) A negation phraseology, "nai" or "nu", and the passive phraseology "reru" both exist.</Paragraph> <Paragraph position="7"> "sarezu" is appended at the sentence end. (b) A negation phraseology, "nai" or "nu", exists.</Paragraph> <Paragraph position="8"> "sezu" is appended at the sentence end. "surukoto wo" is cut off. 
When the part before the cut part is [particle "ni"] [verbal noun], "ni" is changed to "e". When the part before the cut part is [particle "wo"] [verbal noun], "wo" is changed to "no".</Paragraph> </Section> <Section position="5" start_page="43" end_page="44" type="sub_section"> <SectionTitle> 4.6 Change Words of Japanese Origin </SectionTitle> <Paragraph position="0"> When a word of Japanese origin listed in Table 3 exists in a sentence, that word and the part following it are cut off. The word of Japanese origin is then replaced by the corresponding word of Chinese origin.</Paragraph> <Paragraph position="1"> When a self-sufficient word follows the word of Japanese origin, this procedure is not applied to the sentence. We change the words shown in Table 3.</Paragraph> <Paragraph position="2"> Step 1 The word of Japanese origin and the following part are cut off.</Paragraph> <Paragraph position="3"> Step 2 When the sentence end is "surukoto wo", cut off "surukoto" (doing), append the corresponding word of Chinese origin, and finish the procedure.</Paragraph> <Paragraph position="5"> (They have decided to start making an 'instruction book on extensive assistance for disaster'.) Step 3 When the following condition holds, the sentence is processed.</Paragraph> <Paragraph position="6"> (a) The sentence end is the particle "ga" and Japanese [...] When "teshimau" appears in a sentence, we feel that the sentence is negative, and "teshimau" is not necessary for understanding the meaning of the sentence. Thus we cut off "teshimau" in the headline.</Paragraph> <Paragraph position="7"> This procedure is applied not only at sentence ends but also in the middle of a sentence. When the term after the cut part is "ba", we do not apply it. 
When the sentence end is "teshimau", change the term before "teshimau" to its base form and finish.</Paragraph> <Paragraph position="8"> When "teshimau" occurs elsewhere than at the sentence end, "teshimau" and the character before it are cut off.</Paragraph> </Section> <Section position="6" start_page="44" end_page="44" type="sub_section"> <SectionTitle> 4.8 Cut Off "tatsu" </SectionTitle> <Paragraph position="0"> When a sentence contains "tatsu", "tatsu" and the part following it are cut off. When the following part contains a self-sufficient word, the meaning changes or can no longer be understood.</Paragraph> <Paragraph position="1"> Therefore, when a self-sufficient word exists in the following part, this procedure is not applied to the sentence. When "tatsu" is part of an idiom, this procedure is not applied either. Step 1 "tatsu" and the following part are cut off.</Paragraph> </Section> <Section position="7" start_page="44" end_page="44" type="sub_section"> <SectionTitle> 4.9 Phraseology of Words Implying Future </SectionTitle> <Paragraph position="0"> When a phraseology which indicates an action in the future, such as "keikaku" (attempt) or "yotei" (plan), exists in a sentence, the phraseology can be changed to "he" in Japanese. The terms listed below are the phraseologies that indicate an action in the future. When "suru" followed by such a phraseology exists in the sentence, this part and the following part are changed to "he".</Paragraph> <Paragraph position="2"> [...] in a sentence and the following part contains a negation phraseology, "nai" or "nu", this procedure is not applied to the sentence.</Paragraph> <Paragraph position="3"> When the following part contains "toiu" or [...], this procedure is not applied to the sentence. "suru", this phraseology, and the following part are cut off. 
When the sentence end is then a particle, the particle is cut off. "he" is appended at the sentence end.</Paragraph> <Paragraph position="4"> 4.10 Change to a Compound Noun. When a sentence ends with [noun] [particle] [verbal noun] after the above procedures, the particle is cut off to form a compound noun. When the noun is not a pronoun, person name, proper noun, or suffix in the classification of Chasen(3), this procedure is not applied. When the particle is "kara", "de" or "mo", this procedure is not applied.</Paragraph> <Paragraph position="5"> We built a compound noun dictionary from The Mainichi Newspapers(4) to check the adequacy of compound nouns. When the sentence end is [noun] [particle "ni"] [verbal noun] and the dictionary contains [noun][verbal noun], that is, the form with "ni" cut off, [noun] [particle "ni"] [verbal noun] is changed to [noun][verbal noun]. When the particle is not "ni", [noun] [particle] [verbal noun] is changed to [noun][verbal noun].</Paragraph> <Paragraph position="7"> (A man's body was found on the third floor of the burned-out site.)</Paragraph> </Section> </Section> <Section position="6" start_page="44" end_page="45" type="metho"> <SectionTitle> 5 Experiments </SectionTitle> <Paragraph position="0"> We implemented the proposed technique in the Perl programming language to measure its adequacy, and summarized sentences with this program. The input consists of all sentences in the newspaper corpus. The number of input sentences is 232,038, of which 73,512 were summarized in some way by our method.</Paragraph> <Section position="1" start_page="45" end_page="45" type="sub_section"> <SectionTitle> 5.1 Summarization Ratio </SectionTitle> <Paragraph position="0"> We calculated the sentence ratio and the number of characters removed per sentence. The result of the experiment is shown in Table 4. The "method" column of Table 4 gives the section number of each procedure, and each entry shows the result of applying only that one procedure.</Paragraph> <Paragraph position="1"> The summarization ratio is 94%. 
That is, the method shortens a sentence by about 6%.</Paragraph> </Section> <Section position="2" start_page="45" end_page="45" type="sub_section"> <SectionTitle> 5.2 Subjective Evaluation </SectionTitle> <Paragraph position="0"> We also evaluated the proposed technique by human judgment. We selected 1,000 sentences at random from the summarized sentences, and three examinees judged them individually. Each sentence is assessed by majority decision. The assessment criteria are: (1) the same meaning without context, and (2) little unnaturalness. The result is shown in Table 5. The numbers in the table denote the section numbers explaining each transformation procedure.</Paragraph> <Paragraph position="1"> We also measured the influence of individual differences. In this kind of subjective evaluation, different people may give different judgments. We evaluated our results under three criteria: (1) at least one examinee judged the summary correct, (2) at least two examinees judged it correct, and (3) all three judged it correct. The result is shown in Table 6. The table shows that correctness is more than 90% in all cases.</Paragraph> </Section> <Section position="3" start_page="45" end_page="45" type="sub_section"> <SectionTitle> 5.3 Comparison to the Human Summaries </SectionTitle> <Paragraph position="0"> We compared summaries produced by the proposed method with summaries produced by a human. We selected 100 sentences at random from the summarized sentences, and one examinee summarized the original sentences corresponding to them.</Paragraph> <Paragraph position="1"> We computed the summarization ratio for these sentences. The result is shown in Table 7.</Paragraph> <Paragraph position="2"> Although the sentence ratio of the machine summaries is close to that of the manual summaries, the numbers of removed characters differ by approximately one character. This indicates that humans tend to change many parts of a sentence in accordance with the change of the sentence end, while the machine does not consider such influence. 
Changing the sentence end often requires transforming the whole syntactic structure, for example changing the aspect or the form. This issue needs further investigation.</Paragraph> </Section> </Section> <Section position="7" start_page="45" end_page="47" type="metho"> <SectionTitle> 6 Discussions </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="45" end_page="45" type="sub_section"> <SectionTitle> 6.1 Discussion of Erroneous Summaries </SectionTitle> <Paragraph position="0"> In this section we describe some erroneous summaries produced by our method and discuss the reasons. (The face shows character like annual rings.) Exp.13 is an example of an error in the procedure 'cut off "wo shimesu" (show)'. When the term before "wo shimesu" is a noun, the resulting sentence has no main verb. A sentence without a main verb is not correct in Japanese. If the procedure is not applied when the term before "wo shimesu" is a noun, this kind of error is avoided. However, when the noun is "kangae" (concept), "ikou" (disposition) or "mitooshi" (forecast), the procedure gives a correct result.</Paragraph> <Paragraph position="2"> (It is decided to caution users against overuse.) Exp.14 is an example of an error in the procedure 'change the word of Japanese origin'. When "surukoto" (doing) is cut off, the modification relation changes, and the resulting modification relation is wrong. When the particle "wo" is changed to the particle "no", this kind of error is avoided (Exp.15).</Paragraph> <Paragraph position="3"> [...] cut off, the inflected forms of "teshimau" and the verb do not agree. When "teshimau" is cut off, the inflected forms must be made to agree.</Paragraph> </Section> <Section position="2" start_page="45" end_page="45" type="sub_section"> <SectionTitle> 6.2 Verbalness/Nominalness of Verbal Noun </SectionTitle> <Paragraph position="0"> In Section 4.3 the sentence end is "ha hatsu" (first). People differ greatly in how readily they accept this expression. 
We thus change the expression "ha hatsu" into "hajimete" (first). An example before the change is shown in Exp.17.</Paragraph> <Paragraph position="1"> (It is the first time that President Putin has a talk with the captain of Arab Crown.) Some people find this example unnatural or wrong. But when the original sentence does not contain "hajimete" (first), the summary sentence is correct. The example is shown in Exp.18. This example gives no impression of unnaturalness. We think the verbal noun affects this. The verbal noun represents that indicates the kind. How verbal a verbal noun is felt to be varies from person to person. We consider "kaidan" (meeting) in Exp.17 and Exp.18 concretely.</Paragraph> <Paragraph position="2"> First, we think that "kaidan" (meeting) is complemented to the verbal form "kaidan suru" (have a talk). In Japanese the predicate is generally at the sentence end. When no predicate exists in a sentence, people are inclined to take the term at the sentence end as the predicate. On the other hand, we think that "kaidan" (meeting) can work either nominally or verbally in Exp.18, because "kaidan" (meeting) is not at the sentence end. When "kaidan" (meeting) is taken as nominal, people feel unnaturalness, and when "kaidan" (meeting) is taken as verbal, they do not.</Paragraph> <Paragraph position="3"> In this paper we cited erroneous summaries whose sentence end is a noun other than a verbal noun, but the verbal behavior of the noun is involved in those sentences as well. A noun that behaves verbally although it is not a verbal noun is "kangae" (concept).</Paragraph> </Section> <Section position="3" start_page="45" end_page="47" type="sub_section"> <SectionTitle> 6.3 Comparison of Machine and Manual Summaries </SectionTitle> <Paragraph position="0"> We examined the machine and manual summaries. Although many sentences do not differ much, some sentences are summarized very differently. 
One example is shown as follows: the original sentence, its machine summary and its manual summary, respectively.</Paragraph> <Paragraph position="1"> This example shows that "aru" is a dictum phraseology and that the sentence end is "mo". Such endings are often seen in news headlines, but the proposed technique does not deal with them.</Paragraph> </Section> <Section position="4" start_page="47" end_page="47" type="sub_section"> <SectionTitle> 6.4 Summarization Failure </SectionTitle> <Paragraph position="0"> We examined the sentences that were not summarized by the method. We selected 200 sentences at random and examined whether or not each should have been summarized. As a result, 9 sentences that should have been summarized were missed. An example is shown below with the expected summary.</Paragraph> <Paragraph position="1"> (Mr. Ikemoto's body was found at the burned-out site.) Exp.20 is not summarized. This failure is caused by an error in the morphological analysis.</Paragraph> </Section> </Section> </Paper>