<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2160">
  <Title>Producing More Readable Extracts by Revising Them</Title>
  <Section position="5" start_page="1071" end_page="1072" type="metho">
    <SectionTitle>
3 Less Readability of Extracts
</SectionTitle>
    <Paragraph position="0"> To investigate the revision of extracts experimentally, we had 12 graduate students produce extracts of 25 newspaper articles from the NIHON KEIZAI SHINBUN, the average length of which was 30 sentences. We then asked them to revise the extracts (six subjects per extract).</Paragraph>
    <Paragraph position="1"> We obtained extracts containing 343 revisions, made for any of the three purposes listed in the last section. We selected the revisions for readability and classified them into 5 categories, taking into account the categories of cohesion of Halliday and Hasan [Halliday et al., 1976]. Table 1 summarizes the results of the investigation.</Paragraph>
    <Paragraph position="2"> Next, we illustrate each category of revisions. In the examples, darkened sentences are those that are not included in extracts but are shown for explanation. The serial number in the original text is also shown at the beginning of each sentence. A) Lack of conjunctive expressions/presence of extraneous conjunctive expressions The relation between sentences 15 and 16 is adversative, because there is a conjunctive 'しかし (However)' at the beginning of sentence 16. But because sentence 15 is not in the extract, 'しかし (However)' is considered unnecessary and should be deleted. Conversely, a lack of conjunctive expressions might make the relation between sentences difficult to understand. In such a case, a suitable conjunctive expression should be added. For these tasks, a discourse structure analyzer is required.</Paragraph>
    <Paragraph position="3"> We use the following three tags to show revisions.</Paragraph>
    <Paragraph position="4">  &lt;add&gt; E1 &lt;/add&gt;: add a new expression E1.</Paragraph>
    <Paragraph position="5"> &lt;del&gt; E2 &lt;/del&gt;: delete an expression E2.</Paragraph>
    <Paragraph position="6"> &lt;rep E4&gt; E3 &lt;/rep&gt;: replace an expression E3 with E4. (The company plans to give women more opportunity to work by employing full-time workers.) 15. (Since there have been no similar cases before, the project that women join is now in a hard situation, though the company puts hopes on it.) 16. (&lt;del&gt;However,&lt;/del&gt; it is making reform efforts which will be profitable both for the company and the female workers.) B) Syntactic complexity 2. (before revision) (It is the first project in the telecommunication business, which President Kashio wants to be one of the central businesses in the future, and it is also the preparation for expanding the business to cellular phone.)</Paragraph>
    <Paragraph position="8"> (It is the first project in the telecommunication business, which President Kashio wants to be one of the central businesses in the future.)  (It is also the preparation for expanding the business to cellular phone.) Longer sentences tend to have a syntactically complex structure [Klare, 1963], and a long compound sentence should generally be divided into two simpler sentences. It has also been claimed, however, that short coordinate sentences should be combined [Mathis et al., 1973].</Paragraph>
    <Paragraph position="9"> C) Redundant repetition</Paragraph>
    <Paragraph position="10"> (The new product 'ECHIGO BEST 100' which ECHIGO SEIKA released this April is popular among housewives.) (&lt;rep The company&gt; ECHIGO SEIKA &lt;/rep&gt; has been making use of the NTT Captain system since 1987.) If the subjects of adjacent sentences in an extract are the same, as in the above example, readers might find the repetition redundant. In such a case, repeated expressions should be omitted or replaced by pronouns. In this example, the anaphoric expression '同社 (the company)' is used instead of the original expression.</Paragraph>
    <Paragraph position="11">  (We are now in a vicious circle where the layoffs by companies discourage consumption, which in turn results in lower sales.) 9. (&lt;del&gt;In such a situation,&lt;/del&gt; CHRYSLER has done well, because its management strategy exactly fits the age of low growth.) In this example, the referent of the expression 'in such a situation' in sentence 9 is sentence 8, which is not in the extract. In such a case, there are two ways to revise: to replace the anaphoric expression with its antecedent, or to delete the expression. The revision in the example is the latter. For this task, a method for anaphora and ellipsis resolution is required.</Paragraph>
    <Paragraph position="12">  sell software using CD-ROM, and he thinks it is a big project for his company.) In this second example, since 'CEO Son' appears without the name of the company in the extract, without any background knowledge we may not understand what company Mr. Son is the CEO of. Therefore, the name of the company, 'Softbank', should be added as supplementary information. The task requires a method for information extraction or at least named entity extraction. E) Lack of adverbial particles/presence of extraneous adverbial particles 29. (It is a good opportunity to promote the mutual understanding between Japan and Vietnam that Mr. Do MUOI, a chief secretary of Vietnam, visits</Paragraph>
    <Paragraph position="14"> (The Japanese government should consider long-term economic support&lt;del&gt;, too &lt;/del&gt;.) In the above example, there is an adverbial particle 'も (, too)', from which we can see that sentences 29 and 30 are paratactic. But because sentence 29 is not in the extract, the particle 'も (, too)' is unnecessary and should be deleted.</Paragraph>
  </Section>
  <Section position="6" start_page="1072" end_page="1073" type="metho">
    <SectionTitle>
4 Revision System
</SectionTitle>
    <Paragraph position="0"> Our system uses the Japanese public-domain analyzers JUMAN [Kurohashi et al., 1998] and KNP [Kurohashi, 1998] to morphologically and syntactically analyze an original newspaper article and its extract. It then applies revision rules to the extract repeatedly, with reference to the original text, until no rule can revise the extract further.</Paragraph>
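    <Paragraph> The repeated application of rules described above can be sketched as a loop that runs until a full pass changes nothing. The sketch is written in Python rather than the JPerl of the original system. The rule interface (a function that takes the extract and the original text and returns a revised extract, or None when it has nothing to change), the names revise_until_stable and drop_leading_however, and the max_passes safety bound are assumptions made for illustration, not details given in this paper.

```python
def revise_until_stable(extract, original, rules, max_passes=100):
    """Apply every rule in turn, repeating until a full pass changes nothing."""
    current = list(extract)
    for _ in range(max_passes):  # safety bound; an assumption of this sketch
        changed = False
        for rule in rules:
            revised = rule(current, original)
            if revised is not None and revised != current:
                current = revised
                changed = True
        if not changed:
            break  # fixed point: no rule can revise the extract further
    return current


def drop_leading_however(extract, original):
    """Toy rule: strip a leading 'However, ' from the first extract sentence."""
    prefix = "However, "
    if extract and extract[0].startswith(prefix):
        rest = extract[0][len(prefix):]
        if rest:
            return [rest[0].upper() + rest[1:]] + list(extract[1:])
    return None


if __name__ == "__main__":
    print(revise_until_stable(["However, it is making efforts."], [],
                              [drop_leading_however]))
```

Passing the original text to every rule mirrors the statement that rules are applied "with reference to the original text"; the toy rule here simply ignores it.</Paragraph>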
    <Section position="1" start_page="1072" end_page="1073" type="sub_section">
      <SectionTitle>
4.1 Revision Rules
</SectionTitle>
      <Paragraph position="0"> Because the techniques needed for dealing with all the categories of revisions described in the previous section were not available, we devised revision rules only for factors (A), (C), and (D) in Table 1 and implemented them in JPerl.</Paragraph>
      <Paragraph position="1"> a) Deletion of conjunctive expressions We prepared a list of 52 conjunctive expressions, and made it a rule to delete each of them whenever the extract does not include the sentence to which the expression relates. To identify the sentence related by the conjunction [Mann et al., 1986], the system performs a partial discourse structure analysis that takes into account all sentences within three sentences of the one containing the conjunctive expression.</Paragraph>
      <Paragraph position="2"> The implementation of our partial discourse structure analyzer was based on Fukumoto's discourse structure analyzer [Fukumoto, 1990]. It infers the relationship between two sentences by referring to the conjunctive expressions, topical words, and demonstrative words.</Paragraph>
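      <Paragraph> Rule (a) can be sketched as follows, in Python rather than the original JPerl. The two-item CONJUNCTIONS tuple stands in for the 52-item list, which is not reproduced in this section. The find_related callback is a hypothetical placeholder for the partial discourse structure analysis, and reading "within three sentences" as three sentences on either side is also an assumption.

```python
CONJUNCTIONS = ("However, ", "Therefore, ")  # stand-in for the 52-item list
WINDOW = 3  # how far the partial analysis looks, in sentences


def candidate_indices(i, n_sentences, window=WINDOW):
    """Indices within `window` sentences of sentence i, excluding i itself."""
    lo = max(0, i - window)
    hi = min(n_sentences - 1, i + window)
    return [j for j in range(lo, hi + 1) if j != i]


def delete_dangling_conjunctions(original, extract_ids, find_related):
    """Drop a leading conjunction when its related sentence is not extracted.

    original: all sentences of the article, in order
    extract_ids: indices (into original) of the extracted sentences
    find_related(i, candidates): hypothetical analyzer; returns the index of
        the sentence related to sentence i, or None if it cannot decide
    """
    kept = set(extract_ids)
    revised = []
    for i in sorted(kept):
        sentence = original[i]
        for conj in CONJUNCTIONS:
            if sentence.startswith(conj):
                related = find_related(i, candidate_indices(i, len(original)))
                rest = sentence[len(conj):]
                if related is not None and related not in kept and rest:
                    sentence = rest[0].upper() + rest[1:]
                break
        revised.append(sentence)
    return revised


def previous_sentence(i, candidates):
    """Toy analyzer: assume the related sentence is the one just before i."""
    return i - 1 if (i - 1) in candidates else None
```

With the toy analyzer, extracting sentence 16 without sentence 15 from the example in section 3 removes "However," while extracting both leaves it in place.</Paragraph>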
      <Paragraph position="3"> c) Omission of redundant expressions If subjects (or topical expressions marked with topical postposition 'wa') of adjacent sentences in an extract were the same, the repeated expressions were considered redundant and were deleted.</Paragraph>
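      <Paragraph> A minimal sketch of rule (c), assuming the syntactic analysis has already split each extract sentence into a subject (or 'wa'-marked topic) and the remainder. The pair representation and the function name omit_repeated_subjects are illustrative assumptions, and the sketch is in Python rather than the original JPerl.

```python
def omit_repeated_subjects(extract):
    """Omit a subject that repeats the subject of the preceding sentence.

    extract: list of (subject, predicate) pairs in extract order;
             subject is None when the sentence has no overt subject.
    Returns a new list in which a repeated subject is replaced by None.
    """
    revised = []
    previous_subject = None
    for subject, predicate in extract:
        if subject is not None and subject == previous_subject:
            revised.append((None, predicate))  # redundant repetition: omit
        else:
            revised.append((subject, predicate))
        # Compare against the subject as it stood before revision, so a run
        # of three identical subjects keeps only the first.
        previous_subject = subject
    return revised
```

Only adjacent extract sentences are compared, which follows the rule as stated; subjects that recur after an intervening sentence are left alone.</Paragraph>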
      <Paragraph position="4"> d-l) Deletion of anaphors To treat anaphora and ellipsis successfully, we would need a mechanism for anaphora and ellipsis resolution (finding the antecedents and omitted expressions). Because we have no such mechanism, we implement a rule with ad hoc heuristics: If an anaphor appears at the beginning of a sentence in an extract, its antecedent must be in the preceding sentence. Therefore, if that sentence was not in the extract, the anaphor was deleted.</Paragraph>
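      <Paragraph> The heuristic in rule (d-1) can be sketched as follows, in Python rather than the original JPerl. The ANAPHORS tuple is an illustrative stand-in, since the anaphoric expressions the system recognized are not listed here, and the function name is hypothetical.

```python
# Illustrative sentence-initial anaphors; an assumption of this sketch.
ANAPHORS = ("In such a situation, ",)


def delete_unresolved_anaphors(original, extract_ids):
    """Delete a sentence-initial anaphor when the preceding sentence is absent.

    original: all sentences of the article, in order
    extract_ids: indices (into original) of the extracted sentences
    """
    kept = set(extract_ids)
    revised = []
    for i in sorted(kept):
        sentence = original[i]
        # Heuristic from the text: the antecedent is assumed to be in
        # sentence i-1 of the original article.
        antecedent_missing = i != 0 and (i - 1) not in kept
        if antecedent_missing:
            for anaphor in ANAPHORS:
                rest = sentence[len(anaphor):]
                if sentence.startswith(anaphor) and rest:
                    sentence = rest[0].upper() + rest[1:]
                    break
        revised.append(sentence)
    return revised
```

No resolution is attempted: the sketch never looks for the antecedent itself, only for whether the immediately preceding sentence survived extraction.</Paragraph>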
      <Paragraph position="5"> d-2) Supplement of omitted subjects If a subject in a sentence in an extract is omitted, the revision rule supplements the subject from the nearest preceding sentence whose subject is not omitted in the original text. This rule is implemented by using heuristics similar to the above revision rule.</Paragraph>
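      <Paragraph> Rule (d-2) can be sketched in the same style, again in Python rather than the original JPerl. The (subject, predicate) pair representation and the name supplement_subjects are assumptions for illustration; detecting an omitted subject is taken as already done by the syntactic analysis.

```python
def supplement_subjects(original, extract_ids):
    """Fill in omitted subjects of extract sentences from the original text.

    original: list of (subject, predicate) pairs for every sentence of the
              article; subject is None when it is omitted
    extract_ids: indices (into original) of the extracted sentences
    """
    revised = []
    for i in sorted(set(extract_ids)):
        subject, predicate = original[i]
        if subject is None:
            # Walk backwards to the nearest preceding sentence whose
            # subject is not omitted in the original text.
            for j in range(i - 1, -1, -1):
                if original[j][0] is not None:
                    subject = original[j][0]
                    break
        revised.append((subject, predicate))
    return revised
```

The nearest overt subject is not always the right one, which is consistent with the evaluation in section 5, where supplementing incorrect subjects is named as the main cause of degraded readability.</Paragraph>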
    </Section>
  </Section>
  <Section position="7" start_page="1073" end_page="1073" type="metho">
    <SectionTitle>
5 Evaluation of Revision System
</SectionTitle>
    <Paragraph position="0"> We evaluated our revision system by comparing its revisions with those by human subjects (evaluation 1), and by comparing readability judgments between the revised and original extracts (evaluation 2).</Paragraph>
    <Section position="1" start_page="1073" end_page="1073" type="sub_section">
      <SectionTitle>
5.1 Evaluation 1: comparing system revisions and human revisions
</SectionTitle>
      <Paragraph position="0"> Because revision is a subjective task, it was not easy to prepare an answer set of revisions to which our system's revisions could be compared. The revisions that more subjects make, however, can be considered more reliable and more likely to be necessary. When comparing the revisions made by our system with those made by human subjects, we therefore took into account the degree of agreement among subjects.</Paragraph>
      <Paragraph position="1"> For this evaluation, we used 31 newspaper articles (NIHON KEIZAI SHINBUN) and their extracts. They were different from the articles used for making rules. Fifteen of the extracts were taken from Nomoto's work [Nomoto et al., 1997], and the rest were made by our group. The average numbers of sentences in the original articles and the extracts were 25.2 and 5.1, respectively.</Paragraph>
      <Paragraph position="2"> Each extract was revised by five subjects who had been instructed to revise the extracts to make them more readable and had been shown the 5 examples in section 3. As a result, we obtained 167 revisions in total. The results are listed in Table 2.  We compared our system's revisions with the answer set comprising revisions that more than two subjects made, and we used recall (R) and precision (P) as measures of the system's performance. $R = \frac{\text{number of the system's revisions matched to the answer}}{\text{number of revisions in the answer}}$</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="1073" end_page="1074" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> $P = \frac{\text{number of the system's revisions matched to the answer}}{\text{number of the system's revisions}}$ Evaluation results are listed in Table 3. As Table 3 shows, the coverage of our revision rules is rather small (about 1/4) relative to the whole set of revisions in Table 2. The experiment is admittedly rather small, so its results are less reliable. Even so, some of the implemented rules cover most of the necessary revisions made by human subjects. However, precision should be improved.</Paragraph>
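    <Paragraph> The recall and precision computation against an agreement-based answer set can be sketched as below. Representing a revision as a hashable tuple, the function names, and returning 0.0 for an empty denominator are assumptions of this sketch; "more than two subjects" is read as at least three.

```python
from collections import Counter


def answer_set(revisions_by_subject, min_agreement=3):
    """Revisions made by at least `min_agreement` subjects.

    revisions_by_subject: one iterable of revisions per subject; each
        revision is any hashable key, e.g. (kind, sentence_no, expression)
    """
    counts = Counter()
    for revisions in revisions_by_subject:
        counts.update(set(revisions))  # count each subject at most once
    return {rev for rev, n in counts.items() if n >= min_agreement}


def recall_precision(system_revisions, answer):
    """R = matched / size of answer set; P = matched / system revisions."""
    system = set(system_revisions)
    matched = len(system.intersection(answer))
    recall = matched / len(answer) if answer else 0.0
    precision = matched / len(system) if system else 0.0
    return recall, precision


if __name__ == "__main__":
    a = ("del", 16, "However")
    b = ("rep", 3, "ECHIGO SEIKA")
    c = ("del", 9, "In such a situation")
    subjects = [[a, b], [a, b], [a, c], [b], [c]]
    answer = answer_set(subjects)          # a and b reach three subjects
    print(recall_precision([a, c], answer))
```

Lowering min_agreement to 2 gives the looser "more than one subject" criterion used for REV-1 in the second evaluation.</Paragraph>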
    <Paragraph position="1">  and revised extracts In the second evaluation, using the same 31 texts as in evaluation 1, we asked five human subjects to rank the following four kinds of extracts in order of readability: the original extract without revision (NON-REV), human-revised ones (REV-1 and REV-2), and the one revised by our system (REV-AUTO). REV-1 and REV-2 were extracts revised in the cases where more than one and more than two subjects, respectively, agreed to revise.</Paragraph>
    <Paragraph position="2"> We considered a judgment by the majority (more than two subjects) to be reliable. The results are listed in Table 4. The column 'split' in Table 4 indicates the number of cases where no majority could agree. The results show that both REV-1 and REV-2 extracts were more readable than NON-REV extracts and that REV-2 extracts might be better than REV-1 extracts, since the number of 'worse' evaluations was smaller for REV-2 extracts.</Paragraph>
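    <Paragraph> The majority criterion can be sketched as a small tally. The labels 'worse' and 'split' come from the text; 'better' and 'same', the function names, and the input format (one list of five judgments per text) are assumptions made for illustration.

```python
from collections import Counter


def majority_judgment(judgments, min_votes=3):
    """Return the label chosen by at least `min_votes` judges, else 'split'."""
    if not judgments:
        return "split"
    label, votes = Counter(judgments).most_common(1)[0]
    return label if votes >= min_votes else "split"


def tally(per_text_judgments):
    """Count majority outcomes over all texts, one judgment list per text."""
    return Counter(majority_judgment(j) for j in per_text_judgments)


if __name__ == "__main__":
    texts = [
        ["better", "better", "better", "worse", "same"],
        ["better", "better", "worse", "worse", "same"],  # no majority
    ]
    print(tally(texts))
```

With five judges and a threshold of three, at most one label can reach the threshold, so the result is unambiguous whenever it is not 'split'.</Paragraph>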
    <Paragraph position="3">  In comparing REV-AUTO with NON-REV, we used the 27 texts where the readability does not degrade in REV-2, since the readability cannot be expected to improve with revisions by our system in those texts where it degrades even with human revisions. Even with those texts, however, in almost half the cases, the readability of the revised extract was worse than that of the original extract. The main reason is that the revision system supplemented incorrect subjects.</Paragraph>
  </Section>
  <Section position="9" start_page="1074" end_page="1074" type="metho">
    <SectionTitle>
6 Discussion
</SectionTitle>
    <Paragraph position="0"> Although the results of the evaluation are encouraging, they also show that our system needs to be improved. We have to implement more revision rules to enlarge the coverage of our system. One of the most frequent revisions is to add conjunctions (37%).</Paragraph>
    <Paragraph position="1"> We also need to refine our revision rules into a more thorough implementation. To improve our system, we think it is necessary to develop a robust discourse structure analyzer, a robust mechanism for anaphora and ellipsis resolution, and a robust system for extracting named entities. They are under development now.</Paragraph>
  </Section>
</Paper>