<?xml version="1.0" standalone="yes"?> <Paper uid="A94-1043"> <Title>An Interactive Rewriting Tool for Machine Acceptable Sentences</Title> <Section position="3" start_page="0" end_page="207" type="metho"> <SectionTitle> 2. Interactive Rewriting Tool </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Design Principles </SectionTitle> <Paragraph position="0"> Rewriting poses two problems: its know-how is difficult to learn, and its effects are unpredictable. Our design principles are as follows: 1) Interaction-based system: Since rewriting is a dynamic process, the process should be interactive to deal with changes.</Paragraph> <Paragraph position="1"> 2) Presentation of rewriting candidates: The system should present recommended revisions to users so that they can select the best one and ensure that the rewriting is correct.</Paragraph> <Paragraph position="2"> 3) Minimization of interactions: To minimize the frequency of interactions during rewriting, the system should fully utilize the available knowledge to improve the accuracy of its diagnoses.</Paragraph> <Paragraph position="3"> 4) Optimization of interactions: Scales that measure the degree of rewriting (scalability) should be introduced so that interactions can be optimized to obtain the maximum effect with the minimum number of interactions.</Paragraph> </Section> <Section position="2" start_page="0" end_page="207" type="sub_section"> <SectionTitle> 2.2 System Configuration </SectionTitle> <Paragraph position="0"> The system consists of two main parts: a sentence checker and a user interface unit. The sentence checker is composed of a sentence analyzer, an information extractor and a rewriting candidate generator. 
The information extractor extracts the information necessary for rewriting, such as morphological and syntactic information, from the analysis results of an MT system, in order to detect problematic phrases and generate recommended rewriting examples along with guidance messages that help users with rewriting.</Paragraph> <Paragraph position="1"> The user interface displays the original sentences that require correction, together with their recommended rewriting examples and guidance messages, as in Fig. 1. When there are several problems in one sentence, they are presented to users in order of importance. Knowledge for rewriting is accumulated as rules for the information extractor, which is actually a general-purpose information extraction toolkit equipped with general linguistic analysis modules and information extraction/diagnosis modules.</Paragraph> </Section> </Section> <Section position="4" start_page="207" end_page="207" type="metho"> <SectionTitle> 3. Knowledge for Rewriting Long Sentences </SectionTitle> <Paragraph position="0"> Of the Japanese expressions that need rewriting before machine translation, long sentences that should be divided into shorter ones are the most important, as shown in previous studies (Kim, Ehara and Aizawa, 1992).</Paragraph> <Section position="1" start_page="207" end_page="207" type="sub_section"> <SectionTitle> 3.1 Criteria for Detecting Long Sentences </SectionTitle> <Paragraph position="0"> It is empirically known that simple factors, such as the number of characters or words in a sentence, are not sufficient to determine which sentences should be rewritten. Currently, we use both the number of words and linguistic patterns to identify long sentences. This combined algorithm achieves a precision ratio of 52% and a recall ratio of 96% on closed data. 
The two ratios improved by 9% and 6%, respectively, compared with using the number of words in a sentence as the only determining factor.</Paragraph> </Section> <Section position="2" start_page="207" end_page="207" type="sub_section"> <SectionTitle> 3.2 Generation of Rewriting Candidates </SectionTitle> <Paragraph position="0"> The rewriting rules also generate candidates for rewriting expressions. There are four methods of sentence division: 1) Simple division (60%): A sentence is divided into two at the division point by inflecting the ending of the first sentence and inserting an appropriate connective at the beginning of the second sentence, where necessary.</Paragraph> <Paragraph position="1"> 2) Supplementation of case fillers (27%): A case, such as the subjective case, is supplemented after sentence division.</Paragraph> <Paragraph position="2"> 3) Supplementation of verbs (7%): Verbs are supplemented after sentence division.</Paragraph> </Section> </Section> <Section position="5" start_page="207" end_page="207" type="metho"> <SectionTitle> 4) Others (6%): </SectionTitle> <Paragraph position="0"> Of the above, method 1) has been implemented, and method 2) is under development using semantic trees provided by the information extractor.</Paragraph> </Section> </Paper>