<?xml version="1.0" standalone="yes"?> <Paper uid="C90-1018"> <Title>Semiautomatic Interactive Multilingual Style Analysis (SIMSA)</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Style markers </SectionTitle> <Paragraph position="0"> At the very beginning of style analysis, we need an inventory of style markers.</Paragraph> <Paragraph position="1"> Style errors can be detected on several different levels: word, phrase, sentence and text.</Paragraph> <Paragraph position="2"> Relevant stylistic features are, on the word level: word length, fillers, nominalisation, compound nouns, terminology; on the phrase level: noun-phrase complexity, cumulation of adjectives, complex prepositional phrases; on the sentence level: sentence length, compound sentences, distance between verb stem and prefix; on the text level: passive voice, pronouns, and phenomena of cohesion and coherence (reference, conjunctions, etc.).</Paragraph> <Paragraph position="3"> Within the project, two teams (Siemens AG, Germany, and Triumph-Adler AG, Germany) are working on the development of relevant style markers. The development is conducted in four steps. First, principles of good style and possible stylistic markers in general had to be identified by examining the literature on good technical writing and the linguistic literature on style markers.</Paragraph> <Paragraph position="4"> Second, the information needed for each style marker has been identified, so that the marker can be used by the style checker.
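As an illustration only, such an inventory can be kept as a table recording each marker's level and the kind of information its check needs. The Python sketch below is hypothetical: the names Source, INVENTORY and active_markers are invented here, and the marker assignments follow the examples this paper gives for statistical, lexical and parser-based checks. It also shows how parser-based markers could be suppressed when no parser output is available.

```python
from enum import Enum
from typing import Optional


class Source(Enum):
    """Kind of information a style-marker check needs (hypothetical names)."""
    STATISTICS = "statistics"  # counting alone, e.g. sentence length
    LEXICON = "lexicon"        # stylistic word lists or the parser's lexicon
    PARSER = "parser"          # syntactic information from the parser


# Hypothetical inventory: marker name -> (level, information source).
INVENTORY: dict[str, tuple[str, Source]] = {
    "word length": ("word", Source.STATISTICS),
    "fillers": ("word", Source.LEXICON),
    "noun-phrase complexity": ("phrase", Source.PARSER),
    "sentence length": ("sentence", Source.STATISTICS),
    "verb stem/prefix distance": ("sentence", Source.PARSER),
}


def active_markers(parser_available: bool,
                   wanted: Optional[set[str]] = None) -> list[str]:
    """Return the markers to check, in inventory order.

    Parser-based markers are suppressed when no parser output is available;
    the user may additionally restrict the run to the markers in `wanted`.
    """
    return [
        name
        for name, (_level, source) in INVENTORY.items()
        if (parser_available or source is not Source.PARSER)
        and (wanted is None or name in wanted)
    ]
```

For example, active_markers(parser_available=False) returns ['word length', 'fillers', 'sentence length']. Keeping the information source in the table, rather than in each check, lets one switch turn off every parser-dependent marker at once.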
Some style markers can be turned into an algorithm using statistical methods alone, others need lexical information, and a third group needs syntactic information, which has to be provided by the parser of the TWB project.</Paragraph> <Paragraph position="5"> In a third step, the style markers are formalized and checking algorithms are being developed.</Paragraph> <Paragraph position="6"> Finally, functions are being developed to convert the values (average, standard deviation, etc.) into a bar chart representation that includes thresholds and the degree of deviation from the predefined norm.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Architecture </SectionTitle> <Paragraph position="0"> SIMSA consists of three main parts. The user has the option to set the norms and thresholds of the stylistic features by supplying a representative or paradigmatic text corpus (standardization of style marker values). He can perform an analysis of a given text (Analysis; batch mode), and he can start a dialogue for more information on a given analysis (Analysis dialogue; interactive mode).</Paragraph> <Paragraph position="1"> Standardization of style marker values. The importance of style markers, their average values, and the thresholds for their values depend on the language analyzed, and they differ with the kind of text analyzed. How can stylistic critiques be adapted to different fields of application? In principle, there are three possibilities. First, stylistic norms can be fixed once and for all, with no possibility of change.</Paragraph> <Paragraph position="2"> This case allows only one conception of &quot;one good style&quot;. But what about functional conceptions of style, in which deviations in style markers are understood as deviations in the functionality of a text? And what about the different functions of style markers in different languages?
A second approach is to let the users themselves, or at least a superuser, set the standard norm. This is the approach taken in EPISTLE, where &quot;thresholding, together with adjustable weights, allow tailoring of style critiques to individual environments...&quot; (Heidorn et al. 1982:323).</Paragraph> <Paragraph position="3"> A somewhat different approach was taken in SIMSA: SIMSA provides the user with default norms for several kinds of text.</Paragraph> <Paragraph position="4"> Moreover, it offers the option to set norms according to a given text corpus. The user enters a set of texts that belong to a given language and a given kind of text. SIMSA then analyzes the text corpus and sets and stores the norms of the style markers accordingly.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Analysis </SectionTitle> <Paragraph position="0"> The analysis part of SIMSA is under development by the above-mentioned teams of Siemens AG and TA Triumph-Adler. Because TWB is an integrated toolkit, the analysis functions will use other TWB tools as far as possible. The analysis functions can be divided into three main groups: purely statistical functions, functions with lexical access, and functions using parser output.</Paragraph> <Paragraph position="1"> Statistical algorithms are sufficient for style markers such as sentence length and word length. The analysis functions check the size of the text corpus (a certain size is necessary to obtain significant deviations), compute the average, standard deviation and other necessary values, and compare these values with the norm values.</Paragraph> <Paragraph position="2"> Functions using lexical information are necessary for style markers such as fillers and slang expressions. Access to lexica can be managed in two ways.
Either words can be matched against small lexica designed specifically for stylistic purposes, which contain only a small number of words from a semantically restricted class (e.g. fillers or slang expressions), or words can be matched, via the parser output, against the lexicon used by the parser. In the second case, the necessary stylistic information (e.g. &quot;word is a chemical technical term&quot;) is contained in the lexicon entry.</Paragraph> <Paragraph position="3"> Functions using parser output are necessary for style markers such as noun-phrase complexity, the distance between verb stem and verb prefix, and sentence complexity.</Paragraph> <Paragraph position="4"> These functions filter the parser output for the information they need.</Paragraph> <Paragraph position="5"> The style checker still works if parser access is not possible. In this case (and whenever the user does not want an analysis of all style markers), the analysis of certain style markers can be suppressed.</Paragraph> <Paragraph position="6"> The results of the analysis (values of deviation from the norm, etc.) are stored in a separate analysis file.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Analysis dialogue </SectionTitle> <Paragraph position="0"> There are two ways to start the analysis dialogue. First, the option &quot;analysis dialogue&quot; will be offered to the user after the style checker has finished its analysis.</Paragraph> <Paragraph position="1"> Second, the user can call the analysis dialogue separately if an analysis file and a corresponding text file exist.</Paragraph> <Paragraph position="2"> &quot;Analysis dialogue&quot; opens a window containing bar charts that show, for each analyzed style marker, the degree of deviation from the norm. The user can ask for more information about particular style features in general, and for the occurrences of the criticized style markers in the text.
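To make the statistical path concrete, here is a minimal Python sketch under stated assumptions: one statistical marker (mean sentence length) is normed on a small reference corpus, as in the standardization step; a new text's deviation from that norm is measured in standard deviations; and the result is drawn as a text-mode bar of the kind the analysis dialogue displays. The function names, the minimum sample size MIN_SENTENCES, the naive sentence splitter, the standard-deviation unit and the threshold of 1.0 are all assumptions made for illustration; the paper does not specify how SIMSA computes or scales deviations.

```python
import statistics

# Hypothetical sketch, not SIMSA's implementation: one statistical style
# marker (mean sentence length) is normed on a reference corpus, and a new
# text's deviation from that norm is drawn as a text-mode bar.

MIN_SENTENCES = 5  # assumed minimum sample size; the paper gives no figure


def sentence_lengths(text: str) -> list[int]:
    """Words per sentence; naively treats '.', '!' and '?' as sentence ends."""
    for mark in "!?":
        text = text.replace(mark, ".")
    return [len(s.split()) for s in text.split(".") if s.strip()]


def set_norm(corpus: list[str]) -> tuple[float, float]:
    """Norm = mean and sample standard deviation of sentence length in the corpus."""
    lengths = [n for doc in corpus for n in sentence_lengths(doc)]
    return statistics.mean(lengths), statistics.stdev(lengths)


def deviation(text: str, norm: tuple[float, float]) -> float:
    """Distance of the text's mean sentence length from the norm, in standard deviations."""
    lengths = sentence_lengths(text)
    if MIN_SENTENCES > len(lengths):
        raise ValueError("text too short for a meaningful deviation")
    mean, sd = norm
    return (statistics.mean(lengths) - mean) / sd


def bar(marker: str, dev: float, threshold: float = 1.0) -> str:
    """One bar: a '#' per half standard deviation, '!' once past the threshold."""
    flag = " !" if abs(dev) > threshold else ""
    return f"{marker:16}{dev:+6.2f} {'#' * int(abs(dev) * 2)}{flag}"
```

With a corpus whose sentences average 4.5 words (sample standard deviation about 1.29), a text of five 9-word sentences deviates by about +3.49 and gets a flagged bar of six '#' characters. Keeping the norm as a plain (mean, standard deviation) pair means norms for different languages or kinds of text can be stored separately and swapped in without changing the checking code.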
Because stylistic errors are grammatically correct but more or less inadequate uses of linguistic features, the &quot;Analysis dialogue&quot; is intended to give recommendations as far as possible, but not to correct text passages automatically.</Paragraph> </Section> </Section> </Paper>