File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/04/c04-1118_evalu.xml
Size: 4,901 bytes
Last Modified: 2025-10-06 13:59:09
<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1118"> <Title>Controlling Gender Equality with Shallow NLP Techniques</Title> <Section position="5" start_page="0" end_page="0" type="evalu"> <SectionTitle> 5 Evaluation of Gendercheck </SectionTitle> <Paragraph position="0"> We evaluated the Gendercheck editor on two texts: ET1, a collection of unconnected negative examples taken from (Ges, 1999) and (Sch, 1996).</Paragraph> <Paragraph position="1"> TT2, the deputy law of the German Bundestag. Gender imbalances were manually annotated with SGML codes, where each different code refers to a different rewrite proposal to be shown in the editor, as in the lower part of figure 1. Table 4.1.3 shows the distribution of error classes in the two texts. Each error class had several subtypes, which are omitted here for the sake of simplicity.</Paragraph> <Paragraph position="2"> In ET1 every sentence has at least one error; on average one word out of six is marked as &quot;ungendered&quot;. Since ET1 is a set of negative examples, errors are uniformly distributed. The distribution of errors in TT2 differs from that in ET1. TT2 does not contain a single occurrence of a class 3 error. On average, only one word out of 60 is manually marked and, owing to the length of the sentences, there are 0.46 errors per sentence on average.</Paragraph> <Paragraph position="3"> Text ET1 was used to develop and adjust the kurd rule system for marking, filtering and error code assignment. We iteratively compared the automatically annotated text with the manually annotated text and computed precision and recall. Based on the misses and the noise, we adapted the style module as well as the error annotation schema. Thus, in a first annotation schema we assigned more than 30 different error codes taken literally from (Ges, 1999) and (Uni, 2000). However, it turned out that this was too fine a granularity to be automatically reproduced, and the values for precision and recall were very low. 
We then assigned only one error class and achieved very good values: precision of over 95% and recall of more than 89%. Based on these results we carefully refined a number of subtypes of the three error classes.</Paragraph> <Paragraph position="4"> Final results are shown in table 5. Results for the test text TT2 are slightly inferior to those of the development text ET1. We briefly discuss typical instances of misses and noise.</Paragraph> <Paragraph position="5"> a) Noise in class 1 (generic use of the masculine) is mainly due to &quot;-ling&quot; derivations such as &quot;Abkömmling&quot; (descendant), which are masculine in German and for which no female equivalent forms exist. These words could be included in the exclude lexicon (see section 4).</Paragraph> <Paragraph position="6"> b) In some cases nominalized participles such as &quot;Angestellte&quot; (employee) and &quot;Hinterbliebene&quot; (surviving dependant), which are usually very well suited for gendered formulations due to their ambiguity in gender, were erroneously disambiguated. These instances produced noise because the filters did not apply.</Paragraph> <Paragraph position="7"> c) Misses in class 1 can be traced back to words that were not detected as human agents, such as &quot;Schriftführer&quot; (recording clerk) and &quot;Ehegatte&quot; (spouse). These words could be entered into the include lexicon. Both lexicons should be made user-adaptable and user-extensible in future versions of the system.</Paragraph> <Paragraph position="8"> d) Many of the misses in class 2 are due to a reference in the preceding sentence. Since the system is currently sentence-based, there is no easy way to improve the handling of this type of error.</Paragraph> <Paragraph position="9"> The possessive pronoun &quot;seiner&quot; in the second sentence of example (9) refers to &quot;Bewerber&quot; (applicant) in the first sentence. 
This connection cannot be reproduced if the system works on a sentence-by-sentence basis.</Paragraph> <Paragraph position="10"> (9) Einem Bewerber um einen Sitz im Bundestag ist zur Vorbereitung seiner Wahl innerhalb der letzten zwei Monate vor dem Wahltag auf Antrag Urlaub von bis zu zwei Monaten zu gewähren. Ein Anspruch auf Fortzahlung seiner Bezüge besteht für die Dauer der Beurlaubung nicht.</Paragraph> <Paragraph position="11"> e) An example of noise in class 2 is shown in example (10). The relative pronoun &quot;der&quot; (who, which) was detected by Gendercheck but erroneously linked to &quot;Beamte&quot; instead of &quot;Antrag&quot; (application); both nouns are masculine in German.</Paragraph> <Paragraph position="12"> (10) Der Beamte ist auf seinen Antrag, der binnen drei Monaten seit der Beendigung der Mitgliedschaft zu stellen ist, ...</Paragraph> <Paragraph position="13"> Much more powerful mechanisms are required to achieve a breakthrough for this kind of error.</Paragraph> </Section> </Paper>
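
The comparison of automatic and manual annotation described in this section, with its misses and noise, can be illustrated by a minimal sketch. It is not the Gendercheck implementation: the annotation key (sentence index, token index, error class), the function name `precision_recall`, and the sample data are all assumptions made for this example.

```python
from typing import Set, Tuple

# Hypothetical annotation key: (sentence index, token index, error class).
Annotation = Tuple[int, int, int]


def precision_recall(gold: Set[Annotation], predicted: Set[Annotation]):
    """Compare automatic annotations against manual (gold) annotations."""
    hits = gold.intersection(predicted)   # marked both manually and automatically
    misses = gold.difference(predicted)   # manual marks the system did not find
    noise = predicted.difference(gold)    # system marks with no manual counterpart
    precision = len(hits) / len(predicted) if predicted else 0.0
    recall = len(hits) / len(gold) if gold else 0.0
    return precision, recall, misses, noise


# Invented sample data, for illustration only.
gold = {(0, 1, 1), (0, 5, 2), (1, 3, 1), (2, 2, 2)}
predicted = {(0, 1, 1), (1, 3, 1), (2, 2, 2), (2, 7, 1)}

precision, recall, misses, noise = precision_recall(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
print("misses:", sorted(misses))
print("noise:", sorted(noise))
```

Dropping the error class from the key, so that only the position has to match, would correspond to the coarser setting with a single error class; finer keys make an exact match harder to reach.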