<?xml version="1.0" standalone="yes"?> <Paper uid="C02-1014"> <Title>Semiautomatic labelling of semantic features</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Semantic information is essential in many NLP applications. In our case, the feature [+-animate] is needed to disambiguate between the possible Basque translations of the English preposition &quot;of&quot; and the Spanish preposition &quot;de&quot; when they refer to location or possession. This ambiguity arises very often when translating into Basque [Diaz de Ilarraza et al., 2000]. Labelling semantic information entirely by hand would be extremely expensive.</Paragraph> <Paragraph position="1"> This study outlines the strategy and design of a semiautomatic method for labelling semantic features of common nouns in Basque, expanding and improving the idea outlined in [Diaz de Ilarraza et al., 2000]. Because of the poor results obtained, this study dismissed an initial approach aimed at extracting the information corresponding to the [+-animate] feature automatically from a corpus.</Paragraph> <Paragraph position="2"> Instead, an alternative idea was proposed, i.e.</Paragraph> <Paragraph position="3"> that of using semantic relationships between words extracted from the Basque monolingual dictionary Euskal Hiztegia (Sarasola 1996). In this context, we used genus data and specific relators, together with a few manually labelled words, to extract the information corresponding to the [+-animate] feature.
The results obtained were very promising: 8,439 common nouns were labelled automatically after the manual labelling of just 100.</Paragraph> <Paragraph position="4"> This paper describes the work carried out to expand this idea by including information about synonymy, repeating the automatic process iteratively to obtain better results, and monitoring the reliability of the labelling of each individual noun. After studying the ideal relationship between the manual part of the operation and the scope of the automatic process, we generalised the process in order to adapt it to other semantic features. We obtained very satisfactory results for the labelling of common nouns contained in the dictionary: for the [+-animate] feature, we labelled 12,308 nouns with an accuracy of 99.2% after manually labelling only 100.</Paragraph> <Paragraph position="5"> This paper is organised as follows: section 2 presents the semantic relationships between words extracted from the Basque monolingual dictionary and used by our semiautomatic labelling method. The method itself is described in section 3. The experiments carried out to optimise the efficiency of the method are described in section 4, and section 5 reports the accuracy and scope of the labelling process for the [+-animate] semantic feature.</Paragraph> <Paragraph position="6"> Finally, section 6 describes how the method was generalised to cover other semantic features. The study finishes by underlining the results obtained and suggesting future research.</Paragraph> </Section> </Paper>
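<!--
Editorial illustration, not part of the original paper. The introduction describes
labelling a small set of nouns by hand and then spreading the [+-animate] value to
other nouns through dictionary relations (genus and synonymy), repeating the process
iteratively. The sketch below shows one simple way such propagation could work. It is
an assumption-laden toy, not the authors' implementation: the function name
propagate_labels, the data layout, and the English example entries are all invented
here, and the specific relators and the reliability monitoring mentioned in the paper
are left out.

```python
from typing import Dict, List


def propagate_labels(
    seeds: Dict[str, str],
    genus: Dict[str, str],
    synonyms: Dict[str, List[str]],
    max_rounds: int = 10,
) -> Dict[str, str]:
    """Spread seed labels through genus and synonym links until nothing changes."""
    labels = dict(seeds)
    for _ in range(max_rounds):
        new_labels: Dict[str, str] = {}
        # A noun inherits the label of its genus (the head noun of its definition).
        for noun, head in genus.items():
            if noun not in labels and head in labels:
                new_labels[noun] = labels[head]
        # A noun inherits the label of the first synonym that is already labelled.
        for noun, syns in synonyms.items():
            if noun in labels or noun in new_labels:
                continue
            for syn in syns:
                if syn in labels:
                    new_labels[noun] = labels[syn]
                    break
        if not new_labels:
            break  # fixed point reached: a further round would add nothing
        labels.update(new_labels)
    return labels


# Hypothetical toy entries (English glosses, not taken from Euskal Hiztegia).
seeds = {"animal": "+animate", "tool": "-animate"}
genus = {"dog": "animal", "puppy": "dog", "hammer": "tool", "mallet": "hammer"}
synonyms = {"hound": ["dog"], "idea": []}

result = propagate_labels(seeds, genus, synonyms)
```

In this toy run the two seed labels reach "puppy", "mallet" and "hound" only in the
second round, which is why the loop repeats until no new noun is labelled. The noun
"idea" has no usable link and stays unlabelled, so it would have to be labelled by hand.
-->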