<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-2022">
  <Title>Multilingual Term Extraction from Domain-specific Corpora Using Morphological Structure</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Many methods for the automatic extraction of terms make use of patterns describing the structure of terms. This approach is especially helpful for multi-word terms. Depending on the method, patterns rely on morpho-syntactic properties (Daille, 1996; Ibekwe-SanJuan, 1998), the co-occurrence of terms and connectors (Enguehard, 1992; Baroni and Bernardini, 2004) or the alternation of informative and non-informative words (Vergne, 2005). These patterns use words as basic units and thus apply to multi-word terms. Methods for the acquisition of single-word terms generally depend on frequency-related information. For instance, the frequency of occurrence of a word in a domain-specific corpus can be compared with its frequency of occurrence in a reference corpus (Rayson and Garside, 2000; Baroni and Bernardini, 2004). Technical words usually have a high relative frequency difference between the domain-specific corpus and the reference corpus.</Paragraph>
    <Paragraph position="1"> In this paper, we present a pattern-based technique to extract single-word terms. In technical and scientific domains like medicine, many terms are derivatives or neoclassical compounds (Cottez, 1984). There are several types of classical word-forming units: prefixes (extra-, anti-), initial combining forms (hydro-, pharmaco-), suffixes (-ism) and final combining forms (-graphy, -logy). Interestingly, these units are rather constant in many European languages (Namer, 2005).</Paragraph>
    <Paragraph position="2"> Consequently, instead of relying on a subword dictionary to analyse compounds like (Schulz et al., 2002), our method makes use of these regularities to automatically extract prefixes and initial combining forms from corpora. The system then identifies terms by selecting words which either begin or coalesce with these units. Moreover, forming elements are used to group terms in morphological and hence semantic families. The different stages of the process are detailed in section 2. Section 3 describes the results of experiments performed on four corpora, in English and in French.</Paragraph>
  </Section>
</Paper>