File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/94/w94-0104_intro.xml
Size: 2,287 bytes
Last Modified: 2025-10-06 14:05:48
<?xml version="1.0" standalone="yes"?>
<Paper uid="W94-0104">
<Title>Study and Implementation of Combined Techniques for Automatic Extraction of Terminology Béatrice Daille TALANA</Title>
<Section position="3" start_page="29" end_page="30" type="intro">
<SectionTitle> 3. Coordination </SectionTitle>
<Paragraph position="0"> Coordination is a rather complex syntactic phenomenon (term coordination has been studied in [Jacquemin, 1991]) and seldom generates new terms.</Paragraph>
<Paragraph position="1"> Let us examine a rare example of a term of length 3 obtained by coordination: N1 de N3 + N2 de N3 → N1 et N2 de N3, as in assemblage de paquet + désassemblage de paquets → assemblage et désassemblage de paquets (packet assembly/disassembly). It is difficult to determine whether a modified or overcomposed base-term is itself a term. Take for example bande latérale unique (single side-band): bande latérale (side-band) is a base-term of structure N ADJ, and unique (single) is a very common post-modifier adjective in French. The fact that bande latérale unique is a term is indicated by the presence of the abbreviation BLU (SSB). As abbreviations are not introduced for all terms, the right approach is surely to extract base-terms first, i.e. bande latérale (side-band). Once base-terms are available, terms of length greater than 2 can easily be extracted from the corpus, at least post-modified base-terms and base-terms overcomposed by juxtaposition. But even if we have decided to extract only base-terms (length 2), we have to take into account at least some of their variations. Variants of base-terms are classified under the following categories: 1. 
Graphical and orthographic variants. By graphical variants, we mean either the presence or absence of capital letters (Service national or service national ((D/d)omestic service)), or the presence or absence of a hyphen inside the N1 N2 structure (mode paquet or mode-paquet (packet(-)mode)).</Paragraph>
<Paragraph position="2"> Orthographic variants concern the N1 PREP N2 structure. For this structure, the number of N2 is generally fixed, either singular or plural. However, we have encountered some exceptions: réseau(x) à satellite, réseau(x) à satellites (satellite network(s)).</Paragraph>
</Section>
</Paper>