<?xml version="1.0" standalone="yes"?> <Paper uid="E06-2022"> <Title>Multilingual Term Extraction from Domain-specific Corpora Using Morphological Structure</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> Morphologically complex terms composed of Greek or Latin elements are frequent in scientific and technical texts.</Paragraph> <Paragraph position="1"> Word-forming units are thus relevant cues for the identification of terms in domain-specific texts. This article describes a method for the automatic extraction of terms that relies on the detection of classical prefixes and word-initial combining forms. Word-forming units are identified using a regular expression. The system then extracts terms by selecting words which either begin or coalesce with these elements. Next, terms are grouped into families, which are displayed as a weighted list in HTML format.</Paragraph> </Section> </Paper>