File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/05/h05-1106_abstr.xml
Size: 1,107 bytes
Last Modified: 2025-10-06 13:44:14
<?xml version="1.0" standalone="yes"?> <Paper uid="H05-1106"> <Title>Language &amp; Information Engineering</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> We propose a new method that sets apart domain-specific terminology from common non-specific noun phrases. It is based on the observation that terminological multi-word groups reveal a considerably lower degree of distributional variation than non-specific noun phrases.</Paragraph> <Paragraph position="1"> We define a measure for the observable amount of paradigmatic modifiability of terms and, subsequently, test it on bigram, trigram and quadgram noun phrases extracted from a 104-million-word biomedical text corpus. Using a community-wide curated biomedical terminology system as an evaluation gold standard, we show that our algorithm significantly outperforms a variety of standard term identification measures. We also provide empirical evidence that our methodology is essentially domain- and corpus-size-independent.</Paragraph> </Section></Paper>