<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-1022">
  <Title>Exogeneous and Endogeneous Approaches to Semantic Categorization of Unknown Technical Terms</Title>
  <Section position="4" start_page="145" end_page="146" type="relat">
    <SectionTitle>
3 Related Work
</SectionTitle>
    <Paragraph position="0"> Term Semantic Categorization is in several respects similar to Thesaurus Extension (Uramoto, 1996; Tokunaga et al., 1997). Our methods are close to those used for positioning unknown words in thesauri. However, the two issues can be differentiated with respect to the manipulated data. A thesaurus is intended to cover a large set of conceptual domains, while a terminological database is focused accurately on a specific topic and its related domains. For example, in (Tokunaga et al., 1997), the thesaurus to be extended contains more than 500 categories. This tends to make the problem harder, but, since many categories are strictly independent, it is easier to find distinctive features between categories. By contrast, our terminological database contains only 70 categories. But, in this restricted set, we find categories corresponding to close or even overlapping knowledge areas. It is more difficult to differentiate them.</Paragraph>
    <Paragraph position="1"> Furthermore, the endogeneous approach, which exploits the multi-word nature of terminological units, cannot be applied to thesaurus extension because of the large amount of single-word thesaurus entries.</Paragraph>
    <Paragraph position="2"> It is useful to compare exogeneous term categorization with corpus-based WSD methods.</Paragraph>
    <Paragraph position="3"> In both cases, contextual information extracted from corpora is used in order to assign the most plausible semantic tags to words. In WSD, the contextual cues that co-occur with the target word constitute the main training source, whereas in term categorization the contextual information occurring with the term to be categorized should not be included in the training data, since the term is supposed to be unknown. The only relevant training sources are the contextual cues surrounding the already categorized terms. This is a basic difference that explains why WSD tasks usually achieve better performance than term categorization and thesaurus extension.</Paragraph>
    <Paragraph position="4"> In a terminology acquisition framework, Habert et al. (1998) propose an exogeneous categorization method of unknown simple words. They use local context of simple words provided by a term extraction system.</Paragraph>
    <Paragraph position="5"> Endogeneous term categorization can also be compared with some approaches to term clustering (Bourigault and Jacquemin, 1999; Assadi, 1997). These approaches take advantage of the lexical and syntactic structures of technical terms in order to build semantic clusters.</Paragraph>
  </Section>
</Paper>