File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/96/c96-2212_intro.xml
Size: 2,020 bytes
Last Modified: 2025-10-06 14:06:05
<?xml version="1.0" standalone="yes"?> <Paper uid="C96-2212"> <Title>Hierarchical Clustering of Words</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> One of the fundamental issues concerning corpus-based NLP is the data sparseness problem. In view of the effectiveness of class-based n-gram language models against the data sparseness problem (Kneser and Ney 1993), it is expected that classes of words are also useful for NLP tasks in such a way that statistics on classes are used whenever statistics on individual words are unavailable or unreliable. An ideal type of clusters for NLP is the one which guarantees mutual substitutability, in terms of both syntactic and semantic soundness, among words in the same class.</Paragraph> <Paragraph position="1"> Furthermore, clustering is much more useful if the clusters are of variable granularity, or hierarchical. We will consider a tree representation of all the words in the vocabulary in which the root node represents the whole vocabulary and a leaf node represents a word in the vocabulary. Also, any set of nodes in the tree constitutes a partition (or clustering) of the vocabulary if there exists one and only one node in the set along the path from the root node to each leaf node. In the following sections, we will describe a method of creating binary tree representation of words and present results of evaluating and comparing the quality of the hierarchical clusters obtained from texts of very different sizes.</Paragraph> <Paragraph position="2"> *Current address: Media Integration Laboratory, Fujitsu Laboratories Ltd., Kawasaki, Japan.
Email: ushioda@flab.fujitsu.co.jp.</Paragraph> </Section> </Paper>