<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-2174">
  <Title>K. General Fiction L. Mystery M. Science Fiction N. Adventure and Western</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> A simple method for categorizing texts into pre-determined text genre categories using the standard statistical technique of discriminant analysis is demonstrated with application to the Brown corpus. Discriminant analysis makes it possible to use a large number of parameters that may be specific for a certain corpus or information stream, and combine them into a small number of functions, with the parameters weighted on the basis of how useful they are for discriminating text genres. An application to information retrieval is discussed.</Paragraph>
    <Paragraph position="1"> Text Types. There are different types of text. Texts &quot;about&quot; the same thing may be in differing genres, of different types, and of varying quality.</Paragraph>
    <Paragraph position="2"> Texts vary along several parameters, all relevant for the general information retrieval problem of matching reader needs and texts. Given this variation, in a text retrieval context, the problems are (i) identifying genres, and (ii) choosing criteria to cluster texts of the same genre, with predictable precision and recall. This should not be confused with the issue of identifying topics, and choosing criteria that discriminate one topic from another. Although not orthogonal to genre-dependent variation, the variation that relates directly to content and topic is along other dimensions.</Paragraph>
    <Paragraph position="3"> Naturally, there is co-variance. Texts about certain topics may only occur in certain genres, and texts in certain genres may only treat certain topics; most topics do, however, occur in several genres, which is what interests us here.</Paragraph>
    <Paragraph position="4"> Douglas Biber has studied text variation along several parameters, and found that texts can be considered to vary along five dimensions. In his study, he clusters features according to covariance, to find underlying dimensions (1989). We wish to find a method for identifying easily computable parameters that rapidly classify previously unseen texts in general classes and along a small set of dimensions, smaller than Biber's five, such that they can be explained in intuitively simple terms to the user of an information retrieval application. Our aim is to take a set of texts that has been selected by some sort of crude semantic analysis such as is typically performed by an information retrieval system and partition it further by genre or text type, and</Paragraph>
    <Paragraph position="5"> I. Informative. 1. Press: A. Press: reportage; B. Press: editorial; C. Press: reviews. 4. Misc: D. Religion; E. Skills and Hobbies; F. Popular Lore; G. Belles Lettres, etc.</Paragraph>
    <Paragraph position="6"> 2. Non-fiction: H. Gov. doc. &amp; misc.; J. Learned.</Paragraph>
    <Paragraph position="7"> II. Imaginative. 3. Fiction: K. General Fiction; L. Mystery; M. Science Fiction; N. Adv. &amp; Western; P. Romance; R. Humor.</Paragraph>
    <Paragraph position="8"> to display this variation as simply as possible in one or two dimensions.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Method
</SectionTitle>
      <Paragraph position="0"> We start by using features similar to those first investigated by Biber, but we concentrate on those that are easy to compute assuming we have a part-of-speech tagger (Cutting et al., 1992; Church, 1988), such as third person pronoun occurrence rate as opposed to 'general hedges' (Biber, 1989). More and more of Biber's features will be available with the advent of more proficient analysis programs, for instance if complete surface syntactic parsing were performed before categorization (Voutilainen &amp; Tapanainen, 1993).</Paragraph>
      <Paragraph position="1"> We then use discriminant analysis, a technique from descriptive statistics. Discriminant analysis takes a set of precategorized individuals and data on their variation on a number of parameters, and works out a set of discriminant functions which distinguish between the groups. These functions can then be used to predict the category memberships of new individuals based on their parameter scores (Tatsuoka, 1971; Mustonen, 1965).</Paragraph>
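      <Paragraph position="2"> The fit-then-predict procedure described above can be sketched for the simplest case of two groups. This is a minimal illustration using Fisher's linear discriminant with equal priors, written against the Python standard library only; it is not the SPSS procedure used in the study. The function names, the two parameters (a pronoun rate and a sentence length), and all numbers in GROUP_A and GROUP_B are invented for illustration.

```python
"""Two-group linear discriminant: an illustrative sketch, not the paper's setup."""
from statistics import mean


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * p for a, p in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def centroid(rows):
    """Mean of each parameter over the texts of one group."""
    return [mean(column) for column in zip(*rows)]


def pooled_scatter(groups):
    """Sum of the within-group scatter matrices."""
    dim = len(groups[0][0])
    scatter = [[0.0] * dim for _ in range(dim)]
    for rows in groups:
        centre = centroid(rows)
        for row in rows:
            dev = [x - c for x, c in zip(row, centre)]
            for i in range(dim):
                for j in range(dim):
                    scatter[i][j] += dev[i] * dev[j]
    return scatter


def fit_discriminant(group_a, group_b):
    """Return (weights, threshold) from precategorized parameter scores."""
    mean_a, mean_b = centroid(group_a), centroid(group_b)
    diff = [a - b for a, b in zip(mean_a, mean_b)]
    weights = solve(pooled_scatter([group_a, group_b]), diff)
    midpoint = [(a + b) / 2 for a, b in zip(mean_a, mean_b)]
    threshold = sum(w * m for w, m in zip(weights, midpoint))
    return weights, threshold


def classify(scores, weights, threshold, labels=("a", "b")):
    """Predict group membership of a new text from its parameter scores."""
    value = sum(w * x for w, x in zip(weights, scores))
    return labels[0] if value > threshold else labels[1]


# Invented scores: [third person pronouns per 1000 words, mean sentence length]
GROUP_A = [[48.0, 14.0], [55.0, 12.5], [51.0, 16.0], [60.0, 13.0]]
GROUP_B = [[20.0, 22.0], [15.0, 25.0], [24.0, 21.0], [18.0, 24.5]]
```

Calling fit_discriminant(GROUP_A, GROUP_B) yields one weight per parameter and a decision threshold; classify then scores an unseen text as the weighted sum of its parameters, which is the sense in which the weights express how useful each parameter is for telling the groups apart.</Paragraph>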
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Evaluation
</SectionTitle>
      <Paragraph position="0"> For data we used the Brown corpus of English text samples, as seen in table 1. We ran discriminant analysis on the texts in the corpus using several different features as seen in table 2. We used the SPSS system for statistical data analysis, which has as one of its features a complete discriminant analysis (SPSS, 1990). The discriminant function extracted from the data by the analysis is a linear combination of the parameters. To categorize a set into N categories, N - 1 functions need to be determined. However, if we are content with being able to plot all categories on a two-dimensional plane, which probably is what we want to do for ease of exposition, we only use the first two and most significant functions.</Paragraph>
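      <Paragraph position="1"> As a hedged illustration of the last two points, the sketch below treats each discriminant function as a constant plus a weighted sum of the parameters, and keeps only the first two functions to obtain plotting coordinates. The names count_functions, plot_coordinates and FUNCTIONS, and every coefficient, are hypothetical. The additional cap by the number of parameters is a standard property of linear discriminant analysis that the text does not state.

```python
def count_functions(n_categories, n_parameters):
    """Number of discriminant functions: N - 1, capped by the parameter count."""
    return min(n_categories - 1, n_parameters)


def plot_coordinates(parameter_scores, functions):
    """Map one text to (x, y) using the first two discriminant functions.

    Each function is a (constant, weights) pair, i.e. a linear combination
    of the parameters. The list is assumed to be sorted by significance.
    """
    point = []
    for constant, weights in functions[:2]:
        weighted = sum(w * s for w, s in zip(weights, parameter_scores))
        point.append(constant + weighted)
    return tuple(point)


# Invented coefficients for three functions over three parameters.
FUNCTIONS = [
    (-1.0, [0.5, 0.0, 0.25]),
    (0.5, [0.0, -1.0, 0.5]),
    (0.0, [1.0, 1.0, 1.0]),
]
```

With these made-up values, plot_coordinates([2.0, 1.0, 4.0], FUNCTIONS) ignores the third function and returns a two-dimensional point, and count_functions(15, 20) gives 14 functions for a 15-way categorization.</Paragraph>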
    </Section>
  </Section>
</Paper>