<?xml version="1.0" standalone="yes"?> <Paper uid="C00-2118"> <Title>Automatic Lexical Acquisition Based on Statistical Distributions*</Title> <Section position="3" start_page="0" end_page="815" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Detailed information about verbs is critical to a</Paragraph> <Paragraph position="1"> broad range of NLP and IR tasks, yet its manual determination for large numbers of verbs is difficult and resource intensive. Research on the automatic acquisition of verb-based knowledge has succeeded in gleaning syntactic properties of verbs, such as subcategorization frames, from on-line resources (Brent, 1993; Briscoe and Carroll, 1997; Dorr, 1997; Manning, 1993). Recently, researchers have investigated statistical corpus-based methods for lexical semantic classification from syntactic properties of verb usage (Aone and McKee, 1996; Lapata and Brew, 1999; Schulte im Walde, 1998; Stevenson and Merlo, 1999; Stevenson et al., 1999; McCarthy, 2000).</Paragraph> <Paragraph position="2"> Corpus-based approaches to lexical semantic classification in particular have drawn on Levin's hypothesis (Levin, 1993) that verbs can be classified according to the diathesis alternations (alternations in the syntactic expressions of arguments) in which they participate, for example, whether a</Paragraph> <Paragraph position="3"> * This research was partly sponsored by US NSF grants #9702331 and #9818322, Swiss NSF fellowship 8210-46569, the Information Sciences Council of Rutgers University, and IRCS, U. of Pennsylvania. This research was conducted while the first author was at Rutgers University. verb occurs in the dative/prepositional phrase alternation in English. One diagnostic for diathesis alternations is the set of subcategorization alternatives of a verb.
However, some classes exhibit the same subcategorization possibilities but differ in their argument structures, i.e., the content of the thematic roles assigned to the arguments of the verb.</Paragraph> <Paragraph position="4"> This type of situation constitutes a particularly difficult case for corpus-based classification methods. In this paper, we apply corpus-based lexical acquisition methodology to distinguish classes of verbs which allow the same subcategorizations but differ in thematic roles. We first assume that one can automatically restrict the choice of classes to those that participate in the relevant subcategorizations (cf. Lapata and Brew, 1999). Our proposal is then to use statistics over diathesis alternants as a way to further distinguish those verbs which allow the same subcategorizations, achieving fine-grained classification within that set. Our work focuses on determining the best semantic class for a verb type, that is, the set of usages of a verb across a document or corpus, rather than for a single verb token in a single local context.</Paragraph> <Paragraph position="5"> In this way, we can exploit the broad behavior of the verb across the corpus to determine its most likely class overall.</Paragraph> <Paragraph position="6"> We investigate the proposed approach in an in-depth case study of the three major classes of optionally intransitive verbs in English: unergative, unaccusative, and object-drop. More specifically, according to Levin's classification (Levin, 1993), the unergatives are manner of motion verbs, such as jump and march; the unaccusatives are verbs of change of state, such as open and explode; and the object-drop verbs are unexpressed object alternation verbs, such as played and painted.
These classes all support both transitive and intransitive subcategorizations, but are distinguished by the pattern of thematic role assignments to subject and object position. We automatically classify these verbs on the basis of statistical approximations to syntactic indicators of the underlying argument structures, using numerical features collected from a large syntactically annotated (tagged or parsed) corpus. We apply machine learning techniques to determine whether the frequency distributions of the features, individually or in combination, support automatic classification of the verbs. To preview our results, we demonstrate that combining only five numerical indicators is sufficient to reduce the error rate in this classification task by more than 50% over chance. Specifically, we achieve almost 70% accuracy in a task whose baseline (chance) performance is 34% (so the error rate drops from 66% to just over 30%), and whose expert-based upper bound is calculated at 86.5%. We conclude that a distribution-based method for lexical semantic verb classification is a promising avenue of research.</Paragraph> </Section> </Paper>