<?xml version="1.0" standalone="yes"?> <Paper uid="C94-2185"> <Title>REFERENCE RESOLUTION USING SEMANTIC PATTERNS IN JAPANESE NEWSPAPER ARTICLES</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> REFERENCE RESOLUTION USING SEMANTIC PATTERNS IN JAPANESE NEWSPAPER ARTICLES Takahiro Wakao </SectionTitle> <Paragraph position="0"/> <Paragraph position="2"/> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> 1 INTRODUCTION </SectionTitle> <Paragraph position="0"> Reference resolution is one of the important tasks in natural language processing. In Japanese newspaper articles, pronouns are not often used as referential expressions for company names; shortened company names and dousha (&quot;the same company&quot;) are used more often (Muraki et al. 1993). Although there have been studies of reference resolution for various noun phrases in Japanese (Shibata et al. 1990; Kitani 1994), with the exception of Kitani's work they do not clearly show how to find the referents in computationally plausible ways for a large amount of data, such as a newspaper database. In this paper (footnote 1), we determine the referents of dousha and their locations by hand, propose one simple and two heuristic methods which use semantic information in the text, such as company names and their patterns, and then test how accurately these three methods find the correct referents.</Paragraph> <Paragraph position="1"> Dousha is found with several particles such as &quot;ha&quot;, &quot;ga&quot;, &quot;*to&quot;, and &quot;go&quot; in newspaper articles. Those which co-occur with ha and ga are chosen for the data since they are the two most frequent particles when dousha is in the subject position in a sentence. Typically, ha marks the topic of the sentence and ga marks the subject of the sentence.
A typical use of dousha is as follows: Nihon Kentakii Furaido Chikin ha, Japan Kentucky Fried Chicken ha, sekai saidai no piza chien, world's largest pizza chain store, Piza Hatto to teikei wo musubi, Pizza Hut to tie-up establish, kotoshi gogatsu kara zenkoku de starting May this year, nation-wide, takuhai piza chien no tenkai wo pizza delivery chain store extension (Footnote 1: This paper was written when the author was at the Computing Research Laboratory of New Mexico State University. The author has been at the University of Sheffield since January 1994.)</Paragraph> <Paragraph position="2"> hajimeru to happyou shita.</Paragraph> <Paragraph position="3"> begin announced.</Paragraph> <Paragraph position="4"> sarani dousha ha furaido chikin no Moreover, the same company fried chicken of takuhai saabisu nimo noridasu.</Paragraph> <Paragraph position="5"> delivery service as well will start.</Paragraph> <Paragraph position="6"> A rough translation is: &quot;Kentucky Fried Chicken Japan announced that it had established a tie-up with the world's largest pizza chain store, Pizza Hut, and began to expand pizza delivery chain stores nation-wide starting in May this year. Moreover, the company will start delivery of fried chicken as well.&quot; Dousha in the second sentence refers to Kentucky Fried Chicken Japan, as &quot;the company&quot; does in the English translation. As shown in this example, some articles contain more than one possible referent or company, and the reference resolution of dousha should identify the referent correctly.</Paragraph> </Section> <Section position="3" start_page="0" end_page="1134" type="metho"> <SectionTitle> 2 LOCATIONS AND CONTEXTS OF THE REFERENTS </SectionTitle> <Paragraph position="0"> Most of the Japanese newspaper articles examined in this study are in the domain of Joint-Ventures.</Paragraph> <Paragraph position="1"> The sources of the
newspaper articles are mostly the Nikkei and the Asahi. The total number of articles is 1375, and there are 42 cases of dousha with ga and 66 cases of dousha with ha in the entire set of articles.</Paragraph> <Paragraph position="2"> The following tables, Table 1 and Table 2, show the locations and contexts where the referents of both subsets of dousha appear.</Paragraph> <Paragraph position="3"> [Tables 1 and 2: locations and contexts of the referents of dousha. Only the final rows survive: in two paragraphs before: topic of the paragraph, company name + ha; in three paragraphs before: 2; topic of the paragraph, company name + ha: 2.] Notes for Table 1 and Table 2: the company name referred to is a part of a larger subject noun phrase; the company name referred to comes at the end of the sentence, a way of emphasising the company name in Japanese; the company name occurs with to (with), kara (from), wo tsuuji (through), or tono aidade (between or among). For dousha with ga (Table 1), the referred company names, or the referents, appear in non-subject positions from time to time, especially if the referent appears in the same sentence as dousha does. For dousha with ha (Table 2), compared with Table 1, very few referents are located in the same sentence, and most of the referents are in the subject position. For both occurrences of dousha, a considerable number of the referents appear two or more sentences before, and a few of them show up even two or three paragraphs before.</Paragraph> </Section> <Section position="4" start_page="1134" end_page="1135" type="metho"> <SectionTitle> 3 THREE HEURISTIC METHODS TESTED </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="1134" end_page="1134" type="sub_section"> <SectionTitle> 3.1 Three Heuristic Methods </SectionTitle> <Paragraph position="0"> One simple and two heuristic methods to find the referents of dousha are described below.
The first, the simple method, is to take the closest company name (the one which appears most recently before dousha) as its referent (Simple Closest Method or SCM).</Paragraph> <Paragraph position="1"> It is used in this paper to indicate the baseline performance for reference resolution of dousha.</Paragraph> <Paragraph position="2"> The second method is a modified Simple Closest Method for dousha with ga. It is basically the same as SCM except that: * if there is one or more company names in the same sentence before the dousha, take the closest company name as the referent.</Paragraph> <Paragraph position="3"> * if there is a company name immediately followed by ha, ga, deha, or niyoruto somewhere before dousha, use the closest such company name as the referent.</Paragraph> <Paragraph position="4"> * if the previous sentence ends with a company name, thus putting an emphasis on the company name, make it the referent.</Paragraph> <Paragraph position="5"> * if there is a pattern &quot;company name no human name title...&quot; (equivalent to &quot;title human name of company name...&quot; in English) in the previous sentence, then use the company name as the referent. Typical titles are shachou (president) and kaichou (Chairman of the Board).</Paragraph> <Paragraph position="6"> The third heuristic method is used for dousha with ha cases.
It is also based on SCM except for the following points: * if there is a company name immediately followed by ha, ga, deha, or niyoruto somewhere</Paragraph> <Paragraph position="7"> before dousha, use the closest such company name as the referent.</Paragraph> <Paragraph position="8"> * if the previous sentence ends with a company name, thus putting an emphasis on the company name, make it the referent.</Paragraph> <Paragraph position="9"> * if there is a pattern &quot;company name no human name title...&quot; (equivalent to &quot;title human name of company name...&quot; in English) in the previous sentence, then use the company name as the referent.</Paragraph> <Paragraph position="10"> The third method is in fact a subset of the second method, and both of them use semantic information (i.e. company name, human name, title), syntactic patterns (i.e. where a company name, a human name, or a title appears in a sentence) and several specific lexical items which come immediately after the company names.</Paragraph> </Section> <Section position="2" start_page="1134" end_page="1135" type="sub_section"> <SectionTitle> 3.2 Test Results </SectionTitle> <Paragraph position="0"> The three methods have been tested on the development data from which the methods were produced and on a set of unseen test data.</Paragraph> <Paragraph position="1"> 3.2.1 Against the development data As mentioned in section two, there are 42 cases of dousha with ga and 66 cases of dousha with ha.</Paragraph> <Paragraph position="2"> For the dousha with ga cases, the Simple Closest Method correctly identifies 67% of the referents (27 correct out of 42), and the second method correctly identifies 90% (38 out of 42).
SCM misses a number of referents which appear in previous sentences, and most of those which appear two or more sentences previously.</Paragraph> <Paragraph position="3"> For the cases of dousha with ha, SCM correctly identifies only 52% of the referents (34 correct out of 66), whereas the third heuristic method correctly identifies 94% (62 out of 66).</Paragraph> <Paragraph position="4"> The test data was taken from Japanese newspaper articles on micro-electronics. There are 1078 articles, with 51 cases of dousha with ga and 250 cases of dousha with ha. The test has been conducted against all the ga cases (51 of them) and the first 100 ha cases. For the dousha with ga cases, the Simple Closest Method correctly identifies 80% of the referents (41 correct out of 51), and the second method correctly identifies 96% (49 out of 51).</Paragraph> <Paragraph position="5"> For the cases of dousha with ha, SCM correctly identifies only 83% of the referents (83 correct out of 100), whereas the third heuristic method correctly identifies 96% (96 out of 100).</Paragraph> <Paragraph position="6"> The following table, Table 3, shows the summary of the test results.</Paragraph> </Section> </Section> </Paper>
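<!--
The three methods of Section 3.1 can be sketched in Python as follows. This is a minimal
illustration under stated assumptions, not the author's implementation. The token
representation (text paired with a kind label), the kind labels themselves, and the
function names companies_before, scm and heuristic are all hypothetical. The sketch also
assumes that company names, human names and titles have been tagged beforehand, and that
the exception rules are tried in the order in which the paper lists them before falling
back to SCM; the paper does not state a rule priority.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]   # (surface text, kind); the kind labels are hypothetical
Sentence = List[Token]

# Lexical items that, per Section 3.1, follow a company name that is a likely referent.
MARKERS = {"ha", "ga", "deha", "niyoruto"}


def companies_before(doc: List[Sentence], s_idx: int, t_idx: int) -> List[Tuple[int, int]]:
    """Return (sentence, token) positions of company names before dousha, nearest first."""
    found = []
    for s in range(s_idx, -1, -1):
        end = t_idx if s == s_idx else len(doc[s])
        for t in range(end - 1, -1, -1):
            if doc[s][t][1] == "company":
                found.append((s, t))
    return found


def scm(doc: List[Sentence], s_idx: int, t_idx: int) -> Optional[str]:
    """Simple Closest Method: the most recent company name before dousha."""
    cands = companies_before(doc, s_idx, t_idx)
    return doc[cands[0][0]][cands[0][1]][0] if cands else None


def heuristic(doc: List[Sentence], s_idx: int, t_idx: int,
              same_sentence_rule: bool) -> Optional[str]:
    """Second method (same_sentence_rule=True, for ga) or third method (False, for ha)."""
    cands = companies_before(doc, s_idx, t_idx)
    # Rule used only by the second method: closest company name in the same sentence.
    if same_sentence_rule:
        for s, t in cands:
            if s == s_idx:
                return doc[s][t][0]
    # Closest company name immediately followed by ha, ga, deha or niyoruto.
    for s, t in cands:
        nxt = doc[s][t + 1] if t + 1 < len(doc[s]) else None
        if nxt is not None and nxt[1] == "particle" and nxt[0] in MARKERS:
            return doc[s][t][0]
    if s_idx > 0 and doc[s_idx - 1]:
        prev = doc[s_idx - 1]
        # The previous sentence ends with a company name.
        if prev[-1][1] == "company":
            return prev[-1][0]
        # Pattern "company name no human name title" in the previous sentence.
        for i in range(len(prev) - 3):
            if (prev[i][1] == "company" and prev[i + 1] == ("no", "particle")
                    and prev[i + 2][1] == "person" and prev[i + 3][1] == "title"):
                return prev[i][0]
    # No rule applied: fall back to the Simple Closest Method.
    return doc[cands[0][0]][cands[0][1]][0] if cands else None


# The example from the Introduction, reduced to the tokens that matter here.
example = [
    [("Nihon Kentakii Furaido Chikin", "company"), ("ha", "particle"),
     ("Piza Hatto", "company"), ("to", "particle"), ("happyou shita", "other")],
    [("sarani", "other"), ("dousha", "dousha"), ("ha", "particle"),
     ("noridasu", "other")],
]
print(scm(example, 1, 1))
print(heuristic(example, 1, 1, same_sentence_rule=False))
```

On this example SCM picks the nearer name, Piza Hatto, while the third method picks the
name followed by ha, which is the referent given in the Introduction.
-->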