<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-2180">
  <Title>CUSTOMIZING AND EVALUATING A MULTILINGUAL DISCOURSE MODULE</Title>
  <Section position="3" start_page="0" end_page="1109" type="metho">
    <SectionTitle>
2 DISCOURSE MODULE ARCHITECTURE
</SectionTitle>
    <Paragraph position="0"> In Aone and McKee [2], we have described our new language- and domain-independent discourse module within our text understanding system. In addition to being language- and domain-independent, the module is evaluable and refinable for different applications and domains. The discourse architecture is motivated by our need to port our text understanding system to different languages (e.g.</Paragraph>
    <Paragraph position="1"> English, Japanese, Spanish) and to different domains (cf.</Paragraph>
    <Paragraph position="2"> Aone et al. [1]). The discourse module is strictly data-driven, so that anaphora resolution for different languages and domains can be achieved simply by selecting the necessary data. It consists of one discourse processor (the Resolution Engine) and three discourse knowledge bases (the Discourse Phenomenon KB, the Discourse Knowledge Source KB, and the Discourse Domain KB). The Discourse Administrator is a development-time tool for defining the three discourse KB's. The architecture is shown in Figure</Paragraph>
    <Section position="1" start_page="0" end_page="1109" type="sub_section">
      <SectionTitle>
2.1 Discourse Knowledge Bases
</SectionTitle>
      <Paragraph position="0"> The Discourse Knowledge Source KB houses small, well-defined anaphora resolution strategies. Each knowledge source (KS) is an object in the hierarchically organized KB, and information can be inherited from more general to more specific KS's. This KB contains three kinds of KS's: generators, filters, and orderers. A generator produces possible antecedent hypotheses from a certain region of text. A filter eliminates impossible hypotheses, while an orderer ranks the remaining hypotheses in order of preference when more than one is left.</Paragraph>
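The three KS roles can be pictured as a small class hierarchy. The Python sketch below is illustrative only: the class and method names (`KnowledgeSource`, `generate`, `keep`, `rank`) and the toy string hypotheses are assumptions, not the module's actual interface. The shared `apply` method on `Filter` shows how behaviour defined once in a general KS is inherited by every more specific one.

```python
from abc import ABC, abstractmethod
from typing import List


class KnowledgeSource(ABC):
    """Root of a hypothetical KS hierarchy; every KS inherits from it."""

    def __init__(self, name: str) -> None:
        self.name = name


class Generator(KnowledgeSource):
    @abstractmethod
    def generate(self, mentions: List[str], anaphor_pos: int) -> List[str]:
        """Propose antecedent hypotheses from some region of text."""


class Filter(KnowledgeSource):
    @abstractmethod
    def keep(self, anaphor: str, hypothesis: str) -> bool:
        """Return False for an impossible hypothesis."""

    def apply(self, anaphor: str, hypotheses: List[str]) -> List[str]:
        # Defined once here and inherited by every concrete filter.
        return [h for h in hypotheses if self.keep(anaphor, h)]


class Orderer(KnowledgeSource):
    @abstractmethod
    def rank(self, anaphor: str, hypotheses: List[str]) -> List[str]:
        """Return the hypotheses best-first."""


# Deliberately crude concrete KS's, for illustration only.
class PrecedingTextGenerator(Generator):
    def generate(self, mentions, anaphor_pos):
        return mentions[:anaphor_pos]  # every mention before the anaphor


class CapitalizedFilter(Filter):
    def keep(self, anaphor, hypothesis):
        return hypothesis[:1].isupper()  # stand-in for "could be a name"


class ClosestFirstOrderer(Orderer):
    def rank(self, anaphor, hypotheses):
        return list(reversed(hypotheses))  # assumes text order; later is closer


mentions = ["Toyota Motors Corp.", "the plant", "Toyota"]
hyps = PrecedingTextGenerator("gen").generate(mentions, 2)
hyps = CapitalizedFilter("names").apply("Toyota", hyps)
ranked = ClosestFirstOrderer("recency").rank("Toyota", hyps)  # ["Toyota Motors Corp."]
```

Keeping the three roles as separate, small objects is what lets one KS be reused by many phenomena.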
      <Paragraph position="1"> Most of the KS's are language-independent (e.g. all the generators and the semantic filters). Even when a KS is language-specific, a sub-KS can inherit information from its superclass KS's while defining specific data locally. For example, the Semantic-Gender-Filter KS (footnote 1) provides only the functional definition of this KS, while its sub-KS's for English and Japanese each specify language-specific data and inherit the same functional definition from their parent KS.</Paragraph>
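A minimal sketch of the Semantic-Gender-Filter arrangement, assuming the functional definition is a method on a parent class and each language-specific sub-KS supplies only a data table. The class names, the pronoun tables, and the gender labels are hypothetical illustrations, not the system's actual data.

```python
from typing import ClassVar, Dict, FrozenSet


class SemanticGenderFilter:
    """Language-independent part: the filtering procedure itself."""

    # Filled in by each language-specific sub-KS: pronoun -> allowed semantic genders.
    PRONOUN_GENDERS: ClassVar[Dict[str, FrozenSet[str]]] = {}

    def keep(self, pronoun: str, antecedent_gender: str) -> bool:
        allowed = self.PRONOUN_GENDERS.get(pronoun.lower())
        # A pronoun with no recorded restriction does not constrain the antecedent.
        return allowed is None or antecedent_gender in allowed


class EnglishSemanticGenderFilter(SemanticGenderFilter):
    # Illustrative data only.
    PRONOUN_GENDERS = {
        "he": frozenset({"male"}),
        "she": frozenset({"female"}),
        "it": frozenset({"neuter"}),
    }


class JapaneseSemanticGenderFilter(SemanticGenderFilter):
    # Romanized pronouns; illustrative data only.
    PRONOUN_GENDERS = {
        "kare": frozenset({"male"}),
        "kanojo": frozenset({"female"}),
    }


print(EnglishSemanticGenderFilter().keep("she", "male"))        # False
print(JapaneseSemanticGenderFilter().keep("kanojo", "female"))  # True
```

Only the data differs between the two sub-KS's; `keep` itself is inherited unchanged from the parent.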
      <Paragraph position="2"> Footnote 1: Semantic-Gender-Filter filters out an antecedent hypothesis whose semantic gender is not consistent with the restriction imposed by the syntactic gender of a pronoun.
The Discourse Phenomenon KB contains hierarchically organized discourse phenomenon objects (e.g. Name-Anaphora, Definite-NP), each of which specifies a definition of the discourse phenomenon and a set of KS's (i.e. generators, filters, and orderers) to apply to resolve that particular phenomenon. Because the discourse KS's are independent of discourse phenomena, the same discourse KS can be shared by different discourse phenomena in different languages and domains. For example, KS's such as Semantic-Type-Filter and Recency-Orderer are used by most discourse phenomena in multiple languages.</Paragraph>
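One way to picture phenomenon objects that reference shared KS's by name, assuming a simple parent link for the hierarchy. The `Phenomenon` class, its `ks_for` method, and the sub-phenomenon `English-Name-Anaphora` are hypothetical; the KS names are the ones used in this paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Phenomenon:
    """A hypothetical discourse phenomenon object: its name and the KS's it applies.

    A child phenomenon inherits its parent's KS lists and may append its own.
    """

    name: str
    parent: Optional["Phenomenon"] = None
    generators: List[str] = field(default_factory=list)
    filters: List[str] = field(default_factory=list)
    orderers: List[str] = field(default_factory=list)

    def ks_for(self, role: str) -> List[str]:
        """KS names for one role: "generators", "filters" or "orderers"."""
        inherited = self.parent.ks_for(role) if self.parent is not None else []
        return inherited + getattr(self, role)


# Two phenomena that name the same language-independent KS's.
name_anaphora = Phenomenon("Name-Anaphora",
                           generators=["Current-Text-Generator"],
                           filters=["Semantic-Type-Filter"],
                           orderers=["Recency-Orderer"])
definite_np = Phenomenon("Definite-NP",
                         generators=["Current-Text-Generator"],
                         filters=["Semantic-Type-Filter"],
                         orderers=["Recency-Orderer"])

# A language-specific refinement adds only its own filter.
english_names = Phenomenon("English-Name-Anaphora", parent=name_anaphora,
                           filters=["English-Name-Filter"])

print(english_names.ks_for("filters"))  # ['Semantic-Type-Filter', 'English-Name-Filter']
```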
      <Paragraph position="3"> Finally, the Discourse Domain KB contains discourse domain objects, each of which defines the set of discourse phenomena to handle in a particular domain. Since texts in different domains exhibit different sets of discourse phenomena, and since different applications, even within the same domain, may not need to handle the same set of discourse phenomena, the Discourse Domain KB provides a way to customize and constrain the workload of the discourse module.</Paragraph>
      <Paragraph position="4"> Because all three discourse KB's are hierarchically organized, parts of them can be shared across languages and domains while language- and domain-specific discourse data can still be added.</Paragraph>
    </Section>
    <Section position="2" start_page="1109" end_page="1109" type="sub_section">
      <SectionTitle>
2.2 Resolution Engine
</SectionTitle>
      <Paragraph position="0"> The Resolution Engine is the run-time processing module that finds the best antecedent hypothesis for a given anaphor by using the discourse KB's described above. First, it determines from the Discourse Domain KB which discourse phenomena to handle, given a particular language and domain. Then, it uses the Discourse Phenomenon KB to classify an anaphor as one of the discourse phenomena and to decide which KS's to apply to it. Next, the Engine applies the appropriate generator KS's to obtain an initial set of antecedent hypotheses, and then applies filter KS's to remove inconsistent hypotheses. When more than one hypothesis is left, the orderer KS's specified in the Discourse Phenomenon KB are invoked to rank the hypotheses.</Paragraph>
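The control flow of the Resolution Engine can be sketched as follows. This is a minimal, hypothetical Python rendering: the three KB's are plain dictionaries, a mention is a (string, semantic type) pair, and the capitalization test used to classify name anaphora is a crude stand-in for the Phenomenon KB's real definitions. In this sketch the first phenomenon that matches the anaphor decides the outcome.

```python
from typing import List, Optional, Tuple

Mention = Tuple[str, str]  # (surface string, semantic type): a hypothetical representation


# Discourse Knowledge Source KB: toy KS's.
def current_text_generator(mentions: List[Mention], pos: int) -> List[Mention]:
    return mentions[:pos]  # every mention before the anaphor


def semantic_type_filter(anaphor: Mention, hyp: Mention) -> bool:
    return hyp[1] == anaphor[1]


def recency_orderer(hyps: List[Mention]) -> List[Mention]:
    return hyps[::-1]  # hyps arrive in text order, so reversing puts the closest first


# Discourse Phenomenon KB: a classifier plus the KS's for each phenomenon.
PHENOMENA = {
    "DP-Name": {
        "matches": lambda anaphor: anaphor[0][:1].isupper(),  # crude "is a name" test
        "generators": [current_text_generator],
        "filters": [semantic_type_filter],
        "orderers": [recency_orderer],
    },
}

# Discourse Domain KB: phenomena to handle per (language, domain).
DOMAINS = {("English", "joint-venture"): ["DP-Name"]}


def resolve(mentions: List[Mention], pos: int, language: str, domain: str) -> Optional[Mention]:
    anaphor = mentions[pos]
    # 1. Ask the Domain KB which phenomena matter here.
    for name in DOMAINS.get((language, domain), []):
        spec = PHENOMENA[name]
        # 2. Classify the anaphor via the Phenomenon KB.
        if not spec["matches"](anaphor):
            continue
        # 3. Generate an initial hypothesis set.
        hyps = [h for gen in spec["generators"] for h in gen(mentions, pos)]
        # 4. Remove inconsistent hypotheses.
        for flt in spec["filters"]:
            hyps = [h for h in hyps if flt(anaphor, h)]
        # 5. Rank only if more than one hypothesis survives.
        if len(hyps) > 1:
            for order in spec["orderers"]:
                hyps = order(hyps)
        return hyps[0] if hyps else None
    return None


mentions = [("Nissan", "organization"), ("Tokyo", "city"),
            ("Toyota Motors Corp.", "organization"), ("Toyota", "organization")]
best = resolve(mentions, 3, "English", "joint-venture")  # ("Toyota Motors Corp.", "organization")
```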
    </Section>
  </Section>
  <Section position="4" start_page="1109" end_page="1110" type="metho">
    <SectionTitle>
3 CUSTOMIZING DISCOURSE KB'S
</SectionTitle>
    <Paragraph position="0"> We customized our discourse KB's to perform a data extraction task in the joint venture domain. Our text understanding system takes English and Japanese newspaper articles about joint ventures as input (cf. Figure 2) and outputs database templates (cf. Figure 3). The system has to extract from the articles information regarding which organizations participate in a joint venture (including a new joint venture company, if any), what the purpose of the joint venture is (e.g. selling coal), who the people associated with these organizations are, etc. To improve performance, we made the task-oriented decision to give initial priority to organization anaphora, both definite NPs (e.g. "the company") and name anaphora (e.g. "Toyota" for "Toyota Motors Corp.").</Paragraph>
    <Paragraph position="1"> Thus, we created in the Discourse Domain KB a discourse domain object called JV-Data-Extraction, which specifies that two discourse phenomenon objects from the Discourse Phenomenon KB, namely name anaphora (DP-Name) and definite NP anaphora for organizations (DP-DNP-Organization), should be handled in this application domain.</Paragraph>
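The JV-Data-Extraction entry can be pictured as plain data. In this sketch the `DiscourseDomain` class and the `anaphors_to_resolve` helper are hypothetical, and `DP-Pronoun` is an invented phenomenon name used only to show an anaphor that this domain would leave unresolved.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class DiscourseDomain:
    """Hypothetical shape of one Discourse Domain KB entry."""

    name: str
    phenomena: FrozenSet[str]

    def handles(self, phenomenon: str) -> bool:
        return phenomenon in self.phenomena


JV_DATA_EXTRACTION = DiscourseDomain(
    name="JV-Data-Extraction",
    phenomena=frozenset({"DP-Name", "DP-DNP-Organization"}),
)


def anaphors_to_resolve(domain: DiscourseDomain,
                        classified: List[Tuple[str, str]]) -> List[str]:
    """Keep only the (anaphor, phenomenon) pairs that the domain asks for."""
    return [anaphor for anaphor, phenomenon in classified if domain.handles(phenomenon)]


# "DP-Pronoun" is a hypothetical phenomenon name, used only to show what gets skipped.
classified = [("Crown", "DP-Name"), ("the company", "DP-DNP-Organization"), ("it", "DP-Pronoun")]
print(anaphors_to_resolve(JV_DATA_EXTRACTION, classified))  # ['Crown', 'the company']
```

Because the domain object is only a set of phenomenon names, narrowing or widening an application's workload is a data change rather than a code change.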
    <Paragraph position="2"> NEW YORK -- A joint venture to export coal from the United States has been formed between M&amp;M Ferrous America Ltd. here and Crown Coal &amp; Coke Co., Pittsburgh.</Paragraph>
    <Paragraph position="3"> Coal obtained by Crown from various domestic mines will be marketed offshore by M&amp;M, a trading company formed six years ago by former Philipp Brothers Inc. employees. Crown, which formerly had its own mines, heretofore marketed coal from various sources to domestic steelmakers only, according to Eric S. Katzenstein, M&amp;M vice president.</Paragraph>
    <Section position="1" start_page="1109" end_page="1110" type="sub_section">
      <SectionTitle>
3.1 Name Anaphora
</SectionTitle>
      <Paragraph position="0"> To resolve name anaphora, English and Japanese share three KS's in the Discourse Knowledge Source KB, namely Current-Text-Generator, Semantic-Type-Filter, and Recency-Orderer. The generator produces all possible antecedent hypotheses up to the current sentence. The Semantic-Type-Filter then checks whether the semantic type of the anaphor is consistent with that of an antecedent hypothesis. When more than one hypothesis is left, the Recency-Orderer orders the hypotheses according to their proximity to the anaphor.</Paragraph>
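A sketch of the three shared name-anaphora KS's, assuming each mention carries a semantic type and a token position. The `Mention` fields, the type labels, and the positions are hypothetical, and reading "up to the current sentence" as "every mention before the anaphor" is a simplification.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Mention:
    text: str
    sem_type: str   # hypothetical type labels such as "organization" or "city"
    position: int   # hypothetical token offset in the article


def current_text_generator(mentions: List[Mention], anaphor: Mention) -> List[Mention]:
    """Every mention that precedes the anaphor in the text."""
    return [m for m in mentions if anaphor.position > m.position]


def semantic_type_filter(anaphor: Mention, hyp: Mention) -> bool:
    """Keep a hypothesis only if its semantic type matches the anaphor's."""
    return hyp.sem_type == anaphor.sem_type


def recency_orderer(anaphor: Mention, hyps: List[Mention]) -> List[Mention]:
    """Rank by proximity: the smallest distance to the anaphor comes first."""
    return sorted(hyps, key=lambda h: anaphor.position - h.position)


mentions = [
    Mention("Toyota Motors Corp.", "organization", 0),
    Mention("Philipp Brothers Inc.", "organization", 5),
    Mention("Pittsburgh", "city", 9),
]
anaphor = Mention("Toyota", "organization", 12)
hyps = [h for h in current_text_generator(mentions, anaphor)
        if semantic_type_filter(anaphor, h)]
ranked = recency_orderer(anaphor, hyps)
```

On this toy input, recency ranks "Philipp Brothers Inc." ahead of "Toyota Motors Corp.", because the three shared KS's never compare the name strings themselves; that gap is what a language-specific name filter addresses.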
      <Paragraph position="1"> In addition to the three language-independent KS's, each language uses a language-specific filter. For English, a filter named English-Name-Filter, which matches an anaphor (e.g. "Crown") with a subsequence of an antecedent name string (e.g. "Crown Coal &amp; Coke Co."), is currently employed. For Japanese, a single additional filter called Japanese-Name-Filter covers the seemingly vast variations of Japanese company name anaphora (footnote 2). This KS matches an anaphor with any combination of characters in an antecedent as long as the character order is preserved (e.g. "abc" can be an anaphor of "abcde"). One exception is that an anaphor can have an extra word "sha" (company) at the end that is not part of the full company name or the company acronym (e.g. "Westinghouse (WH)" can be referred to anaphorically by "Westinghouse-sha" or "WH-sha").</Paragraph>
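Both name filters reduce to order-preserving subsequence tests. The sketch below assumes that English-Name-Filter works at the word level and Japanese-Name-Filter at the character level. The function names, the `SHA_SUFFIXES` tuple (the kanji 社 plus the romanized `-sha` form used in the examples), and passing the acronym as a separate argument are all assumptions.

```python
from typing import Sequence


def is_subsequence(short: Sequence[str], full: Sequence[str]) -> bool:
    """True if the items of `short` appear in `full` in the same order (gaps allowed)."""
    remaining = iter(full)
    # `item in remaining` advances the iterator, so each match must follow the previous one.
    return all(item in remaining for item in short)


def english_name_filter(anaphor: str, antecedent: str) -> bool:
    # Assumption: "subsequence" is read at the word level.
    return is_subsequence(anaphor.split(), antecedent.split())


# Hypothetical: the "sha" (company) word in kanji or in romanized form.
SHA_SUFFIXES = ("社", "-sha")


def japanese_name_filter(anaphor: str, name: str, acronym: str = "") -> bool:
    # An extra trailing "sha" is allowed when it is not part of the full name.
    for suffix in SHA_SUFFIXES:
        if anaphor.endswith(suffix) and not name.endswith(suffix):
            anaphor = anaphor[: -len(suffix)]
            break
    if not anaphor:
        return False
    # Character-level match against the full name or the acronym, order preserved.
    return is_subsequence(anaphor, name) or (bool(acronym) and is_subsequence(anaphor, acronym))


print(english_name_filter("Toyota", "Toyota Motors Corp."))          # True
print(japanese_name_filter("WH-sha", "Westinghouse", acronym="WH"))  # True
```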
    </Section>
    <Section position="2" start_page="1110" end_page="1110" type="sub_section">
      <SectionTitle>
3.2 Definite NP
</SectionTitle>
      <Paragraph position="0"> Another discourse phenomenon handled for this task is definite NPs referring to organizations, such as "the venture," "the West German electronics concern," etc., where the words "venture" and "concern" in these contexts point to subclasses of the semantic concept for an organization. Although Japanese does not have a definite article, in written Japanese the word "dou" (literally meaning "the same") prefixed to certain nouns performs approximately the same function as the English definite article "the". Both English and Japanese currently share the same three KS's (i.e. Current-Text-Generator, Semantic-Type-Filter, Recency-Orderer) for definite NP resolution.</Paragraph>
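Recognizing a definite organization NP depends on knowing that "venture" and "concern" fall under the organization concept. A minimal sketch, assuming a toy concept hierarchy, a head-noun lexicon, and a romanized "dou-" prefix for Japanese (as in "dou-sha"). All of these tables and rules are hypothetical, and treating the last word as the head noun is a simplification.

```python
from typing import Dict, Optional

# Toy concept hierarchy (hypothetical): concept -> parent concept.
PARENT: Dict[str, Optional[str]] = {
    "organization": None,
    "company": "organization",
    "venture": "organization",
    "concern": "organization",
    "city": None,
}

# Toy head-noun lexicon (hypothetical); Japanese nouns are romanized.
HEAD_CONCEPT: Dict[str, str] = {
    "venture": "venture", "concern": "concern", "company": "company",
    "city": "city", "sha": "company",
}


def is_a(concept: Optional[str], ancestor: str) -> bool:
    """Walk up the hierarchy from `concept` looking for `ancestor`."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def is_definite_org_np(phrase: str, language: str) -> bool:
    """Rough test for a definite NP that refers to an organization."""
    if language == "English":
        words = phrase.lower().split()
        # "the" followed by a head noun, taken here to be the last word.
        return (len(words) >= 2 and words[0] == "the"
                and is_a(HEAD_CONCEPT.get(words[-1]), "organization"))
    if language == "Japanese":
        # Romanized "dou-" prefix (roughly "the same") standing in for "the".
        prefix = "dou-"
        return (phrase.startswith(prefix)
                and is_a(HEAD_CONCEPT.get(phrase[len(prefix):]), "organization"))
    return False


print(is_definite_org_np("the West German electronics concern", "English"))  # True
print(is_definite_org_np("dou-sha", "Japanese"))                             # True
```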
      <Paragraph position="1"> Additionally, English uses the Syntactic-Number-Filter, which checks whether the syntactic number of the anaphor is consistent with that of an antecedent hypothesis. Although Japanese does not mark syntactic number, a "dou" phrase can refer semantically only to a single entity (footnote 3). Thus, Japanese uses the Semantic-Amount-Filter, which excludes semantically plural entities (e.g. a conjoined NP, an NP with a plural quantifier) as possible antecedents for a "dou" phrase.</Paragraph>
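The two number-related filters differ only in which features of the hypothesis they inspect. A sketch, assuming an `NP` record with hypothetical `syntactic_number`, `conjoined`, and `plural_quantifier` fields; the romanized example strings are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NP:
    text: str
    syntactic_number: str = "singular"  # "singular" or "plural"; meaningful for English
    conjoined: bool = False             # e.g. a coordination such as "X and Y"
    plural_quantifier: bool = False     # the NP carries a plural quantifier


def syntactic_number_filter(anaphor: NP, hyp: NP) -> bool:
    """English: the anaphor and the hypothesis must agree in syntactic number."""
    return anaphor.syntactic_number == hyp.syntactic_number


def semantic_amount_filter(anaphor: NP, hyp: NP) -> bool:
    """Japanese: a "dou" phrase refers to a single entity, so drop semantically plural NPs.

    `anaphor` is unused; it is kept so every filter shares one signature.
    """
    return not (hyp.conjoined or hyp.plural_quantifier)


dou_sha = NP("dou-sha")  # romanized, hypothetical representation
# "Toyota to Nissan" is romanized Japanese for "Toyota and Nissan".
print(semantic_amount_filter(dou_sha, NP("Toyota to Nissan", conjoined=True)))  # False
```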
    </Section>
  </Section>
</Paper>