File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/92/c92-3167_metho.xml
Size: 17,281 bytes
Last Modified: 2025-10-06 14:13:01
<?xml version="1.0" standalone="yes"?> <Paper uid="C92-3167"> <Title>Recognizing Topics through the Use of Interaction Structures</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 A Topic Recognition Mechanism </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 What Is a Topic? </SectionTitle> <Paragraph position="0"> Topics are discourse referents shared by dialogue participants: things described by noun phrases and events described by verb phrases. Such referents, however, are only topic candidates, not topics. To be recognized as a topic by people, a referent must be shared by the participants for a while: it must be presented as a topic again, or referred to by pronouns or zero pronouns.</Paragraph> <Paragraph position="1"> A set of utterances having topic coherence is called a &quot;topic segment&quot;. Topic structures consist of topic segments, the topics in each segment, and the relations between segments: nests or conjunctions. The Post Office Dialogue in Fig. 1 can be segmented into topic segments as follows: (A-1 B-1 (A-2 B-2 A-3 B-3 A-4 B-4)) (A-5).</Paragraph> <Paragraph position="2"> 郵便の配達 (mail delivery) is talked about from utterance A-1 to B-4, 速達 (express) from A-2 to B-4, and so on.</Paragraph> <Paragraph position="3"> There are various types of relations between topics. In the Post Office Dialogue in Fig. 1, the topic &quot;速達 (express)&quot; in utterance A-2 is a subtopic of the topic &quot;郵便の配達 (mail delivery)&quot; in A-1 because 速達 is a subcategory of 郵便 (mail). In another example, where a certain person, Taro, has recently moved to Kyoto, Kyoto may be a subtopic of Taro. Non-task-oriented dialogues may include various topic relations.</Paragraph> <Paragraph position="4"> Will the letter reach Kyoto by tomorrow? I think it will because the next letter collection is at noon.</Paragraph> <Paragraph position="5"> Can I drop the letter into that mailbox? 
B-4 はい。 Yes.</Paragraph> <Paragraph position="6"> The next question is about a postal</Paragraph> <Paragraph position="8"> TOPIC is a noun phrase marked by the postpositional particle &quot;は (wa)&quot;. In the sentence &quot;東京 (Tokyo) は 日本の (Japanese) 首都 (capital) です (is)&quot;, 東京 (Tokyo) is TOPIC.</Paragraph> <Paragraph position="9"> EMPATHY includes the subject of mental verbs such as 喜ぶ (yorokobu, be glad), the source of 行く (iku, go), etc. These verbs indicate the speaker's perspective. The subject markers include &quot;が (ga)&quot;, and the object markers &quot;を (wo)&quot;.</Paragraph> <Paragraph position="10"> These candidates can be used for topic markers. The candidate priority of topics is the same as that of focus; if TOPIC exists, it is a topic.</Paragraph> <Paragraph position="11"> If TOPIC does not exist but EMPATHY does, EMPATHY is a topic.</Paragraph> <Paragraph position="12"> Examples of Japanese clue words indicating a topic change are shown in Table 2. Corresponding English clue words are also shown.</Paragraph> <Paragraph position="14"> This variety of topic relations makes it difficult to identify topic relevance by domain knowledge prepared in advance. A new approach is therefore needed to avoid this weakness.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Topic Markers and Clue Words </SectionTitle> <Paragraph position="0"> There are many topic marker expressions in Japanese. For example, the expressions in Table 1 indicate topics explicitly. 
English expressions such as &quot;concerning ...&quot; and &quot;as regards ...&quot; are similar to these expressions.</Paragraph> <Paragraph position="1"> Japanese expression (pronunciation): TOPIC に関して (ni kanshi te); TOPIC について (ni tsuite); TOPIC というのは (to iu no wa); TOPIC は (wa). &quot;TOPIC&quot; means an indicated topic.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.3 A Topic Stack </SectionTitle> <Paragraph position="0"> A stack is used to handle discourse segments in the discourse model by B. J. Grosz [2]. A stack element corresponds to a segment, and is called a &quot;focus space&quot;. Discourse entities such as objects are maintained in focus spaces. The top stack element holds the most salient entities. Discourse segment structures are related to the intentional structure. The &quot;dominance&quot; and &quot;satisfaction precedence&quot; relations between intentions decide the pushing and popping of focus spaces.</Paragraph> <Paragraph position="1"> A &quot;topic segment&quot; is a discourse segment of large size, and a &quot;topic stack&quot; is used to handle topics. However, pushing and popping of topics cannot be determined by the intentional structure in our approach because both topic-oriented and non-topic-oriented dialogues are treated, and the intentional structure may be ill-formed.</Paragraph> <Paragraph position="2"> Instead of the intentional structure, only clue words are allowed to determine the pushing and popping. For example, a clue word meaning &quot;first&quot; indicates pushing, and &quot;次に (tsugi ni, next)&quot; popping. ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992 1065 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992</Paragraph> <Paragraph position="3"> To recognize local topic structures, a simple mechanism is used. Each element of a topic stack is itself treated as a stack, called an &quot;inner stack&quot;. Topics are pushed onto the inner stack. 
If an explicit topic indicated by the markers in Table 1 is recognized, non-explicit topics are popped from the stack.</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Identifying Topic Continuation </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 The Basic Idea </SectionTitle> <Paragraph position="0"> In dialogues, topics can be changed naturally at some utterances, but not at others. For example, topics unfold naturally in the dialogue in Fig. 1.</Paragraph> <Paragraph position="1"> On the other hand, topic expansion is not natural in the dialogue in Fig. 2.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Topic Expansion and Speech Acts </SectionTitle> <Paragraph position="0"> The unnatural topic expansion in Fig. 2 is related to speech act purposes called illocutionary points. A classification of the illocutionary points was proposed by J. R. Searle [3]: The &quot;assertive point&quot; tells how the world is, e.g. to state and to predict. The &quot;commissive point&quot; commits the speaker to doing something. A promise is an example. The &quot;directive point&quot; tries to have the hearer do things. Making a request is an example. The &quot;declarative point&quot; changes the world by saying so, e.g. to declare and to name.</Paragraph> <Paragraph position="1"> The &quot;expressive point&quot; expresses the speaker's feeling, e.g. to apologize.</Paragraph> <Paragraph position="2"> We propose the following hypothesis: if the current utterance follows a directive utterance, the current topic is relevant to the topic of the directive utterance.</Paragraph> <Paragraph position="3"> This is called &quot;topic forwarding&quot;. The unnatural topic expansion in Fig. 2 can be explained by this hypothesis. 
The topic of utterance Q-1 must be relevant to one topic of P-1 because utterance P-1 is directive. However, &quot;次に (tsugi ni, next)&quot; indicates a topic change. This contradiction causes the unnatural topic expansion.</Paragraph> <Paragraph position="4"> Utterance pairs such as &quot;requesting - accepting&quot; and &quot;asking - informing&quot; will retain a topic even if the pairs are nested. For example, in the following, R-1 and S-2 have the topic of &quot;restaurant&quot;, and S-1 and R-2 have the topic of &quot;money for restaurant&quot;.</Paragraph> <Paragraph position="5"> R-1 Do you know a good restaurant? S-1 How much money do you have? R-2 My salary is low.</Paragraph> <Paragraph position="6"> S-2 That restaurant is cheap and good.</Paragraph> <Paragraph position="7"> However, pairs are not always so well formed. In the Post Office Dialogue in Fig. 1, utterance A-3 performs two speech acts: informing-if and asking. Deeper dialogue understanding is needed for correct pair identification. Therefore, in this work, the pairs are not identified and a directive utterance is regarded as forwarding a topic only to the next utterance.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Utterance Types </SectionTitle> <Paragraph position="0"> &quot;Topic forwarding&quot; classifies utterances into two types: topic-binding and topic-nonbinding utterances. Topic-binding utterances have the directive point but topic-nonbinding ones do not. Topic-binding utterance speech acts include to ask, to request and to confirm. Topic-nonbinding utterance speech acts include to inform and to acknowledge.</Paragraph> <Paragraph position="1"> In Japanese, the utterance type can be identified by pattern matching with expressions such as those shown in Tables 3 and 4.</Paragraph> </Section>
<Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.4 Topic Recognition </SectionTitle> <Paragraph position="0"> A set of consecutive utterances in which the same topics continue is called a &quot;topic unit&quot;. A topic unit can be identified by using &quot;topic forwarding&quot; instead of domain knowledge: 1. The current utterance belongs to the same topic unit as the previous utterance if the previous utterance is topic-binding, or there is no topic candidate indicated by a topic marker in the current utterance.</Paragraph> <Paragraph position="1"> 2. Otherwise, a new topic unit is created.</Paragraph> <Paragraph position="2"> The unit is used to validate topic candidates and topic change candidates, and has no effect on the topic stack or the inner stack.</Paragraph> <Paragraph position="3"> Noun phrases indicated by topic markers are regarded as topic candidates, and utterances with clue words are detected as topic change candidates. Some of the topic candidates are recognized as topics. Topic candidates are preserved in a &quot;candidate list&quot;. Recognized topics are pushed onto the inner stack of the topic stack described in 2.3. Topics can be identified by using the topic unit: a) A topic candidate indicated by a topic marker such as those listed in Table 1 is immediately recognized as a topic, and pushed onto the inner stack. This is because such markers indicate topics explicitly. These markers are called &quot;explicit topic markers&quot;, and the topics &quot;explicit topics&quot;.</Paragraph> <Paragraph position="4"> b) A topic candidate indicated by other markers such as &quot;が (ga)&quot; and &quot;を (wo)&quot; is preserved in the candidate list. It is recognized as a topic only when the candidate continues for n utterances. If recognized as a topic, it is removed from the candidate list, and pushed onto the inner stack. 
The optimum value of n is 4 according to the results of a manual topic recognition experiment.</Paragraph> <Paragraph position="5"> c) If a new topic unit is generated, the candidate list is reset to an empty list.</Paragraph> <Paragraph position="6"> d) A topic change candidate is recognized as a topic change only when the candidate is in the first utterance in a topic unit.</Paragraph> <Paragraph position="7"> If a topic change is recognized, the candidate list is reset to an empty list and the inner stack is pushed onto or popped from the topic stack according to the clue words.</Paragraph> <Paragraph position="8"> This topic recognition algorithm can be used for any language because &quot;topic forwarding&quot; is not language-specific. Only the dictionaries for the topic markers, the clue words and the utterance type identification are unique to each language.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.5 An Example of Topic Recognition </SectionTitle> <Paragraph position="0"> In utterance A-1 in the Post Office Dialogue in Fig. 1, &quot;郵便の配達 (yuubin no haitatsu, mail delivery)&quot; is identified as a topic candidate by the topic marker &quot;について (ni tsuite)&quot;. This candidate is immediately recognized as a topic because of the explicit marker. Utterances A-1 and B-1 belong to the same topic unit because B-1 has no topic candidate. The system state after processing B-1 is the following. Each element of the topic stack is an inner stack. The right-most element of the topic and the inner stack is the top stack</Paragraph> <Paragraph position="2"> From utterance A-2 to B-3, a topic marker &quot;を (wo)&quot; is detected in A-2 and B-3, and &quot;に (ni)&quot; in A-3. Therefore, &quot;速達 (sokutatsu, express)&quot; in A-2, &quot;京都 (Kyoto)&quot; in A-3 and &quot;郵便物 (yuubin butsu, mail)&quot; in B-3 are identified as topic candidates. 
Furthermore, B-3 is detected as a topic change candidate because of the clue word &quot;次に (tsugi ni, next)&quot;. A-2 generates a new topic unit because B-1 is topic-nonbinding and there is a topic candidate in A-2. As a result of the unit generation, the candidate list is reset. Utterances from A-2 to B-3 belong to the second topic unit. This is because there is no topic candidate in B-2, and B-2 and A-3 are topic-binding. Therefore, the candidate &quot;速達&quot; continues for 4 utterances in the second topic unit and is recognized as a topic. The topic change candidate in B-3 is dismissed correctly because it is not in the first utterance in the topic unit. The system state after processing B-3 is: CandidateList = {京都, 郵便物}.</Paragraph> <Paragraph position="3"> Utterance A-4 generates a new topic unit and the candidate list is reset to an empty list. In A-4, &quot;ポスト (posuto, a mailbox)&quot; is detected as a topic candidate. B-4 belongs to the same unit. The state of the inner stack does not change.</Paragraph> <Paragraph position="4"> In utterance A-5, a topic candidate &quot;郵便貯金 (yuubin chokin, a postal deposit)&quot; is identified.</Paragraph> <Paragraph position="6"> A-5 is detected as a topic change candidate because of the clue word &quot;次に (tsugi ni, next)&quot;. The change candidate is recognized as a topic change correctly because A-5 is the first utterance of a new topic unit. As a result, the inner stack is popped from the topic stack. The system state after processing A-5 is: TopicStack = [[ ]]</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Discussion </SectionTitle> <Paragraph position="0"> The results of a topic recognition experiment using 207 utterances taken from dialogue transcripts are shown in Table 5. 
Topics recognized by our system are compared with the manually recognized topics.</Paragraph> <Paragraph position="1"> Recognition and dismissal of topic change candidates were performed correctly. This correctness has the beneficial effect that wrong popping of the topic stack and wrong resetting of the candidate list can be avoided.</Paragraph> <Paragraph position="2"> Two noun phrases were wrongly recognized as topics by the system. These errors occurred when the current topic T-1 returned to a past topic T-2, and T-2 was not described explicitly at that time. Although a topic change has occurred, T-1 is regarded as the current topic because no topic candidate was presented.</Paragraph> <Paragraph position="3"> Three topics were not recognized as topics but were wrongly dismissed. This error occurred when the current topic was rephrased; &quot;topic forwarding&quot; fails in this case. Synonyms such as &quot;a fridge&quot; and &quot;a refrigerator&quot; are often used.</Paragraph> <Paragraph position="4"> Topic recognition accuracy is sufficient for a topic-oriented video retrieval support system. The recognition method is especially effective in dialogues with interaction structures such as &quot;asking - asking&quot; and &quot;requesting - asking&quot;. The experimental results show that such structures are included in many dialogues. Mixed-initiative dialogues may form these structures. To improve topic recognition accuracy, other approaches such as a knowledge-based approach can be added. For example, a synonym list and a thesaurus would contribute to topic continuation identification.</Paragraph> </Section> </Paper>
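The topic-unit rules 1 and 2 and steps (a) to (d) of Section 3.4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names Utterance and recognize are hypothetical, topic candidates are given as romanized strings, utterance types and markers are supplied by hand rather than by pattern matching, and the pushing and popping of inner stacks on the topic stack is not modeled. The binding flags for A-1 and A-2 are not stated in the text and are assumed here; they do not affect the result because B-1 and B-2 present no topic candidate.

```python
from dataclasses import dataclass, field


@dataclass
class Utterance:
    label: str
    binding: bool = False                          # topic-binding (directive point)?
    explicit: list = field(default_factory=list)   # candidates with explicit topic markers
    implicit: list = field(default_factory=list)   # candidates with other markers (ga, wo, ...)
    clue: bool = False                             # contains a topic-change clue word?


def recognize(utterances, n=4):
    """Hypothetical sketch of Section 3.4; n is the continuation threshold."""
    units = []        # topic units, each a list of utterance labels
    topics = []       # recognized topics, in order of recognition
    changes = []      # labels of utterances confirmed as topic changes
    candidates = {}   # candidate list: name -> utterances survived in this unit
    prev = None
    for u in utterances:
        has_candidate = bool(u.explicit or u.implicit)
        # Rule 1: stay in the unit if the previous utterance is topic-binding
        # or the current utterance presents no marked topic candidate.
        same_unit = prev is not None and (prev.binding or not has_candidate)
        if same_unit:
            units[-1].append(u.label)
        else:
            # Rule 2 and step (c): a new unit resets the candidate list.
            units.append([u.label])
            candidates = {}
            # Step (d): a clue word counts only in the first utterance of a unit.
            if u.clue:
                changes.append(u.label)
        # Step (a): explicit topics are recognized immediately.
        topics.extend(u.explicit)
        # Step (b): other candidates wait until they have continued for n utterances.
        for name in u.implicit:
            candidates.setdefault(name, 0)
        for name in list(candidates):
            candidates[name] += 1
            if candidates[name] >= n:
                topics.append(name)
                del candidates[name]
        prev = u
    return units, topics, changes, candidates


# Post Office Dialogue of Fig. 1, encoded by hand from Section 3.5.
dialogue = [
    Utterance("A-1", binding=True, explicit=["yuubin no haitatsu"]),  # binding assumed
    Utterance("B-1"),
    Utterance("A-2", binding=True, implicit=["sokutatsu"]),           # binding assumed
    Utterance("B-2", binding=True),
    Utterance("A-3", binding=True, implicit=["Kyoto"]),
    Utterance("B-3", implicit=["yuubin butsu"], clue=True),
    Utterance("A-4", binding=True, implicit=["posuto"]),
    Utterance("B-4"),
    Utterance("A-5", implicit=["yuubin chokin"], clue=True),
]
units, topics, changes, candidates = recognize(dialogue)
print(units)
print(topics, changes, candidates)
```

With this encoding the sketch groups the utterances into the four topic units described in Section 3.5, recognizes "sokutatsu" once it has continued for four utterances, dismisses the clue word in B-3, and accepts the one in A-5.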