<?xml version="1.0" standalone="yes"?> <Paper uid="C86-1060"> <Title>Linguistic Knowledge Extraction from Real Language Behavior</Title> <Section position="3" start_page="253" end_page="254" type="metho"> <SectionTitle> 3. Sentence Analysis </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="253" end_page="254" type="sub_section"> <SectionTitle> 3.1. Sentence Analysis System ESSAY </SectionTitle> <Paragraph position="0"> We built ESSAY (Experimental System of Sentence Analysis), which analyzes the dependency structure of sentences using the knowledge base. An outline of the system is shown in Fig.2.</Paragraph> <Paragraph position="1"> If the patterns in the knowledge base were used just as they were obtained from the text data, they could cover only the relations that appeared in the text data. The clustering process, however, allows the system to cover more relations than those that appeared in the text data.</Paragraph> <Paragraph position="3"/> </Section> <Section position="2" start_page="254" end_page="254" type="sub_section"> <SectionTitle> 3.2. An Experiment </SectionTitle> <Paragraph position="0"> We conducted a sentence analysis experiment with ESSAY. The knowledge base was organized from 4178 sentences of text data taken from computer manuals. None of the input sentences used in the test were among the sentences used to organize the knowledge base. A sample of the analysis results is shown in Fig.3. A Bunsetu (a kind of phrase-structure element) may have several possible divisions into words and Fuzoku-go, so the system tests several combinations of those divisions. 
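The combination testing described above can be illustrated with a short sketch. The Python below assumes that each Bunsetu arrives with a hypothetical list of candidate (word, Fuzoku-go) divisions and enumerates every way of choosing one division per Bunsetu. The function name `division_combinations` and the romanized example data are illustrative inventions, not part of ESSAY; the source states only that some combinations are tested, not how they are selected.

```python
from itertools import product

# A division splits one Bunsetu into a content word and its Fuzoku-go
# (an empty string stands for "no Fuzoku-go"). Hypothetical representation.
Division = tuple[str, str]


def division_combinations(
    bunsetu_candidates: list[list[Division]],
) -> list[tuple[Division, ...]]:
    """Return every combination that picks one division per Bunsetu.

    Exhaustive enumeration (Cartesian product) is an assumption made for
    simplicity; the source says only that ESSAY tests some combinations.
    """
    return list(product(*bunsetu_candidates))


if __name__ == "__main__":
    # Hypothetical candidates for (he) (to school) (by bus) (goes);
    # the second reading of the third Bunsetu is invented for illustration.
    sentence = [
        [("kare", "wa")],
        [("gakkou", "e")],
        [("basu", "de"), ("basude", "")],
        [("iku", "")],
    ]
    for combo in division_combinations(sentence):
        print(combo)
```

Run as a script, it prints two combinations, one for each reading of the third Bunsetu. Because the number of combinations is the product of the per-Bunsetu candidate counts, exhaustive enumeration grows quickly with sentence length, which is one reason a practical system might test only some of them.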
In Fig.3, EVAL POINT indicates the score of each candidate structure, calculated from the likelihoods of the relations that make up the structure. The results of the experiment can be summarized as follows:</Paragraph> <Paragraph position="2"/> </Section> <Section position="3" start_page="254" end_page="254" type="sub_section"> <SectionTitle> Sentence Length (Number of Bunsetsu) </SectionTitle> <Paragraph position="0"> *1: The experiment was done under two conditions, with and without using Fuzoku-go for analysis, in order to examine the effect of Fuzoku-go.</Paragraph> <Paragraph position="1"> *2: The rate at which the analysis succeeds.</Paragraph> <Paragraph position="2"> *3: The rank of the correct candidate in the analysis results. *4: The rate at which the correct candidate is ranked first. Fig.4 Analysis Results for Each Sentence Length a) Long sentences with many Bunsetu often produce too many combinations of relation candidates.</Paragraph> <Paragraph position="3"> b) In some cases no result is obtained because a few words have no relation candidates, even though all the other words have the correct relations. c) A parallel relation is difficult to describe using only relations between two words.</Paragraph> <Paragraph position="4"> Therefore, sentences containing parallel relations are difficult to analyze.</Paragraph> <Paragraph position="5"> d) The success rate of the analysis depends on sentence length: the longer the sentence, the lower the rate. The average success rate was about 40 per cent.</Paragraph> <Paragraph position="6"> This result is shown in Fig.4.</Paragraph> </Section> </Section> <Section position="4" start_page="254" end_page="254" type="metho"> <SectionTitle> 4. 
More Complicated Data Structure </SectionTitle> <Paragraph position="0"> ESSAY decides each relation by considering only the connection between the two words involved.</Paragraph> <Paragraph position="1"> The other parts of the sentence play no role in this decision at all. In actual sentences, however, relations interact with one another in complex ways. In this section, we describe how to handle these interactions so as to provide a wider basis for judging whether relations are appropriate.</Paragraph> <Paragraph position="2"> (he) (to school) (by bus) (goes) Some words are related to two or more other words at the same time. As shown in Fig.5(a), four kinds of relations appear in the text data. When two or more kinds of relations appear at the same time, the frequency with which they co-occur is counted. The resulting frequency table is expressed as a matrix, called the relation matrix, shown in Fig.5(b).</Paragraph> <Paragraph position="3"> The element $M_{ii}$ is the frequency of $R_i$ itself, and the element $M_{ij}$ is the frequency with which $R_i$ and $R_j$ appear at the same time. A matrix is obtained for each word that has been related to two or more words at the same time. This matrix provides a wider basis for judging whether relations are appropriate. For example, when the relation &quot;go -(to)- school&quot; is certain, the elements $M_{2i}$ and $M_{i2}$ ($i \neq 2$) of the matrix give the probability of each relation $R_i$ in this situation.</Paragraph> <Paragraph position="4"> 4.2. Effect of the Relation Matrix First, using this matrix widens the basis for judging whether relations are appropriate, so the number of candidates can be reduced effectively. Second, because each relation becomes more reliable, the system is expected to obtain relations that agree with the meaning of the sentence.</Paragraph> </Section> <Section position="5" start_page="254" end_page="254" type="metho"> <SectionTitle> 5. Conclusion </SectionTitle> <Paragraph position="0"> We have introduced a bottom-up approach to organizing a linguistic knowledge base. 
Organizing a knowledge base has conventionally required continuous human effort. The vocabulary of the knowledge base depends on the quantity of text data.</Paragraph> <Paragraph position="1"> A linguistic knowledge base organized in this manner may not be as powerful as those constructed analytically. However, such a method may open the way to automatic knowledge acquisition, and it may make it possible to discover rules and properties that we have never noticed.</Paragraph> </Section> </Paper>