<?xml version="1.0" standalone="yes"?> <Paper uid="C92-2085"> <Title>Linguistic Knowledge Generator</Title>
<Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Experiment </SectionTitle>
<Paragraph position="0"> We conducted an experiment using compound nouns from a computer manual, according to the scenario.</Paragraph>
<Paragraph position="1"> The result for other relations, for example prepositional attachment, would not be very different from this result.</Paragraph>
<Paragraph position="2"> The corpus consisted of 8,304 sentences. As the result of the Japanese tagging program, 1,881 candidates, 616 kinds of compound nouns were extracted.</Paragraph>
<Paragraph position="3"> Then ALPSC took these compound nouns as input. Tuple relations were supposed between all words of all compound nouns with the syntactic relation 'MODIFY'. A tuple has to have a preceding argument and a following head. For example, from a compound noun with 4 words, 5 ambiguous tuples and 1 firm tuple can be extracted, because each element can be the argument in only one tuple. An initial credit of 1/3 was set for each instance-tuple whose argument is the first word of the compound noun. Similarly, a credit of 1/2 was set for each instance-tuple in which the second word is an argument.</Paragraph>
<Paragraph position="4"> No word distance information was introduced in the first trial. Then the learning process was started. The results of the first trial are shown in Table 1 and examples in Figure 2. The results were classified as correct, incorrect, etc. 'Correct' means that the hypothesis-tuple which has the highest plausibility value is the correct tuple among the ambiguous tuples. 'Incorrect' means that it is not. 'Indefinite' means that the plausibility values of some hypothesis-tuples are the same. 'Uncertain' means that it is impossible to declare which hypothesis-tuple is the best without context.</Paragraph>
<Paragraph position="5"> [Table 1: Results of experiment after first ALPSC trial; the table body was lost in extraction.] The clustering program produced 44 clusters based on the word distance data. A sample of the clusters is shown in Figure 3. The average number of words in a cluster was 3.43, and each produced cluster contained one to twenty-five words. This is a manageable number to treat manually. The human intervention to extract correct clusters resulted in 26 clusters being selected from the 44 produced clusters. The average number of words in a selected cluster is 2.96. This took a linguist who is familiar with computers 15 minutes. A sample of the selected clusters is shown in Figure 4.</Paragraph>
<Paragraph position="6"> These clusters were used for the second trial of ALPSC. The results of the second trial are shown in Table 2.</Paragraph>
</Section>
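To make the tuple-extraction step concrete, here is a minimal sketch in Python of how the candidate MODIFY tuples and their uniform initial credits could be enumerated. This is not the authors' ALPSC implementation; the function name and representation are our assumptions.

```python
def extract_tuples(words):
    """Enumerate candidate (argument, head) MODIFY tuples with initial credits.

    In a compound noun, each word except the last is the argument of
    exactly one of the words that follow it, so word i yields
    len(words) - 1 - i candidate tuples, which share a uniform credit.
    """
    tuples = []
    n = len(words)
    for i in range(n - 1):
        candidates = n - 1 - i      # heads available to word i
        credit = 1.0 / candidates   # uniform initial credit per candidate
        for j in range(i + 1, n):
            tuples.append((words[i], words[j], credit))
    return tuples

# For a 4-word compound noun this yields 5 ambiguous tuples
# (credits 1/3 and 1/2) and 1 firm tuple (credit 1), as in the text.
for arg, head, credit in extract_tuples(["w1", "w2", "w3", "w4"]):
    print(f"{arg} MODIFY {head}: {credit:.3f}")
```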
<Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Discussion </SectionTitle>
<Paragraph position="0"> The scenario described above embodies a part of our ideas. Several other experiments have already been conducted, based on other scenarios, such as a scenario for finding clusters of nouns by which we can resolve ambiguities caused by prepositional attachment in English. Though this works in a similar fashion to the one we discussed, it has to treat more serious structural ambiguities and store diverse syntactic structures.</Paragraph>
<Paragraph position="1"> Though we have not compared them in detail, it can be expected that the organization of the semantic clusters of nouns that emerge in these two scenarios will differ. One reflects collocational relations among nouns, while the other reflects those between nouns and verbs. By merging these two scenarios into one larger scenario, we may be able to obtain more accurate or intuitively reasonable noun clusters. We are planning to accumulate a number of such scenarios and larger scenarios, and we hope to report on them soon.</Paragraph>
<Paragraph position="2"> As for the result of the particular experiment in the previous section, one eighth of the incorrect results progressed after one trial of the gradual approximation. This is significant progress in the processing. For humans it would be a tremendously laborious job, as they would be required to examine all the results.</Paragraph>
<Paragraph position="3"> What humans did in the experiment was simply divide the produced clusters.</Paragraph>
<Paragraph position="4"> Although the clusters were produced by a non-overlapping clustering algorithm in this experiment, we are developing an overlapping clustering program.</Paragraph>
<Paragraph position="5"> Hopefully it will produce clusters which incorporate the concept of word sense ambiguity, meaning that a word can belong to several clusters at a time. The method to produce overlapping clusters is one of our current research topics.</Paragraph>
<Paragraph position="6"> Examining the results, we can say that the cluster effect alone is not enough to explain the word relations of compound nouns. There might be some structural and syntactic restrictions. This feature of compound nouns made it hard to get a higher percentage of correct answers in our experiment. Extra processing to address these problems can be introduced into our system.</Paragraph>
<Paragraph position="7"> Because the process concerns a huge amount of linguistic data which also contains ambiguity, it is inevitably experimental. A sort of repetitive progress is needed to make the system smarter. We will need to perform many experiments in order to determine the type of human intervention required, as there seems to be no means of determining this theoretically. This system aims not to simulate human linguists who have conventionally derived linguistic knowledge by computer, but to discover a new paradigm in which automatic knowledge acquisition programs and human effort are effectively combined to generate linguistic knowledge.</Paragraph>
</Section>
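As an illustration of the overlapping-clustering idea mentioned above, the following Python sketch lets a word join every cluster whose seed it is sufficiently similar to, so an ambiguous word can appear in several clusters at once. This is our illustration, not the program under development; the similarity scores, seed words, and threshold are assumed for the demo.

```python
def overlapping_clusters(words, seeds, similarity, threshold=0.5):
    """Return {seed: [members]}; a word may join several clusters."""
    clusters = {seed: [] for seed in seeds}
    for word in words:
        for seed in seeds:
            if similarity(word, seed) >= threshold:
                clusters[seed].append(word)  # no exclusivity constraint
    return clusters

# Toy demo: the ambiguous word 'file' lands in both clusters.
scores = {("file", "document"): 0.8, ("file", "command"): 0.6,
          ("report", "document"): 0.9, ("delete", "command"): 0.7}
sim = lambda w, s: scores.get((w, s), 0.0)
print(overlapping_clusters(["file", "report", "delete"],
                           ["document", "command"], sim))
# -> {'document': ['file', 'report'], 'command': ['file', 'delete']}
```

</Paper>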