<?xml version="1.0" standalone="yes"?>
<Paper uid="P84-1049">
<Title>REFERENCE \[i\]. Yiming Yang: A Study of a System for Analyzing Chinese</Title>
<Section position="4" start_page="0" end_page="222" type="metho">
<SectionTitle> 2. REDUCING AMBIGUITIES USING CHARACTERISTIC WORDS </SectionTitle>
<Paragraph position="0"> Chinese has a class of words (prepositions, auxiliary verbs, modifier verbs, adverbial nouns, etc.) that are used as independent words rather than as affixes. They usually have key functions, they are few in number, and they occur very frequently, so they can be used to reduce ambiguities. Here we shall call them &quot;characteristic words&quot;.</Paragraph>
<Paragraph position="1"> Several hundred of these words have been collected by linguists [2], and they are often used to distinguish the detailed meaning of each part of a Chinese sentence. We selected about 200 such words, and we use them to pick out fragments of the sentence and determine their syntactic structure before attempting global syntactic analysis and deep meaning analysis.</Paragraph>
<Paragraph position="2"> The use of the characteristic words is described below.</Paragraph>
<Paragraph position="3"> a) Category decision: Some characteristic words serve to decide the category of neighboring words. For example, words such as &quot;~ &quot;, &quot;~&quot;, &quot;~&quot;, &quot;4~&quot;, are rather like verb postfixes, indicating that the preceding word must be a verb, even though the same characters might spell a noun. Words like &quot; ~ &quot;, &quot; ~ &quot;, can be used as both verb and auxiliary. 
If, for example, &quot;~ &quot; is followed by a word that could be read as either a verb or a noun, then that word is a verb and &quot;~ &quot; is an auxiliary.</Paragraph>
<Paragraph position="4"> b) Fragment picking: In Chinese, many prepositional phrases start with a preposition such as &quot;~&quot;, &quot;~&quot;, &quot;~&quot;, and end with a characteristic word belonging to a subset of adverbial nouns that are often used to express position, direction, etc. When such characteristic words are spotted in a sentence, they serve to forecast a prepositional phrase. Another example is the pattern &quot;...{ ... ~&quot;, used a little like &quot;... is to ...&quot; in English; when we find it, we may predict a verbal phrase from &quot;~ &quot; to &quot;%.~&quot;, which is moreover the predicate VP of the sentence.</Paragraph>
<Paragraph position="6"> Example sentence (English translation): The ball must run a longer distance before returning to the initial altitude on this slope.</Paragraph>
<Paragraph position="7"> Fig. 1. An Example of Fragment Finding. Legend entries: mark distinguishing a word from the others; characteristic word; fragment; verb or adjective; word that cannot be the predicate of the sentence.</Paragraph>
<Paragraph position="8"> These forecasts make it more likely that the subsequent analysis system will find the correct phrase early.</Paragraph>
<Paragraph position="9"> c) Role deciding: The preceding rules are simple ones, like those a human might use. With a computer it is possible to use more complex rules (for instance, rules involving many exceptions or providing only partial knowledge) with the same efficiency. For example, a rule usually cannot decide with certainty whether a given verb is the predicate of a sentence, but we know that a predicate is not likely to precede a characteristic word such as &quot;~9 &quot; or &quot; { &quot; or follow a word like &quot;~-~&quot;, &quot;~&quot; or &quot;~&quot;. We use this kind of rule to reduce the range of possible predicates. 
This knowledge can in turn be used to predict the partial structure of a sentence, because the verbal proposition begins with the predicate and ends at the end of the sentence.</Paragraph>
<Paragraph position="10"> In the example shown in Fig. 1, fragments f3 and f4 are obtained through step (a) above, f1 through (b), and f2 and f5 through (c). The symbol &quot;o&quot; marks a possible predicate, and &quot;x&quot; means that the possibility has been ruled out.</Paragraph>
<Paragraph position="11"> Out of 7 possibilities, only 2 remained.</Paragraph>
</Section>
<Section position="5" start_page="222" end_page="222" type="metho">
<SectionTitle> 3. RESOLVING CONFLICT </SectionTitle>
<Paragraph position="0"> The rules mentioned above are written for each characteristic word independently. They are not absolute rules, so when they are applied to a sentence, several fragments may overlap and thus be incompatible. Several combinations of compatible fragments may exist, and from these we must choose the most &quot;likely&quot; one. Instead of attempting to evaluate the likelihood of every combination, we use a scheme that assigns a priority score to each fragment and thus directly constructs the &quot;best&quot; combination. If this combination (partial structure) is rejected by subsequent analysis, backtracking occurs and the next possibility is tried, and so on.</Paragraph>
<Paragraph position="1"> Fig. 2 shows an example involving conflicting fragments. We select f3 first because it has the highest priority. We find that f2, f4 and f5 conflict with f3, so only f1 can be selected next.</Paragraph>
<Paragraph position="2"> The resulting combination (f1, f3) is correct.</Paragraph>
<Paragraph position="3"> Fig. 3 shows the parsing result obtained by computer in our preprocessing subsystem.</Paragraph>
</Section>
<Section position="6" start_page="222" end_page="224" type="metho">
<SectionTitle> 4. 
PRIORITY </SectionTitle>
<Paragraph position="0"> In the preprocessing, we determine all the possible fragments that might occur in the sentence and involve the characteristic words.</Paragraph>
<Paragraph position="1"> Then we give each one a measure of priority. This measure is a complex function, determined largely by trial and error. It is calculated according to the following principles: a) Kind of fragment: Some kinds of fragments, for example compound verbs involving &quot;~&quot;, occur more often than others and are accordingly given higher priority. b) Precision: We call &quot;precise&quot; a pattern that contains recognizable characteristic words or subpatterns, and &quot;imprecise&quot; a pattern that contains words we cannot recognize at this stage. For example, f3 of Fig. 2 is more precise than f1, f2 or f4. We put the more precise patterns on a higher priority level.</Paragraph>
<Paragraph position="3"> Example sentence (English translation): In the perfect situation without friction, the object will keep moving with constant speed.</Paragraph>
<Paragraph position="4"> Figure legend: pattern of fragment; a word which is either a verb or a noun.</Paragraph>
<Paragraph position="6"> Example sentence (English translation): In the perfect situation without friction, the object will keep moving with constant speed.</Paragraph>
<Paragraph position="7"> Figure legend: fragment obtained by the preprocessing subsystem; the names of the fragments shown in Fig. 2; the omitted part of the resultant structure tree.</Paragraph>
<Paragraph position="8"> c) Fragment length: Length is a useful parameter, but its effect on priority depends on the kind of fragment. Accordingly, a longer fragment gets higher priority in some cases and lower priority in others.</Paragraph>
<Paragraph position="9"> The actual rules are rather complex to state explicitly. At present we use 7 levels of priority.</Paragraph>
<Paragraph position="10"> [...] tried the method on a set of more complex sentences. From the same textbook, out of 800 sentences containing prepositional phrases, 80 contained conflicts, involving 209 phrases. 
In our test, 83% of these conflicts were resolved at the first choice, 90% at the second choice, and 98% at the third choice.</Paragraph>
</Section>
<Section position="7" start_page="224" end_page="224" type="metho">
<SectionTitle> 6. SUMMARY </SectionTitle>
<Paragraph position="0"> In this paper, we outlined a preprocessing technique for Chinese language analysis.</Paragraph>
<Paragraph position="1"> Heuristic knowledge rules involving a limited set of characteristic words are used to forecast the partial syntactic structure of sentences before global analysis, thus restricting the path through the search space in syntactic analysis.</Paragraph>
<Paragraph position="2"> Comparative processing using knowledge about priority is introduced to resolve fragment conflicts, so that the correct result can be obtained as early as possible.</Paragraph>
<Paragraph position="3"> In conclusion, we expect this scheme to be useful for the efficient analysis of a language such as Chinese that contains many syntactic ambiguities.</Paragraph>
</Section>
<Section position="8" start_page="224" end_page="224" type="metho">
<SectionTitle> ACKNOWLEDGMENTS </SectionTitle>
<Paragraph position="0"> We wish to thank the members of our laboratory for their help and fruitful discussions, and Dr. Alain de Cheveigne for help with the English.</Paragraph>
</Section>
</Paper>
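<!--
Illustrative sketch of the conflict-resolution scheme described in Section 3, written in Python. It is only a minimal approximation under stated assumptions: fragments are modeled as half-open word spans, two fragments conflict when their spans overlap, and the combination is built greedily from the highest priority downward. The class Fragment, the functions overlaps and select_fragments, and all spans and priority values are hypothetical; the paper gives neither its priority function nor the word positions of f1 to f5. Backtracking after rejection by later analysis is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    name: str
    start: int     # index of the first word covered (inclusive)
    end: int       # index one past the last word covered (exclusive)
    priority: int  # hypothetical score; a higher value is preferred


def overlaps(a, b):
    """Return True when the two half-open spans share at least one word."""
    return a.start < b.end and b.start < a.end


def select_fragments(fragments):
    """Greedily keep the highest-priority fragments that do not overlap."""
    chosen = []
    for frag in sorted(fragments, key=lambda f: f.priority, reverse=True):
        if not any(overlaps(frag, kept) for kept in chosen):
            chosen.append(frag)
    # Report the kept fragments in sentence order.
    return sorted(chosen, key=lambda f: f.start)


# Hypothetical spans and priorities, chosen only so that f2, f4 and f5
# overlap f3 while f1 does not, as in the Fig. 2 discussion.
candidates = [
    Fragment("f1", 0, 2, 2),
    Fragment("f2", 3, 6, 5),
    Fragment("f3", 4, 8, 7),
    Fragment("f4", 6, 9, 4),
    Fragment("f5", 7, 10, 3),
]

best = [f.name for f in select_fragments(candidates)]
print(best)  # ['f1', 'f3']
```

With these made-up values the sketch selects f3 first, rejects f2, f4 and f5 because they overlap it, and then accepts f1, which mirrors the combination (f1, f3) reported in the text. A fuller version would enumerate the next-best combination when the first one is rejected.
-->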