<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1717">
<Title>Learning Verb-Noun Relations to Improve Parsing</Title>
<Section position="3" start_page="0" end_page="0" type="metho">
<SectionTitle> 2 The Learning Procedure </SectionTitle>
<Paragraph position="0"> The syntactic ambiguity associated with a verb-noun sequence can be either local or global. The kind of ambiguity we observed in (1) and (2) is global in nature: it persists even when the noun phrase is embedded in a larger structure or a complete sentence. There are also local ambiguities, which disappear once the verb-noun sequence is put into a broader context. In the following examples, the sentences in (3) and (4) can only receive the analyses in Figure 5 and Figure 6 respectively.</Paragraph>
<Paragraph position="1"> (3) Zhe Shi Xin De Deng Ji Shou Xu.
    zhe shi xin de dengji shouxu
    this be new DE register procedure
    "This is a new registration procedure."</Paragraph>
<Paragraph position="2"> (4) Ni Bu Bi Ban Li Shou Xu.
    ni bu bi banli shouxu
    you not must handle procedure
    "You don't have to go through the procedure."</Paragraph>
<Paragraph position="3"> In the processing of a large corpus, sentences with global ambiguities have only a random chance of being analyzed correctly, but sentences with local ambiguities can often receive correct analyses. Although local ambiguities create some confusion in the parsing process, enlarge the parsing chart, and slow down processing, they can be resolved in the end unless we run out of resources (in terms of time and space) before the analysis is complete. There should therefore be a sufficient number of cases in the corpus where the relationship between the verb and the noun is clear. An obvious strategy is to learn from the clear cases and use the learned knowledge to help resolve the unclear ones. If a verb-noun pair appears predominantly in the verb-object relationship or the modifier-head relationship throughout the corpus, we should prefer that relationship everywhere else.</Paragraph>
<Paragraph position="4"> A simple way to learn such knowledge is to use a tree-filter to collect all instances of each verb-noun pair in the parse trees of a corpus, count the number of times the pair appears in each relationship, and then compare the frequencies to decide which relationship is the predominant one for that pair. Once we have the information that "Deng Ji" is typically a modifier of "Shou Xu" and that "Ban Li" typically takes "Shou Xu" as an object, for instance, the sentence in (1) will only receive the analysis in Figure 1 and (2) only the analysis in Figure 2. However, this only works in idealized situations where the parser is doing an almost perfect job, in which case no learning would be necessary. In reality, the parse trees are not always reliable, and the relations extracted from them can contain a fair amount of noise. It is not hard to imagine a verb-noun pair that occurs only a couple of times in the corpus and is misanalyzed in every instance. If such noise is not filtered out, the knowledge we acquire will mislead us and diminish the benefit of this approach. An obvious solution to this problem is to ignore the low-frequency pairs and keep only the high-frequency ones, since wrong analyses tend to be random. But the cut-off point is difficult to set if we look only at raw frequencies, whose range is hard to predict: the cut-off will be too low for some pairs and too high for others. We need a normalizing factor to turn the raw frequencies into relative frequencies. Instead of asking "which relation is more frequent for a given pair?", the question should be "of all the instances of a given verb-noun pair in the corpus, which relation has a higher percentage of occurrence?". The normalizing factor should then be the total count of a verb-noun pair in the corpus, regardless of the syntactic relations between them. The normalized frequency of a relation for a given pair is thus the number of times this pair is assigned this relation in the parses, divided by this normalizing factor.</Paragraph>
<Paragraph position="5"> For example, if Deng Ji Shou Xu occurs 10 times in the corpus and is analyzed as verb-object 3 times and modifier-head 7 times, the normalized frequencies for these two relations will be 30% and 70% respectively. What we have now is actually the probability of a given pair occurring in a given relationship. This probability may not be very accurate, given that the parse trees are not always correct, but it should be a good approximation, assuming that the corpus is large enough and most of the potential ambiguities in the corpus are local rather than global in nature.</Paragraph>
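This computation is easy to make concrete. The following is a minimal sketch in Python (with hypothetical data structures; the paper does not describe an implementation) of how the tree-filter's relation counts can be turned into normalized frequencies. At this stage the normalizing factor is simply the pair's total count in the parse trees; the chart-based refinement is described below.

```python
from collections import defaultdict

# (verb, noun) -> {relation: count}, filled by the tree-filter as it walks parse trees
relation_counts = defaultdict(lambda: defaultdict(int))
# (verb, noun) -> total occurrences of the pair, used as the normalizing factor
pair_counts = defaultdict(int)

def record(verb, noun, relation):
    """Record one instance of a verb-noun pair found in a parse tree."""
    relation_counts[(verb, noun)][relation] += 1
    pair_counts[(verb, noun)] += 1

def normalized_frequency(verb, noun, relation):
    """Fraction of the pair's occurrences that were assigned the given relation."""
    total = pair_counts[(verb, noun)]
    if total == 0:
        return 0.0
    return relation_counts[(verb, noun)][relation] / total

# Toy data mirroring the example in the text: 10 occurrences of the pair,
# 3 analyzed as verb-object and 7 as modifier-head.
for _ in range(3):
    record("dengji", "shouxu", "verb-object")
for _ in range(7):
    record("dengji", "shouxu", "modifier-head")
print(normalized_frequency("dengji", "shouxu", "modifier-head"))  # 0.7
```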
<Paragraph position="6"> But how do we count the number of verb-noun pairs in a corpus? A simple bigram count will unjustly favor the modifier-head relation. While the verb and the noun are usually adjacent when the verb modifies the noun, they can be far apart when the noun is the object of the verb, as illustrated in (5).</Paragraph>
<Paragraph position="7"> (5) Ta Men Zheng Zai Ban Li Qu Tai Wan Can Jia Di Shi Jiu Jie Guo Ji Ji Suan Yu Yan Xue Hui Yi De Shou Xu.
    tamen zhengzai banli qu taiwan canjia di-shijiu jie guoji jisuan yuyanxue huiyi de shouxu
    they PROG handle go Taiwan participate 19th CL international computational-linguistics conference DE procedure
    "They are going through the procedure for going to Taiwan to attend the 19th International Conference on Computational Linguistics."
To get a true normalizing factor, we must count all the potential dependencies, both local and long-distance. This is also required because the tree-filter we use to collect pair relations considers both local and long-distance dependencies.</Paragraph>
<Paragraph position="8"> Since simple string matching cannot find the potential long-distance pairs, we resorted to a chart-filter. As the parser we use is a chart parser, all the potential constituents are stored in the chart, though only a small subset of them will end up in the parse tree. Among the constituents created in the chart for the sentence in (5), for instance, we expect to find [Ban Li] and [Qu Tai Wan Can Jia Di Shi Jiu Jie Guo Ji Ji Suan Yu Yan Xue Hui Yi De Shou Xu], which are adjacent to each other. The fact that Shou Xu is the head of the second phrase then makes Shou Xu adjacent to Ban Li. We will therefore be able to get one count of Ban Li followed by Shou Xu from (5) despite the long span of intervening words between them. The chart-filter thus makes our normalizing factor more accurate. The probability of a given verb-noun pair occurring in a given relation is now the total count of this relation for the pair in the parse trees throughout the corpus, divided by the total count of all the potential instances of the pair found in the charts created during the processing of the corpus.</Paragraph>
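As a rough illustration of the chart-filter (the data structures below are hypothetical assumptions; the actual chart parser is not described in detail in the paper), the sketch treats two chart constituents as a potential dependency when one ends where the other begins, and records the pair of their lexical heads, so that a verb is counted as adjacent to the head noun of an arbitrarily long following phrase.

```python
from dataclasses import dataclass

@dataclass
class Edge:
    start: int      # start position of the constituent in the sentence
    end: int        # end position (exclusive)
    head: str       # lexical head of the constituent
    head_pos: str   # part of speech of the head, e.g. "V" or "N"

def potential_pairs(chart):
    """Yield (verb, noun) head pairs for adjacent constituents in the chart.

    A verb-headed edge ending where a noun-headed edge begins counts as one
    potential dependency, no matter how long the noun-headed edge is.
    """
    for left in chart:
        if left.head_pos != "V":
            continue
        for right in chart:
            if right.start == left.end and right.head_pos == "N":
                yield (left.head, right.head)

# Toy chart for sentence (5): [banli] is adjacent to the long phrase headed
# by "shouxu", so the pair (banli, shouxu) is counted once.
chart = [
    Edge(2, 3, "banli", "V"),
    Edge(3, 15, "shouxu", "N"),   # "qu taiwan canjia ... de shouxu"
]
print(list(potential_pairs(chart)))   # [('banli', 'shouxu')]
```

Each pair yielded this way increments the pair's normalizing count; the relation counts themselves are still taken only from the final parse trees.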
<Paragraph position="9"> The cut-off point we finally used is 50%, i.e. a pair+relation is kept in our knowledge base if the probability obtained this way is more than 50%. This may seem low, but it is a stricter threshold than it appears, considering that verb-object and modifier-head are not the only relations that can hold between a verb and a noun. In (6), for example, Ban Li is not related to Shou Xu in either way, in spite of their adjacency.</Paragraph>
<Paragraph position="10"> (6) Ta Men Qu Shang Hai Ban Li Shou Xu Suo Xu
    tamen qu shanghai banli shouxu suoxu
    they go Shanghai handle procedure need
    "They went to Shanghai to handle the notarized material needed for the procedure."
We will still find the Ban Li Shou Xu pair in the chart, but it is not expected to appear in either the verb-object or the modifier-head relation in the parse tree. Therefore, the baseline probability for any pair+relation may be far below 50%, and more than 50% is a good indicator that a given pair does typically occur in a given relation. We could also choose to keep all the pairs with their probabilities in the knowledge base and let the probabilities be integrated into the probability of the complete parse tree at the time of parse ranking.</Paragraph>
<Paragraph position="11"> The results we obtained from the above procedure are quite clean, in the sense that most of the pairs classified into the two types of relations with a probability greater than 50% are correct. Here are some sample pairs that we learned. However, there are pairs that are correct but not "typical" enough, especially among the verb-object relations. Here are some examples: ...
These are truly verb-object relations, but we may not want to keep them in our knowledge base, for the following reasons. First, the verbs in such cases can usually take a wide range of objects, and the strength of association between the verb and the object is weak; in other words, the objects are not "typical". Second, those verbs tend not to occur in the modifier-head relation with a following noun, so we gain very little in terms of disambiguation by storing those pairs in the knowledge base. To prune away those pairs, we used the log-likelihood-ratio algorithm (Dunning, 1993) to compute the degree of association between the verb and the noun in each pair. Pairs with high "mutual information" between the verb and the noun receive higher scores, while pairs whose verb co-occurs with many different nouns receive lower scores. Pairs with association scores below a certain threshold were then thrown out. This not only makes the remaining pairs more "typical" but also helps to clean out more garbage. The resulting knowledge base therefore has higher quality.</Paragraph>
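For concreteness, here is a small sketch of the association score this pruning step relies on. The contingency counts, function names, and example figures are illustrative assumptions, but the formula is Dunning's (1993) log-likelihood ratio for a 2x2 table.

```python
import math

def _entropy_term(k):
    return k * math.log(k) if k > 0 else 0.0

def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's (1993) log-likelihood ratio for a 2x2 contingency table.

    k11: count of the verb occurring with this noun
    k12: count of the verb occurring with other nouns
    k21: count of other verbs occurring with this noun
    k22: count of other verbs occurring with other nouns
    """
    row1, row2 = k11 + k12, k21 + k22
    col1, col2 = k11 + k21, k12 + k22
    total = row1 + row2
    # G^2 = 2 * (sum k*ln k - sum row*ln row - sum col*ln col + N*ln N)
    return 2.0 * (
        _entropy_term(k11) + _entropy_term(k12)
        + _entropy_term(k21) + _entropy_term(k22)
        - _entropy_term(row1) - _entropy_term(row2)
        - _entropy_term(col1) - _entropy_term(col2)
        + _entropy_term(total)
    )

# A strongly associated pair scores higher than one whose verb also
# co-occurs with many other nouns (counts here are made up).
print(log_likelihood_ratio(30, 5, 8, 9957) > log_likelihood_ratio(30, 400, 8, 9562))  # True
```

Pairs whose score falls below the chosen threshold are removed from the knowledge base, as described above.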
</Section>
<Section position="4" start_page="0" end_page="0" type="metho">
<SectionTitle> 3 Evaluation </SectionTitle>
<Paragraph position="0"> The knowledge acquired by the method described in the previous section is used in subsequent sentence analysis to prefer those parses where the verb-noun sequence is analyzed in the same way as specified in the knowledge base. When processing a large corpus, what we typically do is analyze the corpus twice. The first pass is the learning phase, where we acquire additional knowledge by parsing the corpus. The acquired knowledge is then used in the second pass to get better parses. This is one example of the general approach of "improving parsing by parsing" described in (Wu et al., 2002).</Paragraph>
<Paragraph position="1"> To find out how much the learned knowledge contributes to the improvement of parsing, we performed a human evaluation. We used our existing sentence analyzer (Heidorn, 2000; Jensen et al., 1993; Wu and Jiang, 1998) to process a corpus of 271,690 sentences and learn the verb-noun relations. We then parsed the same sentences, first without the additional knowledge and then with the acquired knowledge. Comparing the outputs, we found that 16,445 (6%) of the sentences had different analyses in the two passes. We then randomly selected 500 of those "diff" sentences and presented them to a linguist from an independent agency who, given two different parses of the same sentence, was asked to pick the parse she judged to be more accurate. The order in which the parses were presented was randomized so that the evaluator had no idea which tree was from the first pass and which from the second.</Paragraph>
<Paragraph position="2"> The linguist's judgment showed that 350 (70%) of those sentences parsed better with the additional knowledge, 85 (17%) parsed worse, and 65 (13%) had parses that were equally good or bad. In other words, the accuracy of sentence analysis improved significantly with the learning procedure discussed in this paper.</Paragraph>
<Paragraph position="3"> Here is an example where the parse became better when the automatically acquired knowledge was used. Due to space limitations, only the parses of a fragment of the sentence are given here:
(7) Yao Zun Zhao Guo Jia Ce Shi Biao Zhun
    yao zunzhao guojia ceshi biaozhun
    want follow nation testing standard
    "(You) must follow the national testing standards."
Because Zun Zhao is ambiguous between a verb ("follow") and a preposition ("in accordance with"), this sentence fragment got the parse tree in Figure 7 before the learned knowledge was used, where Biao Zhun was misanalyzed as the object of Ce Shi. During the learning process, we acquired "Ce Shi Biao Zhun" as a typical pair in which the two words are in the modifier-head relationship. Once this pair was added to our knowledge base, we got the correct parse, where Zun Zhao is analyzed as a verb and Ce Shi as a modifier of Biao Zhun (Figure 8: new tree of (7)).
We later inspected the sentences where the parses became worse and found two sources of regressions. The main source was, of course, errors in the learned results, since they had not been manually checked. The second source was an engineering problem: the use of the acquired knowledge required additional memory and consequently exceeded some system limits when the sentences were very long.</Paragraph>
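To recap how the acquired pairs are applied in the second pass, as described at the beginning of this section, here is a minimal, purely illustrative sketch; the scoring scheme and names are assumptions, not the system's actual ranking model.

```python
def parse_score(parse_relations, knowledge_base, bonus=1.0):
    """Score a candidate parse by how well its verb-noun relations
    agree with the learned knowledge base.

    parse_relations: iterable of (verb, noun, relation) tuples found in the parse
    knowledge_base:  dict mapping (verb, noun) -> preferred relation
    """
    score = 0.0
    for verb, noun, relation in parse_relations:
        preferred = knowledge_base.get((verb, noun))
        if preferred is None:
            continue
        score += bonus if relation == preferred else -bonus
    return score

# Example (7): the learned pair prefers the parse in which "ceshi"
# modifies "biaozhun" over the one in which it takes it as an object.
kb = {("ceshi", "biaozhun"): "modifier-head"}
old_parse = [("ceshi", "biaozhun", "verb-object")]
new_parse = [("ceshi", "biaozhun", "modifier-head")]
print(parse_score(new_parse, kb) > parse_score(old_parse, kb))  # True
```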
</Section>
<Section position="5" start_page="0" end_page="0" type="metho">
<SectionTitle> 4 Future work </SectionTitle>
<Paragraph position="0"> The approach described in this paper can be applied to the learning of many other typical syntactic relations between words. We have already used it to learn noun-noun pairs where the first noun is a typical modifier of the second noun. This has helped us to rule out incorrect parses where the two nouns were not put into the same constituent.</Paragraph>
<Paragraph position="1"> Other relations we have been trying to learn include:
* Noun-noun pairs where the two nouns are in conjunction (e.g. Xin Lang Xin Niang "bride and bridegroom");
* Verb-verb pairs where the two verbs are in conjunction (e.g. Diao Cha Yan Jiu "investigate and study");
* Adjective-adjective pairs where the two adjectives are in conjunction (e.g. Nian Qing Piao Liang "young and beautiful");
* Noun-verb pairs where the noun is a typical subject of the verb.</Paragraph>
<Paragraph position="2"> Knowledge of this kind, once acquired, will benefit not only parsing but also other NLP applications, such as machine translation and information retrieval.</Paragraph>
<Paragraph position="3"> In terms of parsing, the benefit we get is similar to what we get in lexicalized statistical parsing, where parsing decisions can be based on specific lexical items. However, training a statistical parser requires a treebank, which is expensive to create, while our approach does not. Our approach does require an existing parser, but this parser does not have to be perfect and can be improved as the learning goes on. Once the parser is reasonably good, all we need is raw text, which is available in large quantities.</Paragraph>
</Section>
</Paper>