File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/relat/06/w06-2407_relat.xml
Size: 5,680 bytes
Last Modified: 2025-10-06 14:15:56
<?xml version="1.0" standalone="yes"?> <Paper uid="W06-2407"> <Title>Extending corpus-based identification of light verb constructions using a supervised learning framework</Title> <Section position="3" start_page="49" end_page="50" type="relat"> <SectionTitle> 2 Related Work </SectionTitle> <Paragraph position="0"> With the recent availability of large corpora, statistical methods that leverage syntactic features are a current trend. This is the case for LVC detection as well.</Paragraph> <Paragraph position="1"> Grefenstette and Teufel (1995) considered a similar task of identifying the most probable light verb for a given deverbal noun. Their approach focused on the deverbal noun and occurrences of the noun's verbal form, arguing that the deverbal noun retains many of its verbal characteristics in LVCs. To distinguish the LVC from other verb-object pairs, the deverbal noun must share similar argument/adjunct structures with its verbal counterpart. Verbs that appear often with these characteristic deverbal noun forms are deemed light verbs. They approximate the identification of argument/adjunct structures by using the head preposition of prepositional phrases that occur after the verb or object of interest.</Paragraph> <Paragraph position="2"> Let n be a deverbal noun whose most likely light verb is to be found. Denote its verbal form by v′, and let P be the set containing the three prepositions that occur most frequently after v′. The verb-object pairs that are not followed by a preposition in P are filtered out. For any verb v, let g(v,n) be the count of verb-object pairs v-n that remain after the filtering step above. 
Grefenstette and Teufel proposed that the light verb for n be returned by the following equation:</Paragraph> <Paragraph position="4"> Interestingly, Grefenstette and Teufel reported that their subsequent experiments suggested the filtering step may not be necessary.</Paragraph> <Paragraph position="5"> Whereas the GT95 measure (Grefenstette and Teufel, 1995) centers on the deverbal object, Dras and Johnson (1996) also consider the verb's corpus frequency. The use of this complementary information improves LVC identification, as it models the inherent bias of some verbs to be used more often as light verbs than others. Let f(v,n) be the count of verb-object pairs occurring in the corpus, where v is the verb and n is a deverbal noun. Then, the most probable light verb for n is given by:</Paragraph> <Paragraph position="7"> Stevenson et al. (2004) examine evidence from constructions featuring determiners.</Paragraph> <Paragraph position="8"> They focused on expressions of the form v-a-n and v-det-n, where v is a light verb, n is a deverbal noun, a is an indefinite determiner (namely, &quot;a&quot; or &quot;an&quot;), and det is any determiner other than the indefinite. Examples of such constructions are &quot;give a speech&quot; and &quot;take a walk&quot;. They employ mutual information, which measures how often two variables co-occur, corrected for chance co-occurrence. Let I(x,y) be the mutual information between x and y. Then the following measure can be used:</Paragraph> <Paragraph position="10"> where higher values indicate a higher likelihood of v-a-n being a light verb construction. They also suggested that the determiner &quot;the&quot; be excluded from the development data because it occurred frequently in their data.</Paragraph> <Paragraph position="11"> Recently, Fazly et al. (2005) have proposed a statistical measure for the detection of LVCs. 
The probability that a verb-object pair v-n (where v is a light verb) is an LVC can be expressed as a product of three probabilities: (1) the probability of the object n occurring in the corpus, (2) the probability that n is part of any LVC given n, and (3) the probability of v occurring given n and that v-n is an LVC. Each of these three probabilities can then be estimated from frequencies of occurrence in the corpus, under the assumption that every instance of v′-a-n is an LVC, where v′ is any light verb and a is an indefinite determiner.</Paragraph> <Paragraph position="12"> To summarize, research in LVC detection started by developing single measures that used simple frequency counts of verbs and their complements. From this starting point, research has developed in two different directions: using more informed measures of word association (specifically, mutual information) and modeling the context of the verb-complement pair.</Paragraph> <Paragraph position="13"> Both the GT95 and DJ96 measures suffer from using frequency counts directly. Verbs that are not light but occur very frequently (such as &quot;buy&quot; and &quot;sell&quot; in the Wall Street Journal) will be flagged by these measures. As such, given a deverbal noun, they sometimes suggest verbs that are not light.</Paragraph> <Paragraph position="14"> We hypothesize that substituting mutual information (MI) for raw frequency counts can alleviate this problem.</Paragraph> <Paragraph position="15"> The SFN04 metric adds the context provided by determiners to augment LVC detection. This measure may work well for LVCs that are marked by determiners, but it excludes a large portion of LVCs that are composed without determiners. Designing a robust LVC detector requires integrating such specific contextual evidence with other general evidence.</Paragraph> <Paragraph position="16"> Building on this, Fazly et al. (2005) incorporate an estimate of the probability that a certain noun is part of an LVC. 
However, like SFN04, their measure excludes LVCs without determiners.</Paragraph> </Section> </Paper>
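Illustrative sketch (not part of the original paper): the Grefenstette and Teufel procedure described in the Related Work section can be approximated in a few lines of Python. The equation itself is missing from this extraction, so the sketch assumes that it selects the verb v maximizing the filtered count g(v,n). The function names, the input layout (verb, object noun, following preposition or None), and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def top_prepositions(preps_after_verbal_form, k=3):
    """Return the set P: the k prepositions seen most often after the verbal form."""
    counts = Counter(preps_after_verbal_form)
    return {prep for prep, _ in counts.most_common(k)}


def gt95_light_verb(pairs, noun, preps_after_verbal_form):
    """Return the verb with the highest filtered count g(v, n), or None.

    pairs: iterable of (verb, object_noun, following_preposition_or_None).
    Assumption: the elided equation is an argmax over g(v, n).
    """
    allowed = top_prepositions(preps_after_verbal_form)
    # g(v, n): count verb-object pairs for this noun that survive the filter,
    # i.e. pairs followed by one of the prepositions in P.
    g = Counter(
        verb
        for verb, obj, prep in pairs
        if obj == noun and prep in allowed
    )
    if not g:
        return None
    # Ties are broken by first occurrence in the input (an arbitrary choice).
    return g.most_common(1)[0][0]


# Invented toy data: prepositions observed after the verb "call" ...
PREPS_AFTER_CALL = ["to"] * 4 + ["for"] * 3 + ["about"] * 2 + ["on"]
# ... and verb-object pairs with the preposition that follows each pair.
PAIRS = [
    ("make", "call", "to"),
    ("make", "call", "to"),
    ("make", "call", None),
    ("take", "call", "for"),
    ("record", "call", "on"),
    ("record", "call", "on"),
    ("record", "call", "on"),
    ("make", "decision", "about"),
]

print(gt95_light_verb(PAIRS, "call", PREPS_AFTER_CALL))
```

On this toy data the pairs with "record" are dropped because "on" is not among the three most frequent prepositions, so the filtered counts favor "make". Removing the filter would amount to counting raw verb-object frequencies, which is the weakness the section attributes to frequency-based measures.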