<?xml version="1.0" standalone="yes"?> <Paper uid="H91-1067"> <Title>Automatic Acquisition of Subcategorization Frames from Tagged Text</Title> <Section position="5" start_page="343" end_page="343" type="relat"> <SectionTitle> RELATED WORK </SectionTitle> <Paragraph position="0"> Interest in extracting lexical and especially collocational information from text has risen dramatically in the last two years, as sufficiently large corpora and sufficiently cheap computation have become available. Three recent papers in this area are [3], [8], and [12]. The latter two are concerned exclusively with collocation relations between open-class words, not with grammatical properties. Church is also interested primarily in open-class collocations, but within his mutual information framework he does discuss verbs that tend to be followed by infinitives.</Paragraph> <Paragraph position="1"> Mutual information, as applied by Church, measures the tendency of two items to appear near one another: their observed frequency in nearby positions is divided by the frequency that would be expected if their positions were random and independent. As Church points out, such statistics for word pairs are useful for the predictive models used in optical character recognition and speech recognition, as well as for syntactic disambiguation.</Paragraph> <Paragraph position="2"> To measure the tendency of verbs to be followed within a few words by infinitives, Church uses his statistical disambiguator ([4]) to distinguish between "to" as an infinitive marker and "to" as a preposition.
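As a rough illustration of the mutual information measure described above, the following Python sketch compares how often one item follows another within a fixed window with how often that would be expected if positions were random and independent. The helper `windowed_pmi`, the window size, the expected-count formula (which ignores edge effects), the log2 scaling borrowed from the usual pointwise mutual information definition, and the tags "to/INF" and "to/P" (standing in for the output of a part-of-speech disambiguator) are all assumptions made for illustration. None of them is taken from Church's actual implementation.

```python
import math
from collections import Counter


def windowed_pmi(tokens, x, y, window=5):
    """Log2 ratio of observed to expected counts of y following x within `window` tokens.

    Hypothetical helper, not Church's implementation. "Observed" counts the
    positions i, j with tokens[i] == x, tokens[j] == y and j - i between 1 and
    `window`. "Expected" is the count we would see if positions were random and
    independent (assumption: sentence and corpus edge effects are ignored).
    """
    n = len(tokens)
    counts = Counter(tokens)

    observed = 0
    for i, tok in enumerate(tokens):
        if tok == x:
            # Count occurrences of y among the next `window` tokens.
            observed += tokens[i + 1 : i + 1 + window].count(y)

    if observed == 0:
        return float("-inf")  # x and y never co-occur within the window

    # Each of the counts[x] occurrences of x has `window` following slots,
    # each of which holds y with probability counts[y] / n.
    expected = counts[x] * window * counts[y] / n
    return math.log2(observed / expected)


# Toy corpus in which the infinitival "to" has already been separated from the
# prepositional one (hypothetical tags "to/INF" and "to/P").
tokens = "we want to/INF leave but we went to/P school and they want to/INF stay".split()
print(windowed_pmi(tokens, "want", "to/INF", window=2))  # log2(3.5), about 1.807
print(windowed_pmi(tokens, "went", "to/INF", window=2))  # -inf
```

Here "want" scores well above zero because it is followed by the infinitival "to" more often than chance predicts, while "went", which is followed only by the prepositional "to", gets negative infinity. Such a score reflects co-occurrence only; it does not decide whether the verb actually takes an infinitival complement.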
Then he measures the mutual information between occurrences of the verb and occurrences of infinitives following within a certain number of words. Unlike our system, Church's approach does not aim to decide whether or not a verb occurs with an infinitival complement; as example (1) showed, being followed by an infinitive is not the same as taking an infinitival complement. It might be interesting to try building a verb categorization scheme based on Church's mutual information measure, but to the best of our knowledge no such work has been reported.</Paragraph> </Section> </Paper>