<?xml version="1.0" standalone="yes"?>
<Paper uid="H91-1067">
  <Title>Automatic Acquisition of Subcategorization Frames from Tagged Text</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> Accurate parsing requires knowing the subcategorization frames of verbs, as shown by (1).</Paragraph>
    <Paragraph position="1"> (1) a. I expected [NP the man who smoked NP] to eat ice-cream. b. I doubted [NP the man who liked to eat ice-cream NP]. Current high-coverage parsers tend to use either custom, hand-generated lists of subcategorization frames (e.g., [7]), or published, hand-generated lists like the Oxford Advanced Learner's Dictionary of Contemporary English [9] (e.g., [5]).</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <Paragraph position="0"> In either case, such lists are expensive to build and to maintain in the face of evolving usage. In addition, they tend not to include rare usages or specialized vocabularies like financial or military jargon. Further, they are often incomplete in arbitrary ways. For example, Webster's Ninth New Collegiate Dictionary lists the sense of strike meaning &quot;to occur to&quot;, as in &quot;it struck him that...&quot;, but it does not list that same sense of hit. (Our program discovered both.) To address these problems we have implemented a program that takes a tagged text corpus and generates a partial list of the subcategorization frames in which each verb occurs.</Paragraph>
      <Paragraph position="1"> The program uses only a small, finite-state grammar for a fragment of English. The completeness of the output list increases monotonically with the total number of occurrences of each verb in the training corpus.</Paragraph>
      <Paragraph position="2"> Automatically learning subcategorization frames (SFs) is impeded by a bootstrapping problem -- you can't parse without knowing SFs and you can't learn from examples without parsing them. For instance, the obvious approach to identifying verbs that take infinitival complements would be to look for a verb followed by an infinitive. Unfortunately, as shown by (1), finding such a case does not license any definite conclusions. Our system bootstraps by recognizing those sentences that it can parse without already knowing the SFs -- mainly sentences involving pronouns or proper names rather than full noun-phrases in certain argument positions. It simply ignores other sentences. The distributional constraints on pronouns and full noun-phrases are almost identical, so lessons learned in the easy-to-parse cases apply to all cases.</Paragraph>
      <Paragraph position="3"> The remainder of this paper consists of a section describing and quantifying our results, a section describing the methods used to obtain them, and a section discussing related work.</Paragraph>
    </Section>
  </Section>
</Paper>