<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-2100">
  <Title>Automatic Extraction of Subcategorization Frames for Czech*</Title>
  <Section position="7" start_page="695" end_page="696" type="concl">
    <SectionTitle>
6 Conclusion
</SectionTitle>
    <Paragraph position="0"> We are currently incorporating the SF information produced by the methods described in this paper into a parser for Czech. We hope to duplicate the increase in performance shown by treebank-based parsers for English when they use SF information.</Paragraph>
    <Paragraph position="1"> Our methods can also be applied to improve the annotations in the original treebank that we use as training data. The automatic addition of subcategorization to the treebank can be exploited to add predicate-argument information to the treebank.</Paragraph>
    <Paragraph position="2"> Also, techniques for extracting SF information from data can be used along with other research that aims to discover relationships between different SFs of a verb (Stevenson and Merlo, 1999; Lapata and Brew, 1999; Lapata, 1999; Stevenson et al., 1999).</Paragraph>
    <Paragraph position="3"> The statistical models in this paper were based on the assumption that given a verb, different SFs occur independently. This assumption is used to justify the use of the binomial. Future work perhaps should look towards removing this assumption by modeling the dependence between different SFs for the same verb using a multinomial distribution.</Paragraph>
    <Paragraph position="4"> To summarize: we have presented techniques that can be used to learn subcategorization information for verbs. We exploit a dependency treebank to learn this information, and moreover we discover the final set of valid subcategorization frames from the training data. We achieve up to 88% precision on unseen data.</Paragraph>
    <Paragraph position="5"> We have also tried our methods on data that was automatically morphologically tagged, which allowed us to use more data (82K sentences instead of 19K). The performance went up to 89% (a 1% improvement).</Paragraph>
  </Section>
</Paper>