File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/06/n06-2015_concl.xml
Size: 2,560 bytes
Last Modified: 2025-10-06 13:55:12
<?xml version="1.0" standalone="yes"?> <Paper uid="N06-2015"> <Title>OntoNotes: The 90% Solution</Title> <Section position="8" start_page="58" end_page="59" type="concl"> <SectionTitle> 7 Related and Future Work </SectionTitle> <Paragraph position="0"> PropBank I (Palmer et al., 2005), developed at UPenn, captures predicate argument structure for verbs; NomBank provides predicate argument structure for nominalizations and other noun predicates (Meyers et al., 2004). PropBank II annotation (eventuality IDs, coarse-grained sense tags, nominal coreference and selected discourse connectives) is being applied to a small (100K) parallel Chinese/English corpus (Babko-Malaya et al., 2004). The OntoNotes representation extends these annotations, and allows eventual inclusion of additional shallow semantic representations for other phenomena, including temporal and spatial relations, numerical expressions, deixis, etc. One of the principal aims of OntoNotes is to enable automated semantic analysis. The best current algorithm for semantic role labeling for PropBank-style annotation (Pradhan et al., 2005) achieves an F-measure of 81.0 using an SVM. OntoNotes will provide a large amount of new training data for similar efforts.</Paragraph> <Paragraph position="1"> Existing work in the same realm falls into two classes: the development of resources for specific phenomena and the annotation of corpora. An example of the former is Berkeley's FrameNet project (Baker et al., 1998), which produces rich semantic frames, annotating a set of examples for each predicator (including verbs, nouns and adjectives), and describing the network of relations among the semantic frames. An example of the latter type is the Salsa project (Burchardt et al., 2004), which produced a German lexicon based on the FrameNet semantic frames and annotated a large German newswire corpus. 
A second example, the Prague Dependency Treebank (Hajic et al., 2001), has annotated a large Czech corpus with several levels of (tectogrammatical) representation, including parts of speech, syntax, and topic/focus information structure. Finally, the IL-Annotation project (Reeder et al., 2004) focused on the representations required to support a series of increasingly semantic phenomena across seven languages (Arabic, Hindi, English, Spanish, Korean, Japanese and French). In intent and in many details, OntoNotes is compatible with all these efforts, which may one day all participate in a larger multi-lingual corpus integration effort.</Paragraph> </Section> </Paper>