<?xml version="1.0" standalone="yes"?> <Paper uid="N06-2015"> <Title>OntoNotes: The 90% Solution</Title> <Section position="3" start_page="0" end_page="57" type="intro"> <SectionTitle> 2 Treebanking </SectionTitle> <Paragraph position="0"> The Penn Treebank (Marcus et al., 1993) is annotated with information that makes predicate-argument structure easy to decode, including function tags and markers of &quot;empty&quot; categories that represent displaced constituents. To expedite later stages of annotation, we have developed a parsing system (Gabbard et al., 2006) that recovers both function tags and empty categories, the first such system we know of. Its first-stage parser matches the Collins (2003) parser, on which it is based, on the Parseval metric, while simultaneously achieving near state-of-the-art performance on recovering function tags (F-measure 89.0). The second stage, a seven-stage pipeline of maximum entropy learners and voted perceptrons, achieves state-of-the-art performance (F-measure 74.7) on the recovery of empty categories by combining a linguistically informed architecture and a rich feature set with the power of modern machine learning methods.</Paragraph> </Section> </Paper>