<?xml version="1.0" standalone="yes"?>
<Paper uid="H01-1072">
  <Title>TüSBL: A Similarity-Based Chunk Parser for Robust Syntactic Processing</Title>
  <Section position="10" start_page="4" end_page="5" type="evalu">
    <SectionTitle>
5. QUANTITATIVE EVALUATION
</SectionTitle>
    <Paragraph position="0"> A quantitative evaluation of TüSBL has been conducted using a semi-automatically constructed treebank of German that consists of approximately 67,000 fully annotated sentences or sentence fragments.</Paragraph>
    <Paragraph position="1"> The evaluation consisted of a ten-fold cross-validation test, where the training data provide an instance base of already seen cases for TüSBL's tree construction module.</Paragraph>
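    <Paragraph> As an illustration of the ten-fold setup, the following minimal sketch partitions sentence indices into ten folds, each serving once as test data while the remaining nine form the instance base. The function name ten_fold_splits, the shuffling step and the fixed seed are assumptions made for this sketch; the source does not state how the folds were drawn.</Paragraph>

```python
import random
from typing import List, Tuple


def ten_fold_splits(
    n_items: int, n_folds: int = 10, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return one (train_indices, test_indices) pair per fold."""
    indices = list(range(n_items))
    # Assumption: sentences are shuffled before being assigned to folds.
    random.Random(seed).shuffle(indices)
    # Every n_folds-th index goes into the same fold.
    folds = [indices[k::n_folds] for k in range(n_folds)]
    splits = []
    for k in range(n_folds):
        test = folds[k]
        # All other folds together form the instance base (training data).
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((train, test))
    return splits


# 67,000 is the approximate treebank size given in the text.
splits = ten_fold_splits(67000)
first_train, first_test = splits[0]
print(len(splits), len(first_train), len(first_test))
```

    <Paragraph> Each sentence appears in exactly one test fold, so every parse is scored against an instance base that does not contain that sentence.</Paragraph>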
    <Paragraph position="2"> The evaluation focused on three PARSEVAL measures: labeled precision, labeled recall and crossing accuracy, with the results shown in Table 1.</Paragraph>
    <Paragraph position="3"> While these results do not reach the performance reported for other parsers (cf. [7], [8]), it is important to note that the task carried out here is more difficult in a number of respects: 1. The set of labels includes not only phrasal categories but also functional labels marking grammatical relations such as subject, direct object, indirect object and modifier. Thus, the evaluation carried out here is not subject to the justified criticism levelled against the gold standards that are typically used in conjunction with the PARSEVAL measures, namely that they do not include annotations of syntactic-semantic dependencies between bracketed constituents. 2. The German treebank consists of transliterated spontaneous speech data. The fragmentary and partially ill-formed nature of such spoken data makes them harder to analyze than written data such as the Penn treebank typically used as gold standard.</Paragraph>
    <Paragraph position="4"> It should also be kept in mind that the basic PARSEVAL measures were developed for parsers that have as their main goal a complete analysis that spans the entire input. This runs counter to the basic philosophy underlying an amended chunk parser such as TüSBL, which has as its main goal robustness of partially analyzed structures. Precision and recall measure the percentage of brackets, i.e. constituents with the same yield or bracketing scope, which are identical in the parse tree and the gold standard. If TüSBL finds only a partial grouping on one level, both measures consider this grouping wrong, as a consequence of the different bracket scopes. In most cases, the error 'percolates' up to the highest level. Fig. 10 gives an example of a partially matched tree structure for the sentence &quot;bei mir ginge es im Februar ab Mittwoch den vierten&quot; (for me it would work in February after Wednesday the fourth).</Paragraph>
    <Paragraph position="5"> The only missing branch is the branch connecting the second noun phrase (NX) above &quot;Mittwoch&quot; to the NX &quot;den vierten&quot;. This results in precision and recall values of 10 out of 15 because of the altered bracketing scopes of the noun phrase, the two prepositional phrases (PX), the field level (MF) and the sentence level (SIMPX). In order to capture this specific aspect of the parser, a second evaluation was performed that focused on the quality of the structures produced by the parser. This evaluation consisted of manually judging the TüSBL output and scoring the accuracy of the recognized constituents. The scoring was performed by the human annotator who constructed the treebank and was thus in a privileged position to judge constituent accuracy with respect to the treebank annotation standards. This manual evaluation resulted in a score of 92.4% constituent accuracy; that is, of all constituents that were recognized by the parser, 92.4% were judged correct by the human annotator. This seems to indicate that approximately 20% of the precision errors are due to partial constituents whose yield is shorter than in the corresponding gold standard. Such discrepancies typically arise when TüSBL outputs only partial trees. This occurs when no complete tree structures can be constructed that span the entire input.</Paragraph>
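    <Paragraph> The percolation effect discussed in this section can be made concrete with a minimal sketch of bracket-based scoring. It assumes that each constituent is represented as a (label, start, end) triple over token positions and that a bracket counts as correct only if label and span both match the gold standard. The function name labeled_precision_recall and the six-token toy trees are hypothetical; they do not reproduce the tree in Fig. 10 or the scorer used for Table 1.</Paragraph>

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical representation: (label, start, end), with end exclusive.
Bracket = Tuple[str, int, int]


def labeled_precision_recall(
    parsed: Iterable[Bracket], gold: Iterable[Bracket]
) -> Tuple[float, float]:
    """Score brackets that agree with the gold standard in label and span."""
    parsed_counts = Counter(parsed)
    gold_counts = Counter(gold)
    # A parsed bracket is matched at most as often as it occurs in the gold tree.
    matched = sum(min(n, gold_counts[b]) for b, n in parsed_counts.items())
    total_parsed = sum(parsed_counts.values())
    total_gold = sum(gold_counts.values())
    precision = matched / total_parsed if total_parsed else 0.0
    recall = matched / total_gold if total_gold else 0.0
    return precision, recall


# Toy gold tree: the final NX (tokens 4-5) is attached inside the NX
# starting at token 3, so all enclosing brackets end at position 6.
gold = [("SIMPX", 0, 6), ("MF", 1, 6), ("PX", 2, 6), ("NX", 3, 6), ("NX", 4, 6)]

# Toy parser output: the final NX is recognized but left unattached,
# so every enclosing bracket now ends at position 4 instead of 6.
parsed = [("SIMPX", 0, 4), ("MF", 1, 4), ("PX", 2, 4), ("NX", 3, 4), ("NX", 4, 6)]

precision, recall = labeled_precision_recall(parsed, gold)
print(precision, recall)
```

    <Paragraph> In this toy case a single missing attachment changes the span of every enclosing constituent, so only one of the five brackets matches the gold standard, even though each recognized constituent is internally well formed.</Paragraph>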
  </Section>
</Paper>