<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0622">
  <Title>Semantic Role Labelling with Tree Conditional Random Fields</Title>
  <Section position="4" start_page="0" end_page="169" type="intro">
    <SectionTitle>2 Data</SectionTitle>
    <Paragraph position="0">The data used for this task was taken from the Propbank corpus, which supplements the Penn Treebank with semantic role annotation. Full details of the data set are provided in Carreras and Màrquez (2005).</Paragraph>
    <Section position="1" start_page="0" end_page="169" type="sub_section">
      <SectionTitle>2.1 Data Representation</SectionTitle>
      <Paragraph position="0">From each training instance we derived a tree, using the parse structure from the Collins parser. The nodes in the trees were relabelled with a semantic role label indicating how their corresponding syntactic constituent relates to each predicate, as shown in Figure 1. The role labels are shown as subscripts in the figure; both the syntactic categories and the words at the leaves are shown for clarity only and were not included in the tree. Additionally, the dashed lines show the edges that were pruned, following Xue and Palmer (2004): only nodes that are siblings of a node on the path from the verb to the root are included in the tree. Child nodes of included prepositional phrase nodes are also included. This pruning reduces the size of the resultant tree whilst only very occasionally excluding nodes that should be labelled as an argument.</Paragraph>
      <Paragraph position="1">The tree nodes were labelled such that only argument constituents received the argument label, while all argument children were labelled as outside, O.</Paragraph>
      <Paragraph position="2">Where there were parse errors, such that no constituent exactly covered the token span of an argument, the smaller subsumed constituents were all given the argument label.</Paragraph>
      <Paragraph position="3">We experimented with two alternative labelling strategies: labelling a constituent's children with a new 'inside' label, and labelling the children with the parent's argument label. In the figure, the IN and NP children of the PP would be affected by these changes, both receiving either the inside label, I, or the AM-LOC label under the respective strategies. The inside strategy performed nearly identically to the standard (outside) strategy, indicating either that the model cannot reliably predict the inside label, or that knowing that the children of a given node lie inside an argument is not particularly useful in predicting that node's label. The second (duplication) strategy performed extremely poorly. While it allowed the internal argument nodes to influence their ancestor towards a particular labelling, it also dramatically increased the number of nodes given an argument label. This led to spurious over-prediction of arguments. The model is used for decoding by predicting the maximum-probability argument label assignment for each of the unlabelled trees. When these predictions were inconsistent, and one argument subsumed another, the node closest to the root of the tree was deemed to take precedence over its descendants.</Paragraph>
    </Section>
  </Section>
</Paper>
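The following is a minimal Python sketch of the Xue and Palmer (2004) pruning heuristic described in Section 2.1: keep the siblings of every node on the path from the verb to the root, plus the children of any included PP node. The TreeNode class and its field names are assumptions made for illustration, not the authors' implementation.

    # Illustrative tree node; 'category', 'children' and 'parent' are assumed
    # field names, not the paper's data structures.
    class TreeNode:
        def __init__(self, category, children=None):
            self.category = category
            self.children = children or []
            self.parent = None
            for child in self.children:
                child.parent = self

    def pruned_tree_nodes(predicate):
        """Return the nodes kept after pruning: the siblings of every node on
        the path from the predicate (verb) node to the root, plus the children
        of any included prepositional phrase (PP) node."""
        kept = set()
        node = predicate
        while node.parent is not None:
            for sibling in node.parent.children:
                if sibling is not node:
                    kept.add(sibling)
                    # children of included PP nodes are also kept
                    if sibling.category == "PP":
                        kept.update(sibling.children)
            node = node.parent
        return kept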
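The three child-labelling schemes compared in Section 2.1 (the standard outside scheme, the inside scheme and the duplication scheme) could be realised along the following lines; the function and label names are hypothetical and serve only to make the contrast concrete.

    def label_argument(arg_node, role, strategy="outside"):
        """Assign a label to an argument constituent and to each of its
        children under one of the three strategies discussed above."""
        labels = {arg_node: role}
        for child in arg_node.children:
            if strategy == "outside":        # standard scheme: children are O
                labels[child] = "O"
            elif strategy == "inside":       # children receive a dedicated I label
                labels[child] = "I"
            elif strategy == "duplication":  # children repeat the parent's role
                labels[child] = role
        return labels

Under the duplication strategy, every child of an AM-LOC node, for example, would itself carry AM-LOC, which is what inflates the number of argument-labelled nodes and produces the over-prediction of arguments noted in the text.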
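Finally, the precedence rule applied at decoding time, whereby the argument node closest to the root overrides argument labels predicted for its descendants, might be resolved as in the sketch below. Here 'predicted' is assumed to map tree nodes to their non-O predicted labels; this is an illustration, not the authors' decoder.

    def resolve_nested_arguments(predicted):
        """Keep only argument nodes with no predicted-argument ancestor, so
        that the node closest to the root takes precedence over descendants."""
        resolved = {}
        for node, label in predicted.items():
            ancestor = node.parent
            while ancestor is not None and ancestor not in predicted:
                ancestor = ancestor.parent
            if ancestor is None:             # no argument-labelled ancestor found
                resolved[node] = label
        return resolved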