<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0817">
  <Title>Semantic Role Labelling with Similarity-Based Generalization Using EM-based Clustering</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Data and Instances
</SectionTitle>
    <Paragraph position="0"> Parsing. To tag and parse the data, we used LoPar (Schmid, 2000), a probabilistic context-free parser, which comes with a Head-Lexicalised Grammar for English (Carroll and Rooth, 1998).</Paragraph>
    <Paragraph position="1"> We considered only the most probable parse for each sentence and simplified parse trees by eliminating unary nodes. The resulting nodes form the instances of our classification. We used the Stuttgart TreeTagger (Schmid, 1994) to lemmatise constituent heads.</Paragraph>
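    <Paragraph> The simplification step above can be sketched as follows. This is a minimal illustration, not the authors' code: the Node class and the collapse_unary and count_nodes functions are hypothetical names, and two details the text leaves open are assumed here, namely that the topmost label of a unary chain is kept and that a preterminal directly above a word is not itself treated as unary. <![CDATA[

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    # Children are Node objects, or plain strings for the words of the sentence.
    children: list = field(default_factory=list)


def collapse_unary(node):
    """Return a copy of the tree in which every unary chain is a single node."""
    if isinstance(node, str):
        return node
    # Walk down while the current node has exactly one child that is a Node.
    bottom = node
    while len(bottom.children) == 1 and isinstance(bottom.children[0], Node):
        bottom = bottom.children[0]
    # Keep the topmost label (assumption) and attach the chain's lowest children.
    return Node(node.label, [collapse_unary(child) for child in bottom.children])


def count_nodes(node):
    """Count Node objects, i.e. candidate classification instances."""
    if isinstance(node, str):
        return 0
    return 1 + sum(count_nodes(child) for child in node.children)


# Hypothetical toy parse: NP over PRP and VP over VBD are unary chains.
tree = Node("S", [
    Node("NP", [Node("PRP", ["She"])]),
    Node("VP", [Node("VBD", ["knew"])]),
])
simplified = collapse_unary(tree)
```

]]> Under these assumptions, each node that remains after collapsing would be one classification instance, so the toy tree yields fewer instances than the unsimplified parse.</Paragraph>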
    <Paragraph position="2"> Projection of role labels. FrameNet provides semantic roles as character offsets. We labelled those instances (i.e. nodes in the parse tree) with gold standard semantic roles which corresponded to roles' maximal projections. 13.95% of roles in the training corpus spanned more than one parse tree node. Figure 1 shows an example sentence for the AWARENESS frame. The nodes' respective semantic role labels are given in small caps, and the target predicate is marked in boldface.</Paragraph>
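    <Paragraph> One way to read the projection step is as a top-down search for the highest nodes whose character span lies entirely inside a role's span. The sketch below works under that reading and is not taken from the paper: SpanNode and maximal_nodes are hypothetical names, end offsets are assumed to be exclusive, and the toy sentence and its offsets are invented for illustration. <![CDATA[

```python
from dataclasses import dataclass, field


@dataclass
class SpanNode:
    label: str
    start: int  # offset of the first character covered by the node
    end: int    # offset one past the last character (exclusive; an assumption)
    children: list = field(default_factory=list)


def maximal_nodes(node, start, end):
    """Return the highest nodes whose span lies within [start, end).

    A node that fits is returned without descending further, so none of its
    descendants is labelled; a role that matches no single constituent
    comes back as several nodes.
    """
    if start <= node.start and node.end <= end:
        return [node]
    found = []
    for child in node.children:
        found.extend(maximal_nodes(child, start, end))
    return found


# Hypothetical toy parse of "She knew the answer" with character offsets.
tree = SpanNode("S", 0, 19, [
    SpanNode("NP", 0, 3),
    SpanNode("VP", 4, 19, [
        SpanNode("VBD", 4, 8),
        SpanNode("NP", 9, 19, [SpanNode("DT", 9, 12), SpanNode("NN", 13, 19)]),
    ]),
])

single = maximal_nodes(tree, 9, 19)  # span of "the answer"
split = maximal_nodes(tree, 4, 12)   # span of "knew the", not a constituent
```

]]> In this sketch a role span that coincides with a constituent maps to a single node, while a span that crosses constituent boundaries maps to several nodes, which is the situation the 13.95% figure describes.</Paragraph>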
    <Paragraph position="3"> Semantic clustering. We used clustering to generalise over possible fillers of roles. In a first model, we derived a probability distribution $P(y)$ for pairs $y = (y_1, y_2)$, where $y_1$ is a target:role combination and $y_2$ is the head lemma of a role filler. The key idea is that $y_1$ and $y_2$ are mutually independent, but conditioned on an unobserved class $c \in C$. In this manner, we define the probability of $y = (y_1, y_2)$ as $P(y_1, y_2) = \sum_{c \in C} P(c)\, P(y_1 \mid c)\, P(y_2 \mid c)$.</Paragraph>
    <Paragraph position="5"> Estimation was performed using a variant of the expectation-maximisation algorithm (Prescher et al., 2000). We used this model both as a feature and in the generalisation described in Sec. 5. In a second model, we clustered pairs of target:role and the syntactic properties of the role fillers; the resulting model was only used for generalisation.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Association for Computational Linguistics
</SectionTitle>
      <Paragraph position="0"> SENSEVAL-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, Barcelona, Spain, July 2004</Paragraph>
    </Section>
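    <Paragraph> The first clustering model, a latent-class distribution over pairs of target:role combination and filler head lemma, can be illustrated with plain EM. The sketch assumes the factorisation P(y1, y2) = sum over c of P(c) P(y1|c) P(y2|c) that follows from the stated conditional independence. It uses textbook EM rather than the variant of Prescher et al. (2000), and the function names, random initialisation, iteration count and toy counts are all assumptions made for illustration. <![CDATA[

```python
import random
from collections import defaultdict


def normalise(weights):
    """Scale a dict of non-negative weights so that the values sum to 1."""
    total = sum(weights.values())
    if total == 0.0:
        # Degenerate case (for example an empty class): fall back to uniform.
        return {key: 1.0 / len(weights) for key in weights}
    return {key: value / total for key, value in weights.items()}


def em_cluster(counts, n_classes, n_iter=30, seed=0):
    """Fit P(c), P(y1|c) and P(y2|c) to pair frequencies with plain EM."""
    rng = random.Random(seed)
    y1s = sorted({y1 for y1, _ in counts})
    y2s = sorted({y2 for _, y2 in counts})
    classes = list(range(n_classes))

    # Random positive initialisation (an assumption; the source does not say).
    p_c = normalise({c: rng.random() + 0.1 for c in classes})
    p_y1 = {c: normalise({y: rng.random() + 0.1 for y in y1s}) for c in classes}
    p_y2 = {c: normalise({y: rng.random() + 0.1 for y in y2s}) for c in classes}

    for _ in range(n_iter):
        exp_c = {c: 0.0 for c in classes}
        exp_y1 = {c: defaultdict(float) for c in classes}
        exp_y2 = {c: defaultdict(float) for c in classes}

        # E-step: split each pair's frequency over classes by P(c | y1, y2).
        for (y1, y2), freq in counts.items():
            joint = {c: p_c[c] * p_y1[c][y1] * p_y2[c][y2] for c in classes}
            z = sum(joint.values())
            if z == 0.0:
                continue  # guard against numerical underflow
            for c in classes:
                share = freq * joint[c] / z
                exp_c[c] += share
                exp_y1[c][y1] += share
                exp_y2[c][y2] += share

        # M-step: re-estimate all parameters from the expected counts.
        p_c = normalise(exp_c)
        p_y1 = {c: normalise({y: exp_y1[c][y] for y in y1s}) for c in classes}
        p_y2 = {c: normalise({y: exp_y2[c][y] for y in y2s}) for c in classes}

    return p_c, p_y1, p_y2


def pair_probability(model, y1, y2):
    """P(y1, y2) under the fitted model, summing over the hidden classes."""
    p_c, p_y1, p_y2 = model
    return sum(p_c[c] * p_y1[c].get(y1, 0.0) * p_y2[c].get(y2, 0.0) for c in p_c)


# Hypothetical toy frequencies of (target:role, head lemma) pairs.
toy_counts = {
    ("know:Cognizer", "woman"): 4,
    ("know:Cognizer", "child"): 3,
    ("know:Content", "answer"): 5,
    ("know:Content", "fact"): 2,
}
model = em_cluster(toy_counts, n_classes=2)
```

]]> Because the probability is a sum over classes, pair_probability can assign non-zero mass to a pair that never occurred in the counts, which is the sense in which such a model generalises over role fillers.</Paragraph>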
  </Section>
</Paper>