File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/05/w05-1201_intro.xml
Size: 7,391 bytes
Last Modified: 2025-10-06 14:03:17
<?xml version="1.0" standalone="yes"?> <Paper uid="W05-1201"> <Title>Classification of semantic relations by humans and machines Erwin Marsi and Emiel Krahmer Communication and Cognition</Title> <Section position="4" start_page="1" end_page="1" type="intro"> <SectionTitle> 2 Corpus and Task definition </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 2.1 Corpus </SectionTitle> <Paragraph position="0"> We have developed a parallel monolingual corpus consisting of two different Dutch translations of the French book &quot;Le petit prince&quot; (The Little Prince) by Antoine de Saint-Exupéry (published 1943), one by Laetitia de Beaufort-van Hamel (1966) and one by Ernst Altena (2000). For our purposes, this proved to be a good way to quickly find a sufficiently large set of related sentence pairs that differ semantically in interesting and subtle ways. In this work, we used the first five chapters, with 290 sentences and 3600 words in the first translation, and 277 sentences and 3358 words in the second translation.</Paragraph> <Paragraph position="1"> The texts were automatically tokenized and split into sentences, after which errors were manually corrected. Corresponding sentences from both translations were manually aligned; in most cases this was a one-to-one mapping, but occasionally a single sentence in one translation mapped onto two or more sentences in the other: this occurred 23 times across the five chapters. Next, the Alpino parser for Dutch (e.g., Bouma et al., 2001) was used for part-of-speech tagging and lemmatizing all words, and for assigning a dependency analysis to all sentences.</Paragraph> <Paragraph position="2"> The POS labels indicate the major word class (e.g.</Paragraph> <Paragraph position="3"> verb, noun, adj, and adv). The dependency relations hold between tokens and are identical to those used in the Spoken Dutch Corpus. 
These include dependencies such as head/subject, head/modifier and coordination/conjunction. If a full parse could not be obtained, Alpino produced partial analyses collected under a single root node. Errors in lemmatization, POS tagging, and syntactic dependency parsing were not manually corrected.</Paragraph> </Section> <Section position="2" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 2.2 Task definition </SectionTitle> <Paragraph position="0"> The task to be performed can be described informally as follows: given two dependency analyses, align those nodes that are semantically related. More precisely: For each node v in the dependency structure for a sentence S, we define STR(v) as the substring of all tokens under v (i.e., the composition of the tokens of all nodes reachable from v). An alignment between sentences S and S' pairs nodes from the dependency graphs for both sentences. Aligning node v from the dependency graph D of sentence S with node v' from the graph D' of S' indicates that there is a semantic relation between STR(v) and STR(v'), that is, between the respective substrings associated with v and v'. We distinguish five potential, mutually exclusive, relations between nodes (with illustrative examples): 1. v equals v' iff STR(v) and STR(v') are literally identical (abstracting from case). Example: &quot;a small and a large boa-constrictor&quot; equals &quot;a large and a small boa-constrictor&quot;; 2. v restates v' iff STR(v) is a paraphrase of STR(v') (same information content but different wording). Example: &quot;a drawing of a boa-constrictor snake&quot; restates &quot;a drawing of a boa-constrictor&quot;; 3. v specifies v' iff STR(v) is more specific than STR(v'). Example: &quot;the planet B 612&quot; specifies &quot;the planet&quot;; 4. v generalizes v' iff STR(v') is more specific than STR(v). 
Example: &quot;the planet&quot; generalizes &quot;the planet B 612&quot;; 5. v intersects v' iff STR(v) and STR(v') share some informational content, but each also expresses some piece of information not expressed in the other. Example: &quot;Jupiter and Mars&quot; intersects &quot;Mars and Venus&quot;. Figure 1 shows an example alignment with semantic relations between the dependency structures of two sentences. [Figure 1: ... veel contacten gehad met heel veel serieuze personen. (lit. 'Thus have I in the course of my life very many contacts had with very many serious persons') and Op die manier kwam ik in het leven met massa's gewichtige mensen in aanraking. (lit. 'In that way came I in the life with mass-of weighty/important people in touch'). The alignment relations are equals (dotted gray), restates (solid gray), specifies (dotted black), and intersects (dashed gray). For the sake of transparency, dependency relations have been omitted.] Note that there is an intuitive relation with entailment here: both equals and restates can be understood as mutual entailment (i.e., if the root nodes of the analyses corresponding to S and S' stand in an equals or restates relation, S entails S' and S' entails S), if S specifies S' then S also entails S', and if S generalizes S' then S is entailed by S'.</Paragraph> <Paragraph position="1"> In the remainder of this paper, we will distinguish two aspects of this task: alignment is the subtask of pairing related nodes (or, more precisely, pairing the token strings corresponding to these nodes); classification of semantic relations is the subtask of labeling these alignments in terms of the five types of semantic relations.</Paragraph> </Section> <Section position="3" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 2.3 Annotation procedure </SectionTitle> <Paragraph position="0"> For creating manual alignments, we developed a special-purpose annotation tool which shows, side by side, two sentences, as well 
as their respective dependency graphs. When the user clicks on a node v in the graph, the corresponding string (STR(v)) is shown at the bottom. The tool enables the user to manually construct an alignment graph on the basis of the respective dependency graphs. This is done by focusing on a node in the structure for one sentence, and then selecting a corresponding node (if possible) in the other structure, after which the user can select the relevant alignment relation. The tool offers additional support for folding parts of the graphs, highlighting unaligned nodes and hiding dependency relation labels.</Paragraph> <Paragraph position="1"> All text material was aligned by the two authors.</Paragraph> <Paragraph position="2"> They started by annotating the first ten sentences of chapter one together in order to get a feel for the task. They continued with the remaining sentences from chapter one individually (35 sentences and 521 words in the first translation, and 35 sentences and 481 words in the second translation). Next, both annotators discussed annotation differences, which triggered some revisions in their respective annotations. They also agreed on a single consensus annotation. Interannotator agreement will be discussed in the next two sections. Finally, each author annotated two additional chapters, bringing the total to five.</Paragraph> </Section> </Section> </Paper>
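The definition of STR(v) in Section 2.2 and the entailment reading of the relations can be illustrated with a minimal Python sketch. It is not the implementation used for the corpus: the `Node` class, its `position` field, the function `str_of` and the table `ENTAILS` are hypothetical names introduced here for illustration, and the sketch assumes that every token-bearing node records its index in the sentence. The table records only the entailment directions stated in the text; nothing is stated for intersects, so its entry is left empty.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical dependency node (not Alpino's actual data model)."""
    token: Optional[str] = None   # None for nodes that carry no token
    position: int = -1            # assumed index of the token in the sentence
    children: List["Node"] = field(default_factory=list)


def str_of(v: Node) -> str:
    """STR(v): the tokens of all nodes reachable from v, in sentence order."""
    collected = []
    seen = set()
    stack = [v]
    while stack:
        node = stack.pop()
        if id(node) in seen:      # guard against nodes reachable twice
            continue
        seen.add(id(node))
        if node.token is not None:
            collected.append((node.position, node.token))
        stack.extend(node.children)
    return " ".join(token for _, token in sorted(collected))


# Entailment directions that the text states for each relation between
# root nodes; the empty set means that the text states none.
ENTAILS = {
    "equals": {"S->S'", "S'->S"},
    "restates": {"S->S'", "S'->S"},
    "specifies": {"S->S'"},
    "generalizes": {"S'->S"},
    "intersects": set(),
}

# Example built from the paper's "the planet B 612" string.
planet = Node("planet", 1, [Node("the", 0), Node("B", 2), Node("612", 3)])
root = Node(children=[planet])
print(str_of(root))
```

Sorting on the assumed `position` field keeps the substring in surface order regardless of the order in which the graph is traversed.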