<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3813">
  <Title>Matching Syntactic-Semantic Graphs for Semantic Relation Assignment</Title>
  <Section position="5" start_page="81" end_page="83" type="metho">
    <SectionTitle>
3 Input data and semantic relations
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="81" end_page="82" type="sub_section">
      <SectionTitle>
3.1 Input data
</SectionTitle>
      <Paragraph position="0"> We work with a semi-technical text on meteorological phenomena (Larrick, 1961), meant for primary school students. The text gradually introduces concepts related to precipitation, and explains them. Its nature makes it appropriate for the semantic analysis task in an incremental approach. The system will mimic the way in which a human reader accumulates knowledge and uses what was written before to process ideas introduced later in the text.</Paragraph>
      <Paragraph position="1"> The text contains 513 sentences, with an average length of 9.13 words. There are 4686 word tokens and 969 types. The difference between the number of tokens (2850) and types (573) in the extracted pairs (which contain only open-class words) shows that the same concepts recur, as expected in a didactic text.</Paragraph>
      <Paragraph position="2"> The syntactic structures of the input data are produced by a parser with good coverage and detailed syntactic information, DIPETT (Delisle and Szpakowicz, 1995). The parser, written in Prolog, implements a classic constituency English grammar from Quirk et al. (1985). Pairs of syntactic units connected by grammatical relations are extracted from the parse trees. A dependency parser would produce a similar output, but DIPETT also provides verb subcategorization information (such as subject-verb-object or subject-verb-object-indirect object), which we use to select the (best) matching syntactic structures.</Paragraph>
      <Paragraph position="3"> To find pairs, we use simple structural information. If a unit is directly embedded in another unit, we assume a subordinate relation between them; if the two units are coordinate, we assume a coordinate relation. These assumptions are safe if the parse is correct. A modifier is subordinate to its head noun, an argument to its head verb, and a clause perhaps to the main clause in the sentence.</Paragraph>
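      <Paragraph> To make the structural rule above concrete, here is a minimal Python sketch. The Unit class and the relation labels are our illustrative stand-ins, not DIPETT's actual data structures: an embedded unit yields a subordinate pair, while conjoined siblings yield coordinate pairs.

```python
class Unit:
    """A toy parse unit: a word, its embedded children, and a flag
    marking whether the children are conjoined (coordinated)."""
    def __init__(self, word, children=None, coordinated=False):
        self.word = word
        self.children = children or []
        self.coordinated = coordinated

def extract_pairs(unit):
    """Walk a parse unit and emit (word, word, link-type) triples."""
    pairs = []
    kids = unit.children
    if unit.coordinated:
        # a coordinate relation holds between the conjoined siblings
        for a, b in zip(kids, kids[1:]):
            pairs.append((a.word, b.word, "coordinate"))
    else:
        # a directly embedded unit is subordinate to the unit containing it
        for child in kids:
            pairs.append((unit.word, child.word, "subordinate"))
    for child in kids:
        pairs.extend(extract_pairs(child))
    return pairs

# a modifier is subordinate to its head noun
np = Unit("cloud", [Unit("dark")])
# conjoined units are coordinate, as in "running and swimming"
coord = Unit("and", [Unit("running"), Unit("swimming")], coordinated=True)
```

As the text notes, these assignments are only as safe as the parse itself; a sketch like this inherits any attachment errors the parser makes.
</Paragraph>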
      <Paragraph position="4"> If we conclude that two units should interact, we seek an appropriate semantic relation to describe this interaction. The system uses three heuristics to find one or more semantic relation candidates for the current pair.</Paragraph>
      <Paragraph position="5"> 1. Word match: the system will propose the semantic relation(s) that have previously been assigned to a pair containing the same lemmas.</Paragraph>
      <Paragraph position="6"> 2. Syntactic graph match: we elaborate this heuristic in Section 4.</Paragraph>
      <Paragraph position="7"> 3. Marker: the system uses a manually built dictionary of markers (prepositions, coordinators, subordinators) associated with the semantic relations they indicate. The dictionary contains 325 markers, and a total of 662 marker-relation associations.</Paragraph>
      <Paragraph position="8"> If none of the three heuristics yields results, the system presents an empty list and expects the user to input the appropriate relation. When at least one relation is proposed, the user can accept a unique relation, choose among several options, or supply a new one. The system records which action took place, as well as the heuristic that generated the options presented to the user. The pair is also analysed to determine the syntactic level from which it came, to allow for a more detailed analysis of the behaviour of the system.</Paragraph>
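      <Paragraph> The three heuristics can be read as an ordered cascade: earlier, more specific evidence wins, and the marker dictionary serves as a fallback. The following Python sketch is our illustration of that control flow; `memory`, `graph_match` and `marker_dict` are hypothetical stand-ins for the accumulated pair store, the Section 4 routine, and the hand-built marker dictionary.

```python
def propose_relations(pair, marker, memory, graph_match, marker_dict):
    """Return candidate semantic relations for a (head, modifier) pair."""
    # 1. Word match: relations previously assigned to the same lemma pair
    candidates = memory.get(pair, [])
    # 2. Syntactic graph match (elaborated in Section 4)
    if not candidates:
        candidates = graph_match(pair)
    # 3. Marker: relations the connective is known to signal
    if not candidates and marker is not None:
        candidates = marker_dict.get(marker, [])
    return candidates  # empty list -> the user must supply a relation

# toy stores for illustration only
memory = {("storm", "desert"): ["location_at"]}
marker_dict = {"in": ["location", "time_at"]}
no_graph_match = lambda pair: []
```

An empty return models the case where the system presents an empty list and the user supplies the relation, which is then recorded back into `memory` for later pairs.
</Paragraph>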
    </Section>
    <Section position="2" start_page="82" end_page="83" type="sub_section">
      <SectionTitle>
3.2 Semantic relations
</SectionTitle>
      <Paragraph position="0"> The list of semantic relations with which we work is based on an extensive literature study (Barker et al., 1997a). Three lists of relations for three syntactic levels (inter-clause, intra-clause (case) and noun-modifier relations) were then combined based on syntactic and semantic phenomena. The resulting list is the one used in the experiments we present in this paper. The relations are grouped by general similarity into 6 relation classes (H denotes the head of a base NP, M denotes the modifier).</Paragraph>
      <Paragraph position="1"> 1. CAUSAL groups relations enabling or opposing an occurrence. Examples: cause - H causes M: flu virus; effect - H is the effect of (was caused by) M: exam anxiety; purpose - H is for M: concert hall.
2. CONJUNCTIVE includes relations that describe the conjunction or disjunction of occurrences (events/acts/actions/states/activities), entities or attributes: conjunction - both H and M occur or exist (and nothing more can be said about that from the point of view of causality or temporality): running and swimming (are good for you); disjunction - either one or both H and M occur or exist: painting or drawing.
3. PARTICIPANT groups relations between an occurrence and its participants or circumstances. Examples: agent - M performs H: student protest; object - M is acted upon by H: metal separator; beneficiary - M benefits from H: student discount.
4. SPATIAL groups relations that place an occurrence at an absolute or relative point in space. Examples: direction - H is directed towards M: outgoing mail; location - H is the location of M: home town; location at - H is located at M: desert storm.
5. TEMPORAL groups relations that place an occurrence at an absolute or relative point in time. Examples: frequency - H occurs every time M occurs: weekly game; time at - H occurs when M occurs: morning coffee; time through - H existed while M existed: 2-hour trip.
6. QUALITY groups the remaining relations between a verb or noun and its arguments. Examples: manner - H occurs as indicated by M: stylish writing; material - H is made of M: brick house; measure - M is a measure of H: heavy rock.
There is no consensus in the literature on a list of semantic relations that would work in all situations. This is, no doubt, because a general list of relations such as the one we use would not be appropriate for the semantic analysis of texts in a specific domain, such as medical texts.
All the relations in the list we use were necessary, and sufficient, for the analysis of the input text.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="83" end_page="85" type="metho">
    <SectionTitle>
4 Syntactic-semantic graph-matching
</SectionTitle>
    <Paragraph position="0"> Our system begins operation with a minimum of manually encoded knowledge, and accumulates information as it processes the text. This design idea was adopted from TANKA (Barker et al., 1997b).</Paragraph>
    <Paragraph position="1"> The only manually encoded knowledge is a dictionary of markers (subordinators, coordinators, prepositions). This resource does not affect the syntactic-semantic graph-matching heuristic.</Paragraph>
      <Paragraph position="2"> Because the system gradually accumulates knowledge as it goes through the input text, it uses a form of memory-based learning to make predictions about the semantic relation that fits the current pair.</Paragraph>
    <Paragraph position="3"> The type of knowledge that it accumulates consists of previously analysed pairs, together with the semantic relation assigned, and a syntactic-semantic graph centered on each word in a sentence which appears as the main element in a processed pair.</Paragraph>
    <Paragraph position="4"> To process a pair P not encountered previously, the system builds a graph centered on the main element (often the head) of P. This idea was inspired by Delisle et al. (1993), who used a list of arguments surrounding the main verb together with the verb's subcategorization information and previously processed examples to analyse semantic roles (case relations). In recent approaches, syntactic information is translated into features which, together with information from FrameNet, WordNet or VerbNet, will be used with ML tools to make predictions for each example in the test set (Carreras and Marquez, 2004; Carreras and Marquez, 2005).</Paragraph>
    <Paragraph position="5"> Our system builds a (simple) graph surrounding a head word (which may be a verb representing the predicate of a sentence, or representing a clause or noun), and matches it with previously analysed examples.</Paragraph>
    <Paragraph position="6"> A graph G(w) centered on word w consists of the following: a node for w; a set of nodes for each of the words wi in the sentence with which w is connected by a grammatical relation (including situations when w is a modi er/argument); edges that connect w with each wi, tagged with grammatical relation GR (such as subject, object, complement) and connective information Con (prepositions, coordinators, subordinators, or nil). The nodes also contain part-of-speech information for the corresponding word, and other information from the parser (such as subcategorization structure for the verb, if it is available).</Paragraph>
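      <Paragraph> The definition of G(w) above translates naturally into a small data structure. The following Python sketch is our illustrative rendering; the class and field names are our choices, not the system's actual representation.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    lemma: str
    pos: str                      # part of speech of the word
    subcat: Optional[str] = None  # verb subcategorization, if available

@dataclass
class Edge:
    neighbour: Node
    gr: str                       # grammatical relation: subject, object, complement, ...
    con: Optional[str] = None     # connective: preposition, coordinator, subordinator, or None

@dataclass
class Graph:
    center: Node                  # the node for w
    edges: list = field(default_factory=list)  # one edge per connected word wi

# G(breathe) for "you breathe out on a cold day" (simplified)
g = Graph(Node("breathe", "verb", subcat="subject-verb"),
          [Edge(Node("you", "pronoun"), "subject"),
           Edge(Node("day", "noun"), "complement", con="on")])
```

Note that the connective lives on the edge, not the node, which is what later lets edges be compared by grammatical relation and connective alone.
</Paragraph>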
      <Paragraph position="7"> Graph matching starts with the central node, and continues with edge matching. If G(w) is the graph centered on word w whose pairs are currently being processed, the system selects from the collection of previously stored graphs a set of graphs {G(wi)} which satisfy the following conditions: The central nodes match. The matching is guided by a set of constraints. We choose the graphs centered on the nodes that satisfy the most constraints, presented here in the order of their importance: w and wi must have the same part of speech.</Paragraph>
    <Paragraph position="8"> w and wi have the same syntactic properties. If w and wi are verbs, they must have the same subcategorization structure.</Paragraph>
      <Paragraph position="9"> w and wi have the same lemma. We emphasize that a graph centered on a different lemma but with the same subcategorization structure is preferred to a graph with the same lemma but a different subcategorization structure.</Paragraph>
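      <Paragraph> The ordering of the central-node constraints can be captured with a weighted score in which part of speech outweighs subcategorization, which in turn outweighs lemma identity. The sketch below is ours; the numeric weights are an illustrative choice that merely reproduces the stated preference (same subcategorization beats same lemma), not values from the system.

```python
def center_score(w, wi):
    """Score a candidate central node wi against w, weighting the
    constraints in their stated order of importance."""
    score = 0
    if w["pos"] == wi["pos"]:
        score += 4  # same part of speech: most important
    if w.get("subcat") is not None and w.get("subcat") == wi.get("subcat"):
        score += 2  # same syntactic properties (verb subcategorization)
    if w["lemma"] == wi["lemma"]:
        score += 1  # same lemma: least important
    return score

watch_svo = {"lemma": "watch", "pos": "verb", "subcat": "svo"}
know_svo  = {"lemma": "know",  "pos": "verb", "subcat": "svo"}
watch_sv  = {"lemma": "watch", "pos": "verb", "subcat": "sv"}
```

Because 4 + 2 exceeds 4 + 1, a different lemma with the same subcategorization structure outranks the same lemma with a different one, exactly as the text prescribes.
</Paragraph>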
      <Paragraph position="10"> The edge representing the word pair to which we want to assign a semantic relation has a match in G(wi). From all graphs that comply with this constraint, the ones with the lowest distance (corresponding to the highest matching score) are chosen. The graphs are matched edge by edge. Two edges match if the grammatical relation and the connectives match. Figure 1 shows the formula that computes the distance between two graphs. We note that edge matching uses only edge information: the grammatical relation and the connective. Using the node information as is (lemmas and their part of speech) is too restrictive; we are looking into word similarity as a solution for node matching.</Paragraph>
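      <Paragraph> Since Figure 1's exact formula is not reproduced in this text, the following Python sketch only approximates its intent: two edges match when both the grammatical relation and the connective agree, and the distance counts the edges of the current graph left unmatched, so the lowest distance corresponds to the best match.

```python
def edge_matches(e1, e2):
    """Two edges match if grammatical relation and connective agree."""
    return e1["gr"] == e2["gr"] and e1["con"] == e2["con"]

def graph_distance(current, stored):
    """Count edges of the current graph with no matching edge in the
    stored graph; smaller is better (an approximation of Figure 1)."""
    unmatched = 0
    for e in current["edges"]:
        if not any(edge_matches(e, s) for s in stored["edges"]):
            unmatched += 1
    return unmatched

g1 = {"edges": [{"gr": "subject", "con": None},
                {"gr": "complement", "con": "on"}]}
g2 = {"edges": [{"gr": "subject", "con": None}]}
```

Deliberately, no lemma or part-of-speech test appears in `edge_matches`: as noted above, node information as is proved too restrictive, pending a word-similarity treatment of node matching.
</Paragraph>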
    <Paragraph position="11"> If no matching graph has been found, the system searches for a simpler match, in which the current word pair is matched against previously processed pairs, using the same formula as for edge distance, and preferring the pairs that have the same modifier. This algorithm will retrieve the set of graphs {G(wi)} which give the same score when matched with the current graph. The set of possible semantic relations presented to the user consists of the semantic relation on the edge of each G(wi) that matches the edge (of the current graph) corresponding to the word pair which we are analysing.</Paragraph>
    <Paragraph position="13"> [Figure 1: definition of a graph centered on w, and the distance metric between two graphs.]</Paragraph>
    <Paragraph position="16"> To the sentence: When you breathe out on a cold day, you make a cloud.</Paragraph>
      <Paragraph position="17"> corresponds the following syntactic graph. When we focus on the graph centered on a specific word, such as breathe, we look only at the node corresponding to the word breathe, and the nodes adjacent to it.</Paragraph>
      <Paragraph position="18"> To process a pair P = (wH,wM), the system first builds G(wH), and then searches through previously stored graphs for those which have the same center wH, or whose central node has the same part of speech as wH. For each graph found, we compute a distance that gives a measure of the match between the two graphs. The best match will have the smallest distance.</Paragraph>
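      <Paragraph> The retrieval step just described amounts to filtering the stored graphs by central-node compatibility and then minimizing the distance. This Python sketch is illustrative only: the dictionary fields and the toy distance function are our assumptions, and any graph-distance function could be plugged in.

```python
def best_match(current, stored_graphs, distance):
    """Among stored graphs whose central node shares the current graph's
    part of speech, return the one at the smallest distance (or None)."""
    compatible = [g for g in stored_graphs
                  if g["center_pos"] == current["center_pos"]]
    if not compatible:
        return None
    return min(compatible, key=lambda g: distance(current, g))

# toy store: graphs previously built around verbs and a noun
stored = [{"name": "know",  "center_pos": "verb"},
          {"name": "bring", "center_pos": "verb"},
          {"name": "cloud", "center_pos": "noun"}]

# a stand-in distance that simply favours the "know" graph
toy_distance = lambda cur, g: 0 if g["name"] == "know" else 5
match = best_match({"center_pos": "verb"}, stored, toy_distance)
```

A None result corresponds to the fallback described later, where the current word pair is matched directly against previously processed pairs.
</Paragraph>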
    <Paragraph position="19"> For example, for the sentence: Weathermen watch the clouds day and night.</Paragraph>
      <Paragraph position="20"> the system builds the following network centered on the predicate watch:</Paragraph>
      <Paragraph position="22"> The system locates among previously stored networks those centered around verbs. For the sentence above, the system uses the following graph, built from the immediately preceding sentence in the text: Air pilots know that clouds can bring rain, hail, sleet and snow.</Paragraph>
      <Paragraph position="23"> According to the metric, the networks match and the pairs (watch, weatherman) and (know, air pilots) match, so the semantic relation for the pair (know, air pilots) is proposed as a possible relation for the pair (watch, weatherman).</Paragraph>
  </Section>
</Paper>