<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3815">
  <Title>Context Comparison as a Minimum Cost Flow Problem</Title>
  <Section position="3" start_page="97" end_page="98" type="metho">
    <SectionTitle>
2 The Network Flow Method
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="97" end_page="97" type="sub_section">
      <SectionTitle>
2.1 Minimum Cost Flow
</SectionTitle>
      <Paragraph position="0"> As a standard example of an MCF problem, consider the graphical representation of a route map for delivering fresh produce from grocers (supply nodes) to homes (demand nodes). The remaining nodes (e.g., intersections, gas stations) have neither a supply nor a demand. Assuming there are sufficient supplies, the optimal solution is to find the cheapest set of routes from grocers to homes such that all demands are satisfied.</Paragraph>
      <Paragraph position="1"> Mathematically, let $G = (N, E)$ be a connected network, where $N$ is the set of nodes and $E$ is the set of edges. Each edge has a cost $c : E \rightarrow \mathbb{R}$, which is the distance of the edge. Each node $i \in N$ is associated with a value $b(i)$, where $b : N \rightarrow \mathbb{R}$ indicates its available supply ($b(i) &gt; 0$), its demand ($b(i) &lt; 0$), or neither ($b(i) = 0$). The goal is to find a solution for each node $i$ such that all the flow passing through $i$ satisfies its supply or demand requirement ($b(i)$). The flow passing through node $i$ is captured by $x : E \rightarrow \mathbb{R}$, such that we can observe the combined incoming flow, $\sum_{(h,i) \in IN_i} x(h,i)$, from the entering edges $IN_i$, as well as the combined outgoing flow, $\sum_{(i,j) \in OUT_i} x(i,j)$, via the exiting edges $OUT_i$. (See Figure 1.) If a feasible solution can be found, the net flow (the difference between the entering and exiting flow) at each node must fulfill the corresponding supply or demand requirement.</Paragraph>
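      <Paragraph position="2"> Formally, the MCF problem can be stated as:</Paragraph>
      <Paragraph position="3"> $$\text{minimize} \quad \sum_{(i,j) \in E} c(i,j) \cdot x(i,j) \qquad (1)$$ $$\text{subject to} \quad \sum_{(i,j) \in OUT_i} x(i,j) - \sum_{(h,i) \in IN_i} x(h,i) = b(i), \quad \forall i \in N \qquad (2)$$ $$x(i,j) \geq 0, \quad \forall (i,j) \in E \qquad (3)$$</Paragraph>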
      <Paragraph position="4"> The constraint specified by (2) ensures that the difference between the flow entering and exiting each node $i$ matches its supply or demand $b(i)$ exactly. The next constraint (3) ensures that the flow is transported from the supply to the demand but not in the opposite direction. Finally, selecting route $(i,j)$ requires a transportation "effort" of $c(i,j)$ (cost of the route) multiplied by the amount of supply transported $x(i,j)$ (the term inside the summation in eqn. (1)). Taking the summation of the effort, $c(i,j) \cdot x(i,j)$, of cheapest routes yields the desired distance between the supply and the demand.</Paragraph>
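      <Paragraph> As a minimal sketch of the formulation above, the following Python code solves a toy delivery instance by successive shortest paths. The solver, the node names, and the edge costs are illustrative assumptions; they are not the solver or the data used in this paper, and edges are assumed to be uncapacitated with non-negative costs.
<![CDATA[
```python
from math import inf

def min_cost_flow(nodes, cost, b):
    """Successive-shortest-path solver for an uncapacitated MCF instance.

    nodes: list of node names
    cost:  {(i, j): c(i, j)} for directed edges, with c(i, j) >= 0
    b:     {i: b(i)}; positive = supply, negative = demand, summing to 0
    Returns (total_cost, flow) with flow = {(i, j): x(i, j)}.
    """
    eps = 1e-12
    flow = {edge: 0.0 for edge in cost}
    remaining = {n: float(b.get(n, 0.0)) for n in nodes}
    while True:
        sources = [n for n in nodes if remaining[n] > eps]
        if not sources:
            break
        s = sources[0]
        # Bellman-Ford on the residual graph: a forward edge costs +c,
        # and an edge already carrying flow can be undone at cost -c.
        dist = {n: inf for n in nodes}
        prev = {}
        dist[s] = 0.0
        for _ in range(len(nodes) - 1):
            for (i, j), c in cost.items():
                if dist[i] + c < dist[j] - eps:
                    dist[j] = dist[i] + c
                    prev[j] = (i, (i, j), 1)
                if flow[(i, j)] > eps and dist[j] - c < dist[i] - eps:
                    dist[i] = dist[j] - c
                    prev[i] = (j, (i, j), -1)
        sinks = [n for n in nodes if remaining[n] < -eps and dist[n] < inf]
        if not sinks:
            raise ValueError("infeasible: some supply cannot reach a demand")
        t = min(sinks, key=lambda n: dist[n])
        # Walk back from t to s, collecting the path and its bottleneck.
        amount = min(remaining[s], -remaining[t])
        path, n = [], t
        while n != s:
            pred, edge, direction = prev[n]
            if direction == -1:
                amount = min(amount, flow[edge])
            path.append((edge, direction))
            n = pred
        for edge, direction in path:
            flow[edge] += direction * amount
        remaining[s] -= amount
        remaining[t] += amount
    total = sum(cost[edge] * flow[edge] for edge in cost)
    return total, flow

# Hypothetical route map: two grocers, one intersection, two homes.
nodes = ["g1", "g2", "x", "h1", "h2"]
cost = {("g1", "x"): 1, ("g2", "x"): 2, ("x", "h1"): 1,
        ("x", "h2"): 3, ("g2", "h2"): 4}
b = {"g1": 2, "g2": 1, "h1": -2, "h2": -1}  # supplies > 0, demands < 0
total, flow = min_cost_flow(nodes, cost, b)
print(total, flow)
```
]]>
 In this toy instance the cheapest plan ships two units from g1 through x to h1 and one unit directly from g2 to h2.</Paragraph>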
    </Section>
    <Section position="2" start_page="97" end_page="98" type="sub_section">
      <SectionTitle>
2.2 Semantic Distance as MCF
</SectionTitle>
      <Paragraph position="0"> To cast our context comparison task into this framework, we first represent each context as a vector of concept frequencies (or a context profile for the remainder of this paper). The profile of one context is chosen as the supply and the other as the demand.</Paragraph>
      <Paragraph position="1"> The concept frequencies of the profiles are normalized, so that the total supply always equals the total demand. The cost of the routes between nodes is determined by a semantic distance measure defined over any two nodes in the ontology. Now, as in the grocery delivery domain, the goal is to find the MCF from supply to demand.</Paragraph>
      <Paragraph position="2"> We can treat any ontology as the transport network. A relation (such as hyponymy) between two concepts $i$ and $j$ is represented by an edge $(i,j)$, and the cost $c$ on each edge can be defined as the semantic distance between the two concepts. This semantic distance can be as simple as the number of edges separating the concepts, or more sophisticated, such as Lin's (1998) information-theoretic measure. (See Budanitsky and Hirst (2006) for a survey of such measures.)</Paragraph>
      <Paragraph position="3"> Numerous methods are possible for converting the word frequency vector of a context to a concept frequency vector (i.e., a context profile). One simple method is to transfer each element in the word vector (i.e., the frequency of each word) to the corresponding concepts in the ontology, resulting in a vector of concept frequencies. In this paper, we have chosen a uniform distribution of word frequency counts among concepts, instead of a weighted distribution towards the relevant concepts for a particular text.</Paragraph>
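      <Paragraph> The uniform distribution of word counts among concepts can be sketched as follows. The function name, the word-to-concept mapping, and the concept labels are hypothetical; a real mapping would come from the ontology in use.
<![CDATA[
```python
from collections import defaultdict

def context_profile(word_counts, word_to_concepts):
    """Spread each word's count evenly over its candidate concepts."""
    profile = defaultdict(float)
    for word, count in word_counts.items():
        concepts = word_to_concepts.get(word, [])
        if not concepts:
            continue  # word is not covered by the ontology
        share = count / len(concepts)
        for concept in concepts:
            profile[concept] += share
    return dict(profile)

# Hypothetical context: "bank" has two senses, "loan" has one.
word_counts = {"bank": 2, "loan": 1}
word_to_concepts = {
    "bank": ["bank#institution", "bank#riverside"],
    "loan": ["loan#debt"],
}
profile = context_profile(word_counts, word_to_concepts)
print(profile)
```
]]>
 No sense is favoured here: each concept of an ambiguous word receives the same share of that word's count.</Paragraph>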
      <Paragraph position="4"> Since we wish to evaluate the strength of our method alone without any additional NLP effort, we bypass the issue of approximating the true distribution of the concepts via word sense disambiguation or class-based approximation methods, such as those by Li and Abe (1998) and Clark and Weir (2002).</Paragraph>
      <Paragraph position="5"> To calculate the distance between two profiles, we need to cast one profile as the supply ($S$) and the other as the demand ($D$). Note that our distance is symmetric, so the choice of the supply and the demand is arbitrary. Next, we must determine the value of $b(i)$ at each concept node $i$; this is just the difference between the (normalized) supply frequency $f_S(i)$ and demand frequency $f_D(i)$:</Paragraph>
      <Paragraph position="6"> $$b(i) = f_S(i) - f_D(i) \qquad (4)$$</Paragraph>
      <Paragraph position="7"> This formula yields the net supply/demand, $b(i)$, at node $i$. Recall that our goal is to transport all the supply to meet the demand; the final step is to determine the cheapest routes between $S$ and $D$ such that the constraints in (2) and (3) are satisfied. The total distance of the routes, or the MCF (the minimized value of the objective in eqn. (1)), is the distance between the two context profiles.</Paragraph>
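      <Paragraph> The net supply/demand computation can be sketched as below, assuming each profile is a mapping from concept to raw frequency. The function names and the toy profiles are hypothetical.
<![CDATA[
```python
def normalize(profile):
    """Scale concept frequencies so that they sum to 1."""
    total = sum(profile.values())
    return {concept: freq / total for concept, freq in profile.items()}

def net_supply(supply_profile, demand_profile):
    """Return b(i) = f_S(i) - f_D(i) over normalized frequencies."""
    f_s = normalize(supply_profile)
    f_d = normalize(demand_profile)
    concepts = set(f_s) | set(f_d)
    return {c: f_s.get(c, 0.0) - f_d.get(c, 0.0) for c in concepts}

# Hypothetical raw concept frequencies for two contexts.
S = {"dog": 3, "cat": 1}
D = {"cat": 2, "fish": 2}
b = net_supply(S, D)
print(b)
```
]]>
 Because both profiles are normalized, the values of b sum to zero, so the total supply equals the total demand.</Paragraph>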
      <Paragraph position="8"> Finally, it is important to note that the MCF formulation does not simply find the shortest paths from the concept nodes in the supply to those in the demand. Because a profile is a frequency-weighted concept vector, some concept nodes are weighted more heavily than others, and the routes between such nodes across the two profiles are also weighted more heavily. Indeed, in eqn. (1), the cost of each route, $c(i,j)$, is weighted by $x(i,j)$ (how much supply, or frequency weight, is transported between nodes $i$ and $j$).</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="98" end_page="99" type="metho">
    <SectionTitle>
3 Graphical Issues
</SectionTitle>
    <Paragraph position="0"> As alluded to in the introduction, certain concept-to-concept distances make the MCF problem harder to solve. The details are described next.</Paragraph>
    <Section position="1" start_page="98" end_page="98" type="sub_section">
      <SectionTitle>
3.1 Additivity
</SectionTitle>
      <Paragraph position="0"> In theory, our method has the flexibility to incorporate different concept-to-concept distances. The issue lies in the algorithms for solving MCF problems.</Paragraph>
      <Paragraph position="1"> Existing algorithms are greedy: they take a stepwise "localist" approach on the set of edges connecting the supply and the demand; i.e., at each node, the cheapest outgoing edge is selected. The assumption is that the concept-to-concept distance function is additive. Mathematically, for any path from node $i$ to node $k$, $\langle (j_0, j_1), \ldots, (j_{n-1}, j_n) \rangle$, where $i = j_0$ and $k = j_n$, the distance between nodes $i$ and $k$ is the sum of the distances of the edges along the path:</Paragraph>
      <Paragraph position="2"> $$dist(i,k) = \sum_{l=1}^{n} c(j_{l-1}, j_l) \qquad (5)$$</Paragraph>
      <Paragraph position="3"> The additivity of a concept-to-concept distance entails that selecting the cheapest edge at each step (i.e., locally) yields the overall cheapest set of routes (i.e., globally). Note that some of the most successful concept-to-concept distances proposed in the CL literature are non-additive (e.g., Lin, 1998; Resnik, 1995). This poses a problem in solving our network flow problem: the global distance between any concepts, $i$ and $k$, cannot be correctly determined by the greedy method.</Paragraph>
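      <Paragraph> To make non-additivity concrete, the sketch below compares the sum of edge distances along a path with the direct distance between its endpoints, using one minus Lin's (1998) similarity as the distance. The three-node hierarchy, the IC values, and the choice of 1 - sim as a distance are assumptions made for illustration only.
<![CDATA[
```python
# Hypothetical chain hierarchy and information content (IC) values.
IC = {"animal": 1.0, "dog": 3.0, "poodle": 6.0}
PARENT = {"poodle": "dog", "dog": "animal", "animal": None}

def ancestors(node):
    """Return the node followed by all of its ancestors, nearest first."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain

def lcs(i, j):
    """Lowest common subsumer of i and j in the toy hierarchy."""
    seen = set(ancestors(j))
    for node in ancestors(i):
        if node in seen:
            return node
    raise ValueError("no common ancestor")

def lin_distance(i, j):
    """1 - sim_Lin(i, j), with sim_Lin = 2 * IC(lcs) / (IC(i) + IC(j))."""
    return 1.0 - 2.0 * IC[lcs(i, j)] / (IC[i] + IC[j])

path = ["poodle", "dog", "animal"]
edge_sum = sum(lin_distance(u, v) for u, v in zip(path, path[1:]))
direct = lin_distance("poodle", "animal")
print(edge_sum, direct)
```
]]>
 The two quantities differ, so summing edge costs along a path does not recover the distance between its endpoints for this measure.</Paragraph>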
    </Section>
    <Section position="2" start_page="98" end_page="99" type="sub_section">
      <SectionTitle>
3.2 Constructing an Equivalent Bipartite Network
</SectionTitle>
      <Paragraph position="0"> The issue of non-additive distances can be addressed in the following way. We map the relevant portion of the network into a new network such that the concept-to-concept distance is preserved, but without the problem introduced by non-additivity. One possible solution is to construct a complete bipartite graph between the supply nodes and the demand nodes (the nodes in the two context profiles). We set the cost of each edge $(s,d)$ in the bipartite graph to be the concept-to-concept distance between $s$ and $d$ in the original network. Since there is exactly one edge between any pair of nodes, the non-additivity is removed entirely. (See Figures 2(a) and 2(b).) Now, we can apply a network flow solver on the new graph. [Figure 2 caption fragment: ... to the network produced by our transformation (c), given two profiles S and D. Nodes labelled with either "S" or "D" belong to the corresponding profile. Nodes labelled with "$J_S$" or "$J_D$" are junction nodes (see section 4.2).]</Paragraph>
      <Paragraph position="1"> However, one problem arises from performing the above mapping: there is a processing bottleneck as a result of the quadratic increase in the number of edges in the new network. Unfortunately, though tractable, polynomial complexity is not always practical. For example, with an average of 900 nodes per profile, making 120 profile comparisons in addition to network re-structuring can take as long as 10 days (see footnote 2). If we choose to use a non-additive distance, the method described above does not scale up well for a large number of comparisons. Next, we present a method to alleviate the complexity issue.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="99" end_page="101" type="metho">
    <SectionTitle>
4 Network Transformation
</SectionTitle>
    <Paragraph position="0"> One method of alleviating the bottleneck is to reduce the processing load from generating a large number of edges. Instead of generating a complete bipartite network, we generate a network which approximates both the structure of the original network and that of the complete bipartite network. The goal is to construct a pared-down network such that (a) a reduction in the number of edges improves efficiency, and (b) the resulting distance distortion does not hamper performance significantly.</Paragraph>
    <Paragraph position="1"> [Footnote 2] This is tested on a context comparison task not reported in this paper. The code is scripted in Perl. The experiment was performed on a machine with two P4 Xeon CPUs running at 3.6GHz, with a 1MB cache and 6GB of memory.</Paragraph>
    <Section position="1" start_page="99" end_page="100" type="sub_section">
      <SectionTitle>
4.1 Path Shape in a Hierarchy
</SectionTitle>
      <Paragraph position="0"> To understand our transformation method, let us further examine the graphical properties of an ontology as a network. In a hierarchical network (e.g., WordNet, Gene Ontology, UMLS), calculating the distance between two concept nodes usually involves travelling "up" and "down" the hierarchy. The simplest route is a single hop from a child to its parent or vice versa. Generally, travelling from one node $i$ to another node $j$ consists of an A-shaped path ascending from node $i$ to a common ancestor of $i$ and $j$, and then descending to node $j$.</Paragraph>
      <Paragraph position="1"> Interestingly, our description of the A-shaped path matches the design of a number of concept-to-concept distances. For example, distances that incorporate Resnik's (1995) information content (IC), $-\log(p(concept))$, such as those of Jiang and Conrath (1997) and Lin (1998), consider both the (lowest) common ancestor as well as the two nodes of interest in their calculation.</Paragraph>
      <Paragraph position="2"> The complete bipartite graph considered in section 3.2 directly connects each node $s$ in profile $S$ to each node $d$ in profile $D$, eliminating the typical A-shaped path in an ontology. This structure solves the non-additivity issue by generating an edge with the exact concept-to-concept distance for each potential node comparison, but, as noted above, is too inefficient. Our solution here is to construct a network that uses the idea of a pared-down A-shaped path to mostly avoid non-additivity, but without the inefficiency of the complete bipartite graph. Thus, as explained in more detail in the following subsections, we trade off the exactness of the distance calculation against the efficiency of the network construction.</Paragraph>
    </Section>
    <Section position="2" start_page="100" end_page="100" type="sub_section">
      <SectionTitle>
4.2 Network Construction
</SectionTitle>
      <Paragraph position="0"> In our network construction, we exploit the general notion of an A-shaped path between any two nodes, but replace the "tip" of the A with two nodes. Then for each pair of nodes $s$ and $d$ in profiles $S$ and $D$, we generate an edge from $s$ to an ancestor $a_s$ of $s$ (the left "branch" of the A), an edge from $d$ to an ancestor $a_d$ of $d$ (the right "branch" of the A), and an edge between $a_s$ and $a_d$ (the two nodes forming the "elongated tip" of the A). Each edge has the exact concept-to-concept distance from the original network, so that the distance between any two nodes $s$ and $d$ is the sum of three exact distances.</Paragraph>
      <Paragraph position="1"> The sets of ancestor nodes, $a_s$ and $a_d$, comprise the "junction" points at which the supply from $S$ can be transported across to the nodes in $D$ to satisfy their demand. The set of junction nodes, $J_P$, for a profile $P$, must be selected such that for each node $i$ in $P$, $J_P$ contains at least one ancestor of $i$. (See section 4.4 for details on the junction selection process.) The resulting network is constructed by directly connecting each profile to its corresponding junction, then connecting the two junctions in the middle (Figure 2(c)).</Paragraph>
      <Paragraph position="2"> The difference between the complete bipartite network and the transformed network here is that, instead of connecting each node in $S$ to every node in $D$, we connect each node in $J_S$ to every node in $J_D$. Compare the transformed network in Figure 2(c) with the complete bipartite network in Figure 2(b). The complete bipartite component in the transformed network (the middle portion between the junction nodes labelled $J_S$ and $J_D$) is considerably smaller in size. Thus, the transformed network has significantly fewer edges as well.</Paragraph>
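      <Paragraph> A rough edge count illustrates the saving. The profile size of 900 is the average quoted in section 3.2; the junction size of 50 and the assumption that each profile node is linked to exactly one junction ancestor are hypothetical choices for this sketch.
<![CDATA[
```python
def bipartite_edges(n_supply, n_demand):
    """Edges in the complete bipartite network: one per (s, d) pair."""
    return n_supply * n_demand

def transformed_edges(n_supply, n_demand, n_junction_s, n_junction_d,
                      links_per_node=1):
    """Edges in the transformed network.

    Profile nodes are linked to their junction ancestors, and the two
    junctions are connected as a (much smaller) complete bipartite graph.
    """
    profile_links = links_per_node * (n_supply + n_demand)
    junction_links = n_junction_s * n_junction_d
    return profile_links + junction_links

full = bipartite_edges(900, 900)
pared = transformed_edges(900, 900, 50, 50)
print(full, pared)
```
]]>
 The quadratic term now depends on the junction sizes rather than on the profile sizes.</Paragraph>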
      <Paragraph position="3"> Next, we can proceed to define the cost function on the transformed network. Observe that each edge $(s,d)$, with cost $dist(s,d)$, in the complete bipartite network, where $s \in S$, $d \in D$, is now instead represented by three edges: $(s, a_s)$, $(a_s, a_d)$, and $(a_d, d)$, where $a_s \in J_S$ and $a_d \in J_D$. Thus, the transformed distance between $s$ and $d$, $dist_{trans}(s,d)$, becomes:</Paragraph>
      <Paragraph position="4"> $$dist_{trans}(s,d) = dist(s, a_s) + dist(a_s, a_d) + dist(a_d, d) \qquad (6)$$</Paragraph>
      <Paragraph position="5"> where $dist(i,j)$ is the precise concept-to-concept distance between $i$ and $j$ in the original network.</Paragraph>
      <Paragraph position="6"> Once we have set up the transformed network, we can solve the MCF in this network, yielding the distance between the two (supply and demand) profiles.</Paragraph>
    </Section>
    <Section position="3" start_page="100" end_page="101" type="sub_section">
      <SectionTitle>
4.3 Distance Distortion
</SectionTitle>
      <Paragraph position="0"> Because the distance between nodes $s$ and $d$ is now calculated as the sum of three distances (eqn. (6)), some distortion may result for non-additive concept-to-concept distances. To illustrate the distortion effect, consider Jiang and Conrath's (1997) distance:</Paragraph>
      <Paragraph position="1"> $$dist^{JC}(i,j) = IC(i) + IC(j) - 2 \cdot IC(LCS(i,j)) \qquad (7)$$</Paragraph>
      <Paragraph position="2"> where $IC(i)$ is the information content of a node $i$, and $LCS(i,j)$ is the lowest common subsumer of nodes $i$ and $j$. This distance measures the difference in information content between the concepts and their lowest common subsumers.</Paragraph>
      <Paragraph position="3"> After the transformation, the distance is distorted in the following way. If $i$ and $j$ have no common junction ancestor, then $$dist^{JC}_{trans}(i,j) = IC(i) + IC(j) - 2 \cdot IC(LCS(a_i, a_j)) \qquad (8)$$ where $a_i$ and $a_j$ are the junction ancestors of $i$ and $j$. If $i$ and $j$ share a common junction ancestor $a$, the distance takes the same form, except that the term $2 \cdot IC(LCS(a_i, a_j))$ in eqn. (8) is replaced by $2 \cdot IC(a)$. In either case, the transformation replaces the lowest common subsumer $LCS(i,j)$ in eqn. (7) with some other common subsumer $CS(i,j)$ ($LCS(a_i, a_j)$ or $a$, mentioned above). Unless $CS(i,j) = LCS(i,j)$, the distance is distorted by using a less precise quantity, $IC(CS(i,j))$. Note that the information content of a concept is computed from the maximum likelihood estimate of its probability, based on its frequency in a large corpus. An increment in the frequency of a concept leads to an increment in the frequency of all its ancestors. Due to the frequency percolation, concepts with a small depth tend to accumulate higher counts than those deeper in the hierarchy (note the difference in depth: $depth(CS(i,j)) \leq depth(LCS(i,j))$).</Paragraph>
      <Paragraph position="5"> Thus, we expect the information content of a concept to be higher than that of its ancestors, i.e., a concept is more semantically specific than its ancestors, which is captured by the use of the negative $\log$ function in the definition of IC. The transformed distance is distorted accordingly ($IC(CS(i,j)) \leq IC(LCS(i,j))$).</Paragraph>
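      <Paragraph> The distortion can be illustrated on a toy hierarchy by comparing the original Jiang and Conrath distance with the sum of three distances routed through a junction. The hierarchy, the IC values, and the single-node junction below are hypothetical.
<![CDATA[
```python
# Hypothetical hierarchy: entity -> animal -> {dog, cat}, with IC values
# that grow with depth.
IC = {"entity": 0.5, "animal": 2.0, "dog": 5.0, "cat": 4.5}
PARENT = {"dog": "animal", "cat": "animal", "animal": "entity",
          "entity": None}

def ancestors(node):
    """Return the node followed by all of its ancestors, nearest first."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain

def lcs(i, j):
    """Lowest common subsumer of i and j."""
    seen = set(ancestors(j))
    for node in ancestors(i):
        if node in seen:
            return node
    raise ValueError("no common ancestor")

def jc(i, j):
    """Jiang and Conrath distance: IC(i) + IC(j) - 2 * IC(LCS(i, j))."""
    return IC[i] + IC[j] - 2.0 * IC[lcs(i, j)]

def jc_transformed(i, j, junction_of):
    """Sum of three exact distances, routed via junction ancestors."""
    a_i, a_j = junction_of[i], junction_of[j]
    return jc(i, a_i) + jc(a_i, a_j) + jc(a_j, j)

# Both profile nodes are linked to the same, very general junction node.
junction_of = {"dog": "entity", "cat": "entity"}
original = jc("dog", "cat")
transformed = jc_transformed("dog", "cat", junction_of)
print(original, transformed)
```
]]>
 The transformed distance exceeds the original by twice the difference in information content between the lowest common subsumer (animal) and the junction node (entity).</Paragraph>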
    </Section>
    <Section position="4" start_page="101" end_page="101" type="sub_section">
      <SectionTitle>
4.4 Junction Selection
</SectionTitle>
      <Paragraph position="0"> Selection of junction nodes is a key component of the network transformation. Trivially, a junction consisting of profile nodes yields a network equivalent to the complete bipartite network. The key is to select a junction that is considerably smaller in size than its corresponding profile, hence, cutting down the number of edges generated, which results in significant savings in complexity.</Paragraph>
      <Paragraph position="1"> Note that there is a tradeoff between the overall computational efficiency and the similarity between the transformed network and the complete bipartite network. The closer the junctions are to the corresponding profiles, the more closely the transformed network resembles the complete bipartite network.</Paragraph>
      <Paragraph position="2"> Though the distance calculation is more accurate, such a network is also more expensive to process.</Paragraph>
      <Paragraph position="3"> On the other hand, there are fewer nodes in a junction as it approaches the root level, but there is more distortion in the transformed concept-to-concept distance. Clearly, it is important to balance the two factors. Selecting junction nodes involves finding a smaller set of ancestor nodes representing the profile nodes in a hierarchy. In other words, the junction can be viewed as an alternative representation which is a generalization of the profile nodes. In addition to the profile nodes, the junction nodes are also included in the transformed network. They may provide extra information about the corresponding context.</Paragraph>
      <Paragraph position="4"> Finding a generalization of a profile is explored in the works of Clark and Weir (2002) and Li and Abe (1998). Unfortunately, the complexity of these algorithms is quadratic (the former) or cubic (the latter) in the number of nodes in a network, which is unacceptably expensive for our transformation method.</Paragraph>
      <Paragraph position="5"> Note that to ensure every profile node has an ancestor node in the junction, the selection process has a linear lower bound. To keep the cost low, it is best to keep a linear complexity for the junction selection process. However, if this is not possible, it should be significantly less expensive than a quadratic complexity. We will empirically explore the process further in section 5.3.</Paragraph>
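      <Paragraph> One simple selection procedure that visits each profile node once is sketched below: every profile node is mapped to its ancestor at a fixed depth cut-off. This is an illustrative assumption rather than the procedure evaluated in this paper; it also assumes a single parent per node and treats a node at or above the cut-off as its own junction node.
<![CDATA[
```python
def select_junction(profile_nodes, parent, depth, max_depth):
    """Map each profile node to its ancestor at depth max_depth.

    Returns (junction, link), where junction is the set of selected
    ancestor nodes and link maps each profile node to its junction node.
    """
    junction = set()
    link = {}
    for node in profile_nodes:
        ancestor = node
        while depth[ancestor] > max_depth:
            ancestor = parent[ancestor]
        junction.add(ancestor)
        link[node] = ancestor
    return junction, link

# Hypothetical three-level hierarchy.
parent = {"entity": None, "animal": "entity", "artifact": "entity",
          "dog": "animal", "cat": "animal", "car": "artifact"}
depth = {"entity": 0, "animal": 1, "artifact": 1,
         "dog": 2, "cat": 2, "car": 2}
junction, link = select_junction(["dog", "cat", "car"], parent, depth,
                                 max_depth=1)
print(junction, link)
```
]]>
 Lowering max_depth shrinks the junction at the price of more distance distortion; raising it does the opposite.</Paragraph>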
    </Section>
  </Section>
  <Section position="6" start_page="101" end_page="102" type="metho">
    <SectionTitle>
5 Context Comparison
</SectionTitle>
    <Paragraph position="0"> As alluded to earlier, our network flow method provides an alternative to a purely distributional and non-graphical approach to context comparison. In this paper, we will test both variants of our method (with or without the transformation in section 4) in a name disambiguation task in which the context words within a small window surrounding the ambiguous words are compared. Our preliminary analysis shows that our general network flow framework is robust and efficient.</Paragraph>
    <Section position="1" start_page="101" end_page="101" type="sub_section">
      <SectionTitle>
5.1 Name Disambiguation
</SectionTitle>
      <Paragraph position="0"> The goal for name disambiguation is to classify each ambiguous instance on the basis of its surrounding context. One approach is to use an unsupervised method such as clustering. This involves making a large number of pairwise comparisons between individual contexts. Given that there is an overhead to incorporating ontological information, our network flow method does not compute distances as efficiently as calculating a purely arithmetic distance such as cosine or Euclidean distance. Our alternative approach is to use minimal training data. Using a handful of contexts, we can build a "gold standard" profile for each sense of an ambiguous name by using the context words of a small number of instances. We then compare the context profile of each instance to the gold standards. Each instance is given the label of the gold standard profile to which its context profile is the closest.</Paragraph>
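      <Paragraph> The labelling step amounts to a nearest-profile rule, sketched below. The distance function is passed in as a parameter; the L1 distance used in the example is only a stand-in for the MCF distance, and the labels and profiles are hypothetical.
<![CDATA[
```python
def classify(instance_profile, gold_profiles, distance):
    """Return the label of the gold-standard profile closest to the instance."""
    return min(gold_profiles,
               key=lambda label: distance(instance_profile,
                                          gold_profiles[label]))

def l1_distance(p, q):
    """Placeholder distance; the method in the text would use the MCF here."""
    concepts = set(p) | set(q)
    return sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in concepts)

# Hypothetical normalized profiles for two senses of an ambiguous name.
gold = {
    "sense_A": {"goal": 0.6, "club": 0.4},
    "sense_B": {"club": 0.5, "fashion": 0.5},
}
instance = {"goal": 0.5, "club": 0.5}
label = classify(instance, gold, l1_distance)
print(label)
```
]]>
 Only one comparison per gold-standard profile is needed for each instance, rather than one per pair of contexts.</Paragraph>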
    </Section>
    <Section position="2" start_page="101" end_page="102" type="sub_section">
      <SectionTitle>
5.2 Experimental Setup
</SectionTitle>
      <Paragraph position="0"> In our name disambiguation experiment, we use the data collected by Pedersen et al. (2005) for their name discrimination task. This data is taken from the Agence France Press English Service portion of the GigaWord English corpus distributed by the Linguistic Data Consortium. It consists of the contexts of six pairs of names, including: the names of two soccer players (Ronaldo and David Beckham); an ethnic group and a diplomat (Tajik and Rolf Ekeus); two companies (Microsoft and IBM); two politicians (Shimon Peres and Slobodan Milosevic); a nation and a nationality (Jordan and Egyptian); and two countries (France and Japan). These name pairs are selected by Pedersen et al. (2005) to reflect a range of confusability between names. [Table caption fragment: ... name. "200" and "100" give the averaged results (over five different runs) using 200 and 100 randomly selected training instances per ambiguous name. The weighted average is calculated based on the number of test instances per task. "Full" and "Trans" refer to the results using the full network (pre-transformation) or the pared-down network (with transformation), respectively.]</Paragraph>
      <Paragraph position="1"> Each pair of names serves as one of six name disambiguation tasks. Each name instance consists of a context window of 50 words (25 words to the left and to the right of the target name), with the target name obfuscated. For example, for the task of distinguishing "David Beckham" and "Ronaldo", the target name in each instance becomes "David BeckhamRonaldo". The goal is to recover the correct target name in each instance.</Paragraph>
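      <Paragraph> Extracting such a window can be sketched as follows, assuming the context is already tokenized and the target name occupies a single token position. The function name and the mask string are hypothetical.
<![CDATA[
```python
def context_window(tokens, target_index, size=25, mask="TARGET"):
    """Return up to `size` tokens on each side of the target.

    The target token itself is replaced by `mask`, so the true name
    cannot be read off the window.
    """
    left = tokens[max(0, target_index - size):target_index]
    right = tokens[target_index + 1:target_index + 1 + size]
    return left + [mask] + right

# Small example with two tokens per side instead of 25.
tokens = ["a", "b", "c", "d", "e", "f", "g"]
window = context_window(tokens, 3, size=2)
print(window)
```
]]>
 With size=25 this yields the 50-word window described above, truncated where the target lies near the start or end of the text.</Paragraph>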
    </Section>
    <Section position="3" start_page="102" end_page="102" type="sub_section">
      <SectionTitle>
5.3 Junction Selection
</SectionTitle>
      <Paragraph position="0"> We reported earlier that a complete bipartite graph with 900 nodes is too expensive to process. Our first attempt is to select a junction on the basis of the number of nodes it contains. Here, the junctions we select are simple to find by taking a top-down approach. We start at the top nine root nodes of WordNet (nodes of zero depth) and proceed downwards.</Paragraph>
      <Paragraph position="1"> We limit the search to the top two levels because the second level consists of 158 nodes, while the following level consists of 1307 nodes, which, clearly, exceeds 900 nodes. Here, we select the junction which consists of eight of the top root nodes (siblings of entity) and the children of entity, given that entity is semantically more general than its siblings. In our current experiment, we use Jiang and Conrath's distance for its ease of analysis. As shown in section 4.3, only one term in the distance, $IC(LCS(i,j))$, is replaced because of the use of the junction nodes. Any change in the performance (in comparison to our method without the transformation) can be attributed to the distance distortion as a result of this term being replaced. The analysis of experimental results (next section) is made easy because we can assess the goodness of the transformation given the selected junction: a significant degradation in performance is an indication that the junction nodes should be brought closer to the profile nodes, yielding a more precise distance.</Paragraph>
    </Section>
  </Section>
</Paper>