<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1018">
  <Title>Ordering Among Premodifiers</Title>
  <Section position="4" start_page="135" end_page="138" type="metho">
    <SectionTitle>
3 Methodology
</SectionTitle>
    <Paragraph position="0"> In this section, we discuss how we obtain the premodifier sequences from the corpus for analysis and the three approaches we use for establishing ordering relationships: direct corpus evidence, transitive closure, and clustering analysis. The result of our analysis is embodied in a function, compute_order(A, B), which returns the sequential ordering between two premodifiers, word A and word B.</Paragraph>
    <Paragraph position="1"> To identify orderings among premodifiers, premodifier sequences are extracted from simplex NPs. A simplex NP is a maximal noun phrase that includes premodifiers such as determiners and possessives but not post-nominal constituents such as prepositional phrases or relative clauses. We use a part-of-speech tagger [Brill 1992] and a finite-state grammar to extract simplex NPs. The noun phrases we extract start with an optional determiner (DT) or possessive pronoun (PRP$), followed by a sequence of cardinal numbers (CDs), adjectives (JJs), and nouns (NNs), and end with a noun. We include cardinal numbers in NPs to capture the ordering of numerical information such as age and amounts. Gerunds (tagged as VBG) or past participles (tagged as VBN), such as &amp;quot;heated&amp;quot; in &amp;quot;heated debate&amp;quot;, are considered as adjectives if the word in front of them is a determiner, possessive pronoun, or adjective, thus separating adjectival and verbal forms that are conflated by the tagger. A morphology module transforms plural nouns and comparative and superlative adjectives into their base forms to ensure maximization of our frequency counts.</Paragraph>
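The extraction step can be sketched as a simple scan over tagger output. This is a hypothetical, simplified stand-in for the finite-state grammar described above; the tag names follow Penn conventions, and the trailing-trim rule (an NP must end in a noun) is made explicit:

```python
def extract_simplex_nps(tagged):
    """tagged: list of (word, POS-tag) pairs from a tagger (sketch only).
    Returns maximal simplex NPs: an optional DT or PRP$, then a run of
    CD/JJ/NN tokens that must end in a noun."""
    NOUN = ("NN", "NNS", "NNP")
    MID = ("CD", "JJ") + NOUN
    nps, i, n = [], 0, len(tagged)
    while n > i:
        # optional determiner or possessive pronoun
        j = i + 1 if tagged[i][1] in ("DT", "PRP$") else i
        k = j
        while n > k and tagged[k][1] in MID:
            k += 1
        while k > j and tagged[k - 1][1] not in NOUN:
            k -= 1  # the NP must end in a noun, so trim trailing CD/JJ
        if k > j:
            nps.append([w for w, _ in tagged[i:k]])
            i = k
        else:
            i += 1
    return nps
```

A concatenation filter (removing sequences like "takeover bid last week") would run on the output of this step.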
    <Paragraph position="2"> There is a regular expression filter which removes obvious concatenations of simplex NPs such as &amp;quot;takeover bid last week&amp;quot; and &amp;quot;Tylenol 40 milligrams&amp;quot;.</Paragraph>
    <Paragraph position="3"> After simplex NPs are extracted, sequences of premodifiers are obtained by dropping determiners, genitives, cardinal numbers and head nouns. Our subsequent analysis operates on the resulting premodifier sequences, and involves three stages: direct evidence, transitive closure, and clustering. We describe each stage in more detail in the following subsections.</Paragraph>
    <Section position="1" start_page="136" end_page="137" type="sub_section">
      <SectionTitle>
3.1 Direct Evidence
</SectionTitle>
      <Paragraph position="0"> Our analysis proceeds on the hypothesis that the relative order of two premodifiers is fixed and independent of context. Given two premodifiers A and B, there are three possible underlying orderings, and our system should strive to find which is true in this particular case: either A comes before B, B comes before A, or the order between A and B is truly unimportant. Our first stage relies on frequency data collected from a training corpus to predict the order of adjective and noun premodifiers in an unseen test corpus.</Paragraph>
      <Paragraph position="1"> To collect direct evidence on the order of premodifiers, we extract all the premodifiers from the corpus as described in the previous subsection. We first transform the premodifier sequences into ordered pairs. For example, the phrase &amp;quot;well-known traditional brand-name drug&amp;quot; has three ordered pairs, &amp;quot;well-known -&lt; traditional&amp;quot;, &amp;quot;well-known -&lt; brand-name&amp;quot;, and &amp;quot;traditional -&lt; brand-name&amp;quot;. A phrase with n premodifiers will have n(n-1)/2 ordered pairs. From these ordered pairs, we construct a w x w matrix Count, where w is the number of distinct modifiers. The cell [A, B] in this matrix represents the number of occurrences of the pair &amp;quot;A -&lt; B&amp;quot;, in that order, in the corpus.</Paragraph>
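A minimal sketch of building the Count table from premodifier sequences, with a sparse dictionary standing in for the w x w matrix:

```python
from collections import defaultdict
from itertools import combinations

def build_count_matrix(premodifier_sequences):
    """Count[(A, B)] = number of times A was observed before B.
    combinations() preserves left-to-right order, so a sequence of
    n premodifiers contributes n*(n-1)/2 ordered pairs."""
    count = defaultdict(int)
    for seq in premodifier_sequences:
        for a, b in combinations(seq, 2):
            count[(a, b)] += 1
    return count
```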
      <Paragraph position="2"> Assuming that there is a preferred ordering between premodifiers A and B, one of the cells Count\[A,B\] and Count\[B,A\] should be much larger than the other, at least if the corpus becomes arbitrarily large. However, given a corpus of a fixed size there will be many cases where the frequency counts will both be small. This data sparseness problem is exacerbated by the inevitable occurrence of errors during the data extraction process, which will introduce some spurious pairs (and orderings) of premodifiers.</Paragraph>
      <Paragraph position="3"> We therefore apply probabilistic reasoning to determine when the data is strong enough to decide that A -&lt; B or B -&lt; A. Under the null hypothesis that the order of the two premodifiers is arbitrary, the number of times we have seen one of them follows the binomial distribution with parameter p = 0.5. The probability that we would see the actually observed number of cases with A -&lt; B, say m, among n pairs involving A and B is then P = Σ_{k=m}^{n} (n choose k) (1/2)^n (2). If this probability is low, we reject the null hypothesis and conclude that A indeed precedes (or follows, as indicated by the relative frequencies) B.</Paragraph>
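The test can be sketched as follows; the significance threshold alpha is an assumed parameter, not one specified in the text:

```python
from math import comb

def order_is_significant(m, n, alpha=0.05):
    """Binomial tail test sketch. m: times the pair was seen in one
    order; n: all pairs involving the two words. Under the null
    hypothesis the count is binomial with p = 0.5; we reject it when
    the tail probability sum_{k=m..n} C(n, k) / 2**n is small."""
    m = max(m, n - m)  # test the dominant direction
    tail = sum(comb(n, k) for k in range(m, n + 1)) / 2**n
    return alpha > tail
```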
    </Section>
    <Section position="2" start_page="137" end_page="137" type="sub_section">
      <SectionTitle>
3.2 Transitivity
</SectionTitle>
      <Paragraph position="0"> As we mentioned before, sparse data is a serious problem in our analysis. For example, the matrix of frequencies for adjectives in our training corpus from the medical domain is 99.8% empty--only 9,106 entries in the 2,232 x 2,232 matrix contain non-zero values. To compensate for this problem, we exploit the transitive properties of ordered pairs by computing the transitive closure of the ordering relation. Utilizing transitivity information corresponds to making the inference that A -&lt; C follows from A -&lt; B and B -&lt; C, even if we have no direct evidence for the pair (A, C), provided that there is no contradictory evidence to this inference either. This approach allows us to fill from 15% (WSJ) to 30% (medical corpus) of the entries in the matrix.</Paragraph>
      <Paragraph position="1"> To compute the transitive closure of the order relation, we map our underlying data to special cases of commutative semirings [Pereira and Riley 1997]. Each word is represented as a node of a graph, while arcs between nodes correspond to ordering relationships and are labeled with elements from the chosen semiring. This formalism can be used for a variety of problems, using appropriate definitions of the two binary operators (collection and extension) that operate on the semiring's elements. For example, the all-pairs shortest-paths problem in graph theory can be formulated in a min-plus semiring over the real numbers with the operators min for collection and + for extension. Similarly, finding the transitive closure of a binary relation can be formulated in a max-min semiring or an or-and semiring over the set {0, 1}. Once the proper operators have been chosen, the generic Floyd-Warshall algorithm [Aho et al. 1974] can solve the corresponding problem without modifications. We explored three semirings appropriate to our problem. First, we apply the statistical decision procedure of the previous subsection and assign to each pair of premodifiers either 0 (if we don't have enough information about their preferred ordering) or 1 (if we do). Then we use the or-and semiring over the {0,1} set; in the transitive closure, the ordering A -&lt; B will be present if at least one path connecting A and B via ordered pairs exists. Note that it is possible for both A -&lt; B and B -&lt; A to be present in the transitive closure.</Paragraph>
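A sketch of this first, or-and instance over a 0/1 matrix of statistically significant orderings:

```python
def closure_or_and(adj):
    """Floyd-Warshall transitive closure over the ({0,1}, or, and)
    semiring: collection is "or", extension is "and".
    adj: square 0/1 matrix; adj[i][j] = 1 means word i precedes word j."""
    n = len(adj)
    c = [row[:] for row in adj]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # i precedes j if it already did, or via intermediate k
                c[i][j] = c[i][j] or (c[i][k] and c[k][j])
    return c
```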
      <Paragraph position="2"> This model involves conversions of the corpus evidence for each pair into hard decisions on whether one of the words in the pair precedes the other. To avoid such early commitments, we use a second, refined model for transitive closure where the arc from A to B is labeled with the probability that A indeed precedes B.</Paragraph>
      <Paragraph position="3"> The natural extension of the ({0, 1}, or, and) semiring when the set of labels is replaced with the interval [0, 1] is then ([0, 1], max, min). We estimate the probability that A precedes B as one minus the probability of reaching that conclusion in error, according to the statistical test of the previous subsection (i.e., one minus the sum specified in equation (2)). We obtained similar results with this estimator and with the maximum likelihood estimator (the ratio of the number of times A appeared before B to the total number of pairs involving A and B).</Paragraph>
      <Paragraph position="4"> Finally, we consider a third model in which we explore an alternative to transitive closure.</Paragraph>
      <Paragraph position="5"> Rather than treating the number attached to each arc as a probability, we treat it as a cost, the cost of erroneously assuming that the corresponding ordering exists. We assign to an edge (A, B) the negative logarithm of the probability that A precedes B; probabilities are estimated as in the previous paragraph. Then our problem becomes identical to the all-pairs shortest-path problem in graph theory; the corresponding semiring is ((0, +∞), min, +). We use logarithms to address computational precision issues stemming from the multiplication of small probabilities, and negate the logarithms so that we cast the problem as a minimization task (i.e., we find the path in the graph that minimizes the total sum of negative log probabilities, and therefore maximizes the product of the original probabilities).</Paragraph>
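The second and third models can be sketched with one generic Floyd-Warshall routine parameterized by the semiring operators; the probability values below are invented for illustration:

```python
import math

def floyd_warshall(w, collect, extend):
    """Generic Floyd-Warshall over a semiring: "collect" combines
    alternative paths, "extend" concatenates path segments."""
    n = len(w)
    d = [row[:] for row in w]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = collect(d[i][j], extend(d[i][k], d[k][j]))
    return d

# Hypothetical data: P(word 0 precedes word 1) = 0.9,
# P(word 1 precedes word 2) = 0.8, no direct evidence for (0, 2).
probs = [[0.0, 0.9, 0.0], [0.0, 0.0, 0.8], [0.0, 0.0, 0.0]]

# ([0, 1], max, min): the closure labels (0, 2) with the best path's
# weakest link, here min(0.9, 0.8) = 0.8.
maxmin = floyd_warshall(probs, max, min)

# ((0, +inf), min, +) on costs -log p: the shortest path maximizes the
# product of probabilities, so exp(-cost) recovers 0.9 * 0.8 = 0.72.
INF = float("inf")
costs = [[-math.log(p) if p > 0 else INF for p in row] for row in probs]
minplus = floyd_warshall(costs, min, lambda a, b: a + b)
```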
    </Section>
    <Section position="3" start_page="137" end_page="138" type="sub_section">
      <SectionTitle>
3.3 Clustering
</SectionTitle>
      <Paragraph position="0"> As noted earlier, previous linguistic work on the ordering problem puts words into semantic classes and generalizes the task from ordering between specific words to ordering the corresponding classes. We follow a similar, but evidence-based, approach for the pairs of words that neither direct evidence nor transitivity can resolve. We compute an order similarity measure between any two premodifiers, reflecting whether the two words share the same pattern of relative order with other premodifiers for which we have sufficient evidence. For each pair of premodifiers A and B, we examine every other premodifier in the corpus, X; if both A -&lt; X and B -&lt; X, or both X -&lt; A and X -&lt; B, one point is added to the similarity score between A and B. If on the other hand A -&lt; X and X -&lt; B, or X -&lt; A and B -&lt; X, one point is subtracted. X does not contribute to the similarity score if there is not sufficient prior evidence for the relative order of X and A, or of X and B.</Paragraph>
      <Paragraph position="1"> This procedure closely parallels non-parametric distributional tests such as Kendall's τ [Kendall 1938].</Paragraph>
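A sketch of this scoring step; the precedes(x, y) interface, returning None when there is insufficient evidence, is a hypothetical stand-in for the direct and transitive evidence of the previous subsections:

```python
def order_similarity(a, b, precedes, vocab):
    """precedes(x, y) returns True, False, or None (no evidence).
    Each third word X ordered the same way relative to both a and b
    adds a point; each X ordered oppositely subtracts one; X with
    insufficient evidence for either word is skipped."""
    score = 0
    for x in vocab:
        if x in (a, b):
            continue
        ax, bx = precedes(a, x), precedes(b, x)
        if ax is None or bx is None:
            continue
        score += 1 if ax == bx else -1
    return score
```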
      <Paragraph position="2"> The similarity scores are then converted into dissimilarities and fed into a non-hierarchical clustering algorithm [Späth 1985], which separates the premodifiers into groups. This is achieved by minimizing an objective function, defined as the sum of within-group dissimilarities over all groups. In this manner, premodifiers that are closely similar in terms of sharing the same relative order with other premodifiers are placed in the same group.</Paragraph>
      <Paragraph position="3"> Once classes of premodifiers have been induced, we examine every pair of classes and decide which precedes the other. For two classes C1 and C2, we extract all pairs of premodifiers (x, y) with x ∈ C1 and y ∈ C2. If we have evidence (either direct or through transitivity) that x -&lt; y, one point is added in favor of C1 -&lt; C2; similarly, one point is subtracted if y -&lt; x. After all such pairs have been considered, we can then predict the relative order between words in the two clusters which we have not seen together earlier. This method makes (weak) predictions for any pair (A, B) of words, except if (a) both A and B are placed in the same cluster; (b) no ordered pairs (x, y) with one element in the class of A and one in the class of B have been identified; or (c) the evidence for one class preceding the other is in the aggregate equally strong in both directions.</Paragraph>
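The class-level vote can be sketched as follows; precedes is again a hypothetical boolean interface to the direct-plus-transitive evidence:

```python
def class_order_score(c1, c2, precedes):
    """Vote on whether class c1 precedes class c2 (sketch).
    Each cross-class pair (x, y) with evidence that x precedes y adds
    a point; evidence that y precedes x subtracts one. A positive
    total predicts c1 before c2; zero yields no prediction."""
    score = 0
    for x in c1:
        for y in c2:
            if precedes(x, y):
                score += 1
            elif precedes(y, x):
                score -= 1
    return score
```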
    </Section>
  </Section>
  <Section position="5" start_page="138" end_page="138" type="metho">
    <SectionTitle>
4 The Corpus
</SectionTitle>
    <Paragraph position="0"> We used two corpora for our analysis: hospital discharge summaries from 1991 to 1997 from the Columbia-Presbyterian Medical Center, and the January 1996 part of the Wall Street Journal corpus from the Penn TreeBank [Marcus et al. 1993]. To facilitate comparisons across the two corpora, we intentionally limited ourselves to only one month of the WSJ corpus, so that approximately the same amount of data would be examined in each case. The text in each corpus is divided into a training part (2.3 million words for the medical corpus and 1.5 million words for the WSJ) and a test part (1.2 million words for the medical corpus and 1.6 million words for the WSJ).</Paragraph>
    <Paragraph position="1"> All domain-specific markup was removed, and the text was processed by the MXTERMINATOR sentence boundary detector [Reynar and Ratnaparkhi 1997] and Brill's part-of-speech tagger [Brill 1992]. Noun phrases and pairs of premodifiers were extracted from the tagged corpus according to the methods of Section 3. From the medical corpus, we retrieved 934,823 simplex NPs, of which 115,411 have multiple premodifiers and 53,235 multiple adjectives only. The corresponding numbers for the WSJ corpus were 839,921 NPs, 68,153 NPs with multiple premodifiers, and 16,325 NPs with just multiple adjectives.</Paragraph>
    <Paragraph position="2"> We separately analyze two groups of premodifiers: adjectives, and adjectives plus nouns modifying the head noun. Although our techniques are identical in both cases, the division is motivated by our expectation that the task will be easier when modifiers are limited to adjectives, because nouns tend to be harder to match correctly with our finite-state grammar and the input data is sparser for nouns.</Paragraph>
  </Section>
</Paper>