<?xml version="1.0" standalone="yes"?> <Paper uid="H91-1037"> <Title>Partial Parsing: A Report on Work in Progress</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> TOWN OF YUNGUYO, NEAR THE LAKE, VERY NEAR WHERE THE PRESIDENTIAL SUMMIT WAS TO TAKE </SectionTitle> <Paragraph position="0"> PLACE.</Paragraph> <Paragraph position="1"> In a task such as MUC-3 the goal is to identify pre-defined classes of entities, e.g., terrorist events and dates, and the relationships among them, e.g., the perpetrator of a given terrorist act. Below, we have listed the first seven of nineteen pre-specified classes of data to be extracted from the messages of MUC-3.
0. MESSAGE ID: identifier
1. TEMPLATE ID: identifier
2. DATE OF INCIDENT: date
3. TYPE OF INCIDENT: set element, e.g., KIDNAPPING,</Paragraph> </Section> <Section position="4" start_page="0" end_page="207" type="metho"> <SectionTitle> ATTEMPTED KIDNAPPING, KIDNAPPING THREAT ...
4. CATEGORY OF INCIDENT: set element, e.g., TERRORIST ACT, STATE-SPONSORED VIOLENCE </SectionTitle> <Paragraph position="0">
5. PERPETRATOR: ID OF INDIV(S): a string
6. PERPETRATOR: ID OF ORG(S): a string</Paragraph> <Section position="1" start_page="0" end_page="204" type="sub_section"> <SectionTitle> System Architecture Assumptions </SectionTitle> <Paragraph position="0"> For the purposes of this discussion, we assume a fairly standard system architecture as shown in Figure 1 below. We further assume that a domain model exists for the pre-specified data to be extracted. That is, every class of entities of importance is specified in a frame representation indicating subclass-superclass relationships and all other important binary relationships among them.</Paragraph> <Paragraph position="1"> Processing sentences to the point of finding semantic interpretations is the topic of this paper. Pilot experiments are reported here on alternative algorithms to find interpretable fragments even when no global syntactic or semantic analysis can be found. We are particularly exploring probabilistic models for this processing, and have described experiments with various probabilistic models elsewhere (Ayuso et al., 1990; Meteer et al., 1991). The discourse component has two roles. One is resolving references. The second role is to use hypotheses regarding what domain-specific events are being described in each paragraph or article. Particular events correspond to templates. Once a template has been hypothesized, and once all text has been processed, if the template requires (or expects) certain information that has not yet been found, the discourse processor looks for values of the right semantic type that are plausible within the discourse structure of the article. This process will be described elsewhere.</Paragraph> <Paragraph position="2"> The output templates consist of three types of fields: set fill (a pre-specified finite set of alternatives), string (a literal, uninterpreted string from the input), and denumerably infinite entities (integers, dates, identifiers, etc.).</Paragraph>
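To make the three field types concrete, the following is a minimal sketch of an output template as a record. The slot names follow the MUC-3 list above, but the Python representation, the class name Template, and the filled-in values are assumptions made for this example, not the actual MUC-3 template format.

# Sketch of an output template combining the three field types described
# above: set fills (a pre-specified finite set of alternatives), literal
# strings from the input, and open-ended values such as dates and
# identifiers.  Slot names follow the MUC-3 list; values are illustrative.

from dataclasses import dataclass, field
from typing import List, Optional

INCIDENT_TYPES = {"KIDNAPPING", "ATTEMPTED KIDNAPPING", "KIDNAPPING THREAT", "BOMBING"}
INCIDENT_CATEGORIES = {"TERRORIST ACT", "STATE-SPONSORED VIOLENCE"}

@dataclass
class Template:
    message_id: str                       # 0. MESSAGE ID: identifier
    template_id: int                      # 1. TEMPLATE ID: identifier
    date_of_incident: Optional[str]       # 2. DATE OF INCIDENT: date
    type_of_incident: str                 # 3. TYPE OF INCIDENT: set fill
    category_of_incident: str             # 4. CATEGORY OF INCIDENT: set fill
    perpetrator_individuals: List[str] = field(default_factory=list)    # 5. strings
    perpetrator_organizations: List[str] = field(default_factory=list)  # 6. strings

# A filled template for a bombing such as the Yunguyo incident mentioned
# above; the identifier and the empty slots are invented for the example.
t = Template(message_id="MSG-0001", template_id=1, date_of_incident=None,
             type_of_incident="BOMBING", category_of_incident="TERRORIST ACT")
assert t.type_of_incident in INCIDENT_TYPES and t.category_of_incident in INCIDENT_CATEGORIES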
<Paragraph position="3"> In a task such as MUC-3 one fundamental application goal is to identify pre-defined classes of entities, e.g., dates, locations, individuals, and organizations of primary interest in the domain. Normally these entities appear as noun phrases in the text.</Paragraph> <Paragraph position="4"> Therefore, a basic concern is to reliably identify noun phrases that denote entities of interest, even if neither full syntactic nor full semantic analysis is possible.</Paragraph> <Paragraph position="5"> Two of our experiments have focussed on the identification of core noun phrases, a primary way of expressing entities in text. A core NP is defined syntactically as the maximal simple noun phrase, i.e., the largest one containing no post-modifiers. Here are some examples of core NPs (marked here with brackets) within their full noun phrases:
[a joint venture] with the Chinese government to build an automobile-parts assembly plant
[a $50.9 million loss] from discontinued operations in the third quarter because of the proposed sale
Such complex, full NPs require too many linguistic decisions to be directly processed without detailed syntactic and semantic knowledge about each word, an assumption that need not hold for open-ended text.</Paragraph> <Paragraph position="6"> We tested two differing algorithms on text from the Wall Street Journal (WSJ). Using BBN's part of speech tagger (POST), tagged text was parsed using the full unification grammar of Delphi to find only core NPs, 695 in 100 sentences. Hand-scoring of the results indicated that 85% of the core NPs were identified correctly.</Paragraph> <Paragraph position="7"> Subsequent analysis suggested that half the errors could be removed with only a little additional work, suggesting that over 90% performance is achievable.</Paragraph> <Paragraph position="8"> In a related test, we explored the bracketings produced by Church's PARTS program (Church, 1988). We extracted 200 sentences of WSJ text by taking every tenth sentence from a collection of manually corrected parse trees (data from the TREEBANK Project at the University of Pennsylvania). We evaluated the NP bracketings in these 200 sentences by hand, and tried to classify the errors. Of 1226 phrases in the 200 sentences, 131 were errors, for a 10.7% error rate. The errors were classified by hand as follows:
* Two consecutive but unrelated phrases grouped as one: 10
* Phrase consisted of a single word, which was not an NP: 70
* Missed phrases (those that should have been bracketed but were not): 12
* Elided head (e.g., part of a conjoined premodifier to an NP): 4
* Missed premodifiers: 4
* Head of phrase was a verb form that was missed: 4
* Other: 27
The roughly 90% success rate in both tests suggests that identification of core NPs can be achieved using only local information and with minimal knowledge of the words.
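As a concrete illustration of the kind of local, tag-driven decision this result suggests is sufficient, the following is a minimal sketch of a core NP finder over part-of-speech-tagged input. The tag inventory, the set of tags allowed inside a core NP, and the function name are assumptions for this example; it is not the Delphi grammar or the POST output format.

# Sketch: identify "core NPs" (maximal simple noun phrases with no
# post-modifiers) from part-of-speech-tagged text.  The tag set and the
# notion of which tags may appear inside a core NP are assumptions for
# illustration only.

CORE_NP_TAGS = {"DT", "PRP$", "CD", "JJ", "JJR", "JJS", "NN", "NNS", "NNP", "NNPS"}

def find_core_nps(tagged_tokens):
    """tagged_tokens: list of (word, tag) pairs.
    Returns (start, end) index pairs, one per core NP, where each span is a
    maximal run of pre-modifier/noun tags that contains a nominal head."""
    spans = []
    i = 0
    n = len(tagged_tokens)
    while i < n:
        if tagged_tokens[i][1] in CORE_NP_TAGS:
            j = i
            while j < n and tagged_tokens[j][1] in CORE_NP_TAGS:
                j += 1
            # keep the span only if it actually contains a noun
            if any(tag.startswith("NN") for _, tag in tagged_tokens[i:j]):
                spans.append((i, j))
            i = j
        else:
            i += 1
    return spans

# Example: "a joint venture with the Chinese government" yields two core NPs,
# "a joint venture" and "the Chinese government".
tagged = [("a", "DT"), ("joint", "JJ"), ("venture", "NN"),
          ("with", "IN"), ("the", "DT"), ("Chinese", "JJ"), ("government", "NN")]
print([" ".join(w for w, _ in tagged[s:e]) for s, e in find_core_nps(tagged)])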
Next we consider the issue of what semantics should be assigned and how reliably that can be accomplished.</Paragraph> </Section> <Section position="2" start_page="204" end_page="206" type="sub_section"> <SectionTitle> Finding Core Noun Phrases </SectionTitle> <Paragraph position="0">
Semantics of Core Noun Phrases
In trying to extract pre-specified data from open-ended text such as a newswire, it is clear that full semantic interpretation of such texts is not on the horizon. However, our hypothesis is that it need not be for automatic data base update. The type of information to be extracted permits some partial understanding. For semantic processing, minimally, for each noun phrase (NP), one would like to identify the class in the domain model that is the smallest pre-defined class containing the NP's denotation. For each clause, one would like to identify the corresponding event class or state of affairs denoted.</Paragraph> <Paragraph position="1"> Our pilot experiment focussed on the reliability of identifying the minimal class for each noun phrase.</Paragraph> <Paragraph position="2"> Assigning a semantic class to a core noun phrase can be handled via some structural rules. Usually the semantic class of the head word is correct for the semantic class not only of the core noun phrase but also of the complete noun phrase it is part of.</Paragraph> <Paragraph position="3"> Additional rules cover exceptions, such as &quot;set of ...&quot;. These heuristics correctly predicted the semantic class of the whole noun phrase 99% of the time in the sample of over 1000 noun phrases from WSJ that were correctly predicted by Church's PARTS program.</Paragraph> <Paragraph position="4"> Furthermore, even some of the NPs whose left boundary was not predicted correctly by PARTS were nevertheless assigned the correct semantic class. One consequence of this is that the correct semantic class of a complex noun phrase can be predicted even if some of the words in the noun phrase are unknown and even if its full structure is unknown. Thus, fully correct identification of core noun phrase boundaries and full noun phrase boundaries may not be necessary to accurately produce data base updates.</Paragraph> <Paragraph position="5"> Though finding the entities of interest is fundamental to the task, finding relationships of interest among them is also critical. For instance, in MUC-3 one must identify terrorist events in any of nine Latin American countries, the perpetrators of the event, the victims, if any, the date, the location, any structural damage, and so on.</Paragraph> <Paragraph position="6"> The experiments reported above were run by mid-summer 1990.</Paragraph> <Paragraph position="7"> In fall 1990, a more complete alternative, the MIT Fast Parser (MITFP), became available to us. It finds fragments using a stochastic part of speech algorithm and a nearly deterministic parser. It produces fragments averaging 3-4 words in length. An example output follows.</Paragraph> <Paragraph position="8"> Certain sequences of fragments appear frequently, as illustrated in the tables below. One frequently occurring pair is an S followed by a PP (prepositional phrase). Since there is more than one way the parser could attach the PP, and since syntactic grounds alone for attaching the PP would yield poor performance, semantic preferences applied by a post-process that combines fragments are called for.</Paragraph> <Paragraph position="9"> In our approach, the first step is to compute a semantic interpretation for each fragment found, without assuming that the meaning of each word is known. For instance, as described above, the semantic class for any noun phrase can be computed provided the head noun has semantics in the domain.</Paragraph> <Paragraph position="10"> Based on the data above, a reasonable approach is an algorithm that moves left-to-right through the set of fragments produced by MITFP, deciding to attach fragments (or not) based on semantic criteria. To avoid requiring a complete, global analysis, a window two constituents wide is used to find patterns of possible relations among phrases.
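A minimal sketch of this left-to-right, two-constituent window appears below. The Fragment record, the single combining pattern shown (an S followed by a PP), and the way the merged interpretation is built are simplifying assumptions made for illustration; the actual patterns consult semantic preferences and rank alternative attachment sites, as described next.

# Sketch of the left-to-right fragment-combining loop with a window two
# constituents wide.  Fragment is a hypothetical record; combine_pair shows a
# single illustrative pattern (S followed by PP) and stands in for the full
# set of fragment-combining patterns and their semantic ranking.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Fragment:
    category: str        # e.g., "S", "NP", "PP", "PUNCT", "CONJ"
    words: List[str]     # words spanned by the fragment
    semantics: object    # semantic interpretation computed for the fragment

def combine_pair(left: Fragment, right: Fragment) -> Optional[Fragment]:
    """Return a merged Fragment if a combining pattern applies and the result
    is semantically acceptable; otherwise return None."""
    if left.category == "S" and right.category == "PP":
        # A fuller implementation would try every attachment point along the
        # right edge of the S and rank the alternatives semantically.
        return Fragment("S", left.words + right.words,
                        ("attach", left.semantics, right.semantics))
    return None

def combine_fragments(fragments: List[Fragment]) -> List[Fragment]:
    """Move left to right, repeatedly trying to merge the current fragment
    with its right neighbor; unmergeable fragments are passed through."""
    fragments = list(fragments)
    i = 0
    while i + 1 < len(fragments):
        merged = combine_pair(fragments[i], fragments[i + 1])
        if merged is not None:
            fragments[i:i + 2] = [merged]   # re-examine the merged constituent
        else:
            i += 1
    return fragments

The punctuation and conjunction cases discussed below would extend combine_pair to look one constituent past the intervening punctuation or conjunction.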
For example, an S followed by a PP invokes an action of finding all points along the &quot;right edge&quot; of the S tree where a PP could attach, applying the fragment combining patterns at each such spot, and ranking the alternatives.</Paragraph> <Paragraph position="11"> As evident in Table 2, MITFP frequently does not attach punctuation. This is to be expected, since punctuation is used in many ways, and there is no deterministic basis for attaching the constituent following the punctuation to the constituent preceding it. Therefore, if the pair being examined by the combining algorithm ends in punctuation, the algorithm looks at the constituent following it, trying to combine it with the constituent to the left of the punctuation.</Paragraph> <Paragraph position="12"> A similar case is when the pair ends in a conjunction. Here the algorithm tries to combine the constituent to the right of the conjunction with that on the left of the conjunction.</Paragraph> </Section> <Section position="3" start_page="206" end_page="206" type="sub_section"> <SectionTitle> Learning Semantic Information </SectionTitle> <Paragraph position="0"> Since the norm will be that there are several ways to combine a pair of fragments, we plan to test several alternative heuristics for ranking the alternatives. Probabilistic methods seem particularly powerful and appropriate. Thus far, we have tested this hypothesis on prepositional phrase attachment.</Paragraph> <Paragraph position="1"> Such semantic knowledge, called selection restrictions or case frames, governs what phrases make sense with a particular verb or noun (what arguments go with a particular verb or noun).</Paragraph> <Paragraph position="2"> Traditionally such semantic knowledge is handcrafted, though some software aids exist to enable greater productivity (Ayuso et al., 1987; Bates, 1989; Grishman et al., 1986; Weischedel et al., 1989).</Paragraph> <Paragraph position="3"> Instead of handcrafting this semantic knowledge, our goal is to learn that knowledge from examples, using a three-step process:
1. Simple manual semantic annotation,
2. Supervised training based on parsed sentences,
3. Estimation of probabilities.</Paragraph> </Section> <Section position="4" start_page="206" end_page="207" type="sub_section"> <SectionTitle> Simple Manual Semantic Annotation </SectionTitle> <Paragraph position="0"> Given a sample of text, we annotate each noun, verb, and proper noun in the sample with the semantic class corresponding to it in the domain model. For instance, dawn would be annotated <time>, explode would be <explosion event>, and Yunguyo would be <city>. For our experiment, 560 nouns and 170 verbs were defined in this way. We estimate that this semantic annotation proceeded at about 90 words/hour.</Paragraph> <Paragraph position="1">
Supervised Training
From the TREEBANK project at the University of Pennsylvania, we used 20,000 words of MUC-3 texts that had been bracketed according to major syntactic category. The bracketed constituents for the sentence used in Figure 2 appear in Figure 3 below.</Paragraph> <Paragraph position="2"> From the example one can clearly infer that bombs can explode, or more properly, that bomb can be the logical subject of explode, that at dawn can modify explode, etc.
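To make the supervised training step concrete, here is a minimal sketch of turning one bracketed, semantically annotated example into case-relation triples of the form X P O. The clause record, the annotation dictionary, and the case labels are simplified stand-ins for the purposes of this sketch, not the actual TREEBANK bracketing or annotation formats.

# Sketch: turning one bracketed, semantically annotated training sentence into
# case-relation triples (X, P, O).  The clause record and the annotation
# dictionary are simplified stand-ins; the class name <bomb> is assumed.

ANNOTATION = {              # word -> semantic class from the domain model
    "bomb": "<bomb>",
    "explode": "<explosion event>",
    "dawn": "<time>",
    "Yunguyo": "<city>",
}

def triples_from_clause(clause):
    """clause: {"head": verb, "subject": word, "pps": [(prep, word), ...]}
    Yields (X, P, O) triples with X and O mapped to semantic classes."""
    head = ANNOTATION.get(clause["head"], "<unknown event>")
    subj = clause.get("subject")
    if subj:
        yield (head, "logical-subject", ANNOTATION.get(subj, "<unknown entity>"))
    for prep, obj in clause.get("pps", []):
        yield (head, prep, ANNOTATION.get(obj, "<unknown entity>"))

# "A bomb exploded at dawn in Yunguyo" gives, for instance,
# (<explosion event>, logical-subject, <bomb>) and (<explosion event>, at, <time>).
clause = {"head": "explode", "subject": "bomb",
          "pps": [("at", "dawn"), ("in", "Yunguyo")]}
print(list(triples_from_clause(clause)))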
Naturally, good generalizations based on the instances are more valuable than the instances themselves.</Paragraph> <Paragraph position="3"> Since we have a hierarchical domain model, and since the manual semantic annotation states the relationship between lexical items and concepts in the domain model, we can use the domain model hierarchy as a given set of categories for generalization. However, the critical issue is selecting the right level of generalization given the set of examples in the supervised training set.</Paragraph> <Paragraph position="4"> We have chosen a known statistical procedure (Katz, 1987) that selects the minimum level of generalization such that there is sufficient data in the training set to support discrimination of cases of attaching phrases (arguments) to their head. This leads us to the next topic, estimation of probabilities from the supervised training set.</Paragraph> <Paragraph position="5">
Estimation of Probabilities
The case relation, or selection restriction, to be learned is of the form X P O, where X is a head word or its semantic class; P is a case, e.g., logical subject, logical object, a preposition, etc.; and O is a head word or its semantic class.</Paragraph> <Paragraph position="6"> One factor in the probability that O attaches to X with case P is p'(X | P, O), an estimate of the likelihood of X given P and O. We chose to model a second factor p(d), the probability of an attachment where d words separate the head word X from the phrase to be attached (intuitively, the notion of attachment distance).</Paragraph> <Paragraph position="7"> Since a 20,000 word corpus is not much data, we used a generalization algorithm (Katz, 1987) to automatically move up the hierarchical domain model from X to its parent, and from O to its parent.</Paragraph> <Paragraph position="8">
The Experiment
By examining the table of triples X P O that were learned, it was clear that meaningful information was induced from the examples. For instance, [<attack> against <building>] and [<attack> against <residence>] were learned, which correspond to two cases of importance in the MUC domain.</Paragraph> <Paragraph position="9"> However, we ran a far more meaningful evaluation of what was learned by measuring how effective the learned information would be at predicting 166 prepositional phrase attachments that were not made by the MITFP. For example, in Figure 1, fragment 2 could be attached syntactically to fragment 1 at three places: modifying dawn, modifying today, or modifying explode.</Paragraph> <Paragraph position="10"> Closest attachment, a purely syntactic constraint, worked quite effectively, having a 25% error rate. Using the semantic probabilities p'(X | P, O) alone had poorer performance, a 34% error rate. However, the richer probability model p'(X | P, O) * p(d) outperformed both the purely semantic model and the purely syntactic model (closest attachment), yielding an 18% error rate. As a consequence, useful semantic information was learned by the training algorithm.</Paragraph>
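As a sketch of how the two factors could be combined to choose an attachment site, consider the following. The probability values are toy numbers, the class names are illustrative, and the backing-off up the domain-model hierarchy (Katz, 1987) is omitted; none of this is the actual estimated model.

# Sketch: ranking candidate attachment sites for a prepositional phrase with
# the two factors described above, p'(X | P, O) and p(d).  The tables hold toy
# values; in the reported work they are estimated from supervised training.

P_SEMANTIC = {  # p'(X | P, O): head class X, case/preposition P, object class O
    ("<explosion event>", "in", "<city>"): 0.50,
    ("<time>", "in", "<city>"): 0.02,
    ("<day>", "in", "<city>"): 0.02,
}
P_DISTANCE = {0: 0.5, 1: 0.2, 2: 0.1, 3: 0.08}  # p(d): d words between X and the PP

def attachment_score(head_class, prep, obj_class, distance):
    semantic = P_SEMANTIC.get((head_class, prep, obj_class), 1e-4)
    return semantic * P_DISTANCE.get(distance, 0.01)

# Attaching a PP like "in <city>" to a clause resembling "a bomb exploded at
# dawn today ...": candidate sites and their word distances (invented here).
candidates = [("<explosion event>", 3), ("<time>", 0), ("<day>", 1)]
best = max(candidates, key=lambda c: attachment_score(c[0], "in", "<city>", c[1]))
print(best)  # ('<explosion event>', 3): the semantic factor overrides closest attachment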
<Paragraph position="11"> However, the degree of reduction of error rate should not be taken as the final word, for the following reasons: 20,000 words of training data is much less than one would want. An additional 70,000 words of training data should soon be available through TREEBANK.</Paragraph> <Paragraph position="12"> Since many of the head words in the 20,000 word corpus are not of import in the MUC-3 domain, their semantic type is vague, i.e., <unknown event>, <unknown entity>, etc.</Paragraph> </Section> <Section position="5" start_page="207" end_page="207" type="sub_section"> <SectionTitle> Related Work </SectionTitle> <Paragraph position="0"> In addition to the work discussed earlier on tools to increase the portability of natural language systems, another recent paper (Hindle and Rooth, 1990) is directly related to our goal of inferring case frame information from examples.</Paragraph> <Paragraph position="1"> Hindle and Rooth focussed only on prepositional phrase attachment using a probabilistic model, whereas our work applies to all case relations. Their work used an unsupervised training corpus of 13 million words to judge the strength of prepositional affinity to verbs, e.g., how likely it is for &quot;to&quot; to attach to the word &quot;go&quot;, for &quot;from&quot; to attach to the word &quot;leave&quot;, or for &quot;to&quot; to attach to the word &quot;flight&quot;. This lexical affinity is measured independent of the object of the preposition. By contrast, we are exploring induction of semantic relations from supervised training, where very little training may be available. Furthermore, we are looking at triples of head word (or semantic class), syntactic case, and head word (or semantic class).</Paragraph> <Paragraph position="2"> In Hindle and Rooth's test, they evaluated their probability model in the limited case of verb - noun phrase - prepositional phrase. Therefore, guessing with no model at all would be at least 50% accurate.</Paragraph> <Paragraph position="3"> In our test, many of the test cases involved three or more possible attachment points for the prepositional phrase, providing a more realistic test.</Paragraph> <Paragraph position="4"> An interesting next step would be to combine these two probabilistic models (perhaps via linear weights) in order to get the benefit of domain-specific knowledge, as we have explored, and the benefits of domain-independent knowledge, as Hindle and Rooth have explored.</Paragraph> <Paragraph position="5">
Conclusions
Two traditional approaches to applying natural language processing techniques are complete syntactic analysis and script-based analysis. In Proteus (Grishman, 1989), complete syntactic analysis is applied. If no complete analysis of a sentence can be found, the largest S found anchored at the left end is analyzed, ignoring whatever occurs to the right. A second alternative is script-based analysis, e.g., as represented in FRUMP (de Jong, 1979). This technique emphasizes semantic and domain expectations, minimizing dependence on syntactic analysis.</Paragraph> <Paragraph position="6"> In our approach to open-ended text processing, there are three steps:
1. Probabilistically based syntactic analysis produces a forest of non-overlapping fragments, if no single tree can be found.</Paragraph> <Paragraph position="7"> 2. A semantic interpreter assigns semantic representations to the trees of the forest.</Paragraph> <Paragraph position="8"> 3. Fragments are combined using a probability model reflecting both syntactic and semantic preferences.</Paragraph> <Paragraph position="9"> However, the most innovative aspect of our approach is the automatic induction of semantic knowledge from annotated examples.
The use of probabilistic models offers the induction procedure a decision criterion for making generalizations from the corpus of examples.</Paragraph> <Paragraph position="10"> The partial parsing approach offers an alternative. By finding fragments based only on syntactic knowledge, and by starting a new fragment when a constituent cannot be deterministically attached, one has some partial analysis of the whole input. How to compute a semantic analysis for any constituent is well understood in any compositional semantics framework. An algorithm that combines the semantically interpreted fragments seems to gain the power of semantically guided analysis without sacrificing syntactic analysis. Fragments that cannot be combined can still be employed with discourse processing and script-based expectations to identify the entities and relations among them for data base update.</Paragraph> <Paragraph position="11"> Our pilot experiments indicate that the approach to text processing and the induction algorithm are both feasible and promising.</Paragraph> </Section> </Section> </Paper>