<?xml version="1.0" standalone="yes"?>
<Paper uid="P94-1032">
  <Title>Extracting Noun Phrases from Large-Scale Texts: A Hybrid Approach and Its Automatic Evaluation</Title>
  <Section position="8" start_page="238" end_page="239" type="concl">
    <SectionTitle>
7. Applications
</SectionTitle>
    <Paragraph position="0"> Identifying noun phrases in texts is useful for many applications. Anaphora resolution (Hirst, 1981) resolves the relationships among noun phrases, namely, what the antecedent of a noun phrase is. The extracted noun phrases can form the set of possible candidates (or the universe, in the terminology of discourse representation theory). For the acquisition of verb subcategorization frames, bracketing the noun phrases in the texts is indispensable.</Paragraph>
    <Paragraph position="1"> Bracketing the noun phrases helps us find the boundaries of the subject, the object and the prepositional phrase. We could also use the acquired noun phrases for adjective grouping. The extracted noun phrases may contain adjectives which pre-modify the head noun. We can then use the similarity of head nouns to group the adjectives. In addition, we may give the head noun a semantic tag, such as those Roget's Thesaurus provides, and then analyze the adjectives. To automatically produce the index of a book, we would extract the noun phrases contained in the book, calculate their inverse document frequency (IDF) and term frequency (TF) (Salton, 1991), and screen out the implausible terms.</Paragraph>
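The indexing idea above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the book has already been split into sections (treated here as documents), that the noun phrases of each section have already been extracted, and that terms are weighted by the common tf * log(N / df) scheme. The function name `rank_index_terms`, the `min_score` threshold, and the input layout are hypothetical.

```python
import math
from collections import Counter


def rank_index_terms(sections, min_score=0.0):
    """Score noun phrases by TF-IDF and screen out low-scoring ones.

    sections: list of lists; each inner list holds the noun phrases
    extracted from one section of the book (assumed to be given).
    Returns (phrase, score) pairs sorted by descending score.
    """
    n_sections = len(sections)

    # Document frequency: in how many sections does each phrase occur?
    df = Counter()
    for phrases in sections:
        df.update(set(phrases))

    # Term frequency: total occurrences over the whole book.
    tf = Counter()
    for phrases in sections:
        tf.update(phrases)

    scores = {}
    for phrase, freq in tf.items():
        idf = math.log(n_sections / df[phrase])
        scores[phrase] = freq * idf

    # Hypothetical screening step: keep phrases scoring above min_score.
    # A phrase that occurs in every section has idf 0 and is dropped.
    kept = [(p, s) for p, s in scores.items() if s > min_score]
    return sorted(kept, key=lambda item: (-item[1], item[0]))
```

With this weighting, a phrase that appears in every section is treated as too general to be a useful index entry, which is one possible reading of screening out implausible terms.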
    <Paragraph position="2"> These applications, in turn, affect which noun phrases should be identified. For applications like anaphora resolution and acquisition of verb subcategorization frames, the maximal noun phrases are not suitable. For applications like grouping adjectives and automatic book indexing, some kinds of maximal noun phrases, such as noun phrases postmodified by &quot;of&quot; prepositional phrases, are suitable, but some are not, e.g., noun phrases modified by relative clauses.</Paragraph>
    <Paragraph position="3"> 8. Concluding Remarks The difficulty of this work lies in extracting the true maximal noun phrases. If we cannot decide whether the prepositional phrase &quot;over a husband eyes&quot; is licensed by the verb &quot;pull&quot;, we cannot know whether &quot;the wool&quot; and &quot;a husband eyes&quot; are two noun phrases or form a single noun phrase joined by the preposition &quot;over&quot;.</Paragraph>
    <Paragraph position="4"> (18) to pull the wool over a husband eyes; to sell the books of my uncle. In contrast, the noun phrase &quot;the books of my uncle&quot; is a so-called maximal noun phrase in the current context. As a result, we conclude that if we do not resolve the PP-attachment problem (Hindle and Rooth, 1993) to the expected extent, we will not extract the maximal noun phrases. In our work, the probabilistic chunker decides the implicit boundaries between words and the NP-TRACTOR connects the adjacent noun chunks. When a noun chunk is followed by a preposition chunk, we do not connect the two chunks unless the preposition chunk is led by the preposition &quot;of&quot;.</Paragraph>
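The connection rule described above can be sketched as follows. This is an illustrative reading of the rule, not the NP-TRACTOR itself. The chunk representation (a label plus a word list), the labels "N", "P" and "V", and the function name `connect_chunks` are assumptions made for the example. The sketch also assumes that a preposition chunk holds only the preposition and that its object arrives as the next noun chunk.

```python
def connect_chunks(chunks):
    """Join chunks into noun phrases.

    chunks: list of (label, words) pairs, where label is "N" for a noun
    chunk and "P" for a preposition chunk (hypothetical labels).
    Adjacent noun chunks are joined. A preposition chunk is attached to
    the noun phrase being built only when its first word is "of"; any
    other chunk closes the current noun phrase.
    """
    phrases = []
    current = []  # words of the noun phrase under construction

    for label, words in chunks:
        if label == "N":
            current.extend(words)
        elif label == "P" and current and words[0].lower() == "of":
            # "of" keeps the phrase open so the next noun chunk joins it.
            # A trailing "of" with no following noun chunk is not handled
            # specially in this sketch.
            current.extend(words)
        else:
            if current:
                phrases.append(" ".join(current))
            current = []

    if current:
        phrases.append(" ".join(current))
    return phrases
```

Applied to the two phrases in example (18), this rule keeps "the wool" and "a husband eyes" apart, because "over" closes the first phrase, and returns "the books of my uncle" as one noun phrase.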
    <Paragraph position="5"> Compared with other work, our results are evaluated automatically against a parsed corpus and show high precision. Although we do not report the exact recall, we provide estimated values. The testing scale is large enough (about 150,000 words). In contrast, Church (1988) tests on a text and extracts only simple noun phrases. Bourigault's work (1992) is evaluated manually and does not report the precision. Hence, the real performance is not known. The work of Voutilainen (1993) is more complex than ours. The input text is first morphologically analyzed, then parsed by a constraint grammar, analyzed by two different noun phrase grammars, and finally the noun phrases are extracted by their occurrences. Like other work, Voutilainen's work is also evaluated manually.</Paragraph>
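Automatic evaluation against a parsed corpus can be pictured as a comparison of two sets of spans. The following is only an assumption about how such a comparison might be set up, not the evaluation procedure used in this paper: noun phrases are represented as hypothetical (start, end) word offsets, and an extracted phrase counts as correct only when it exactly matches a noun phrase in the parsed corpus. The function name `precision_recall` is likewise hypothetical.

```python
def precision_recall(extracted, gold):
    """Compare extracted noun phrase spans with spans from a parsed corpus.

    extracted, gold: iterables of (start, end) word offsets.
    Returns (precision, recall) under exact span matching.
    """
    extracted_set = set(extracted)
    gold_set = set(gold)
    correct = len(extracted_set.intersection(gold_set))

    # Guard against empty inputs to avoid division by zero.
    precision = correct / len(extracted_set) if extracted_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    return precision, recall
```

Because the comparison needs no human judgment, it can be repeated over a large test set at no extra cost.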
    <Paragraph position="6"> In this paper, we propose a language model to chunk texts. The simple but effective chunker can be seen as a linear structure parser and can be used in many applications. A method is presented to extract the noun phrases. Most importantly, the relations among maximal noun phrases, minimal noun phrases, ordinary noun phrases and applicable noun phrases are distinguished in this work. Their impacts on the subsequent applications are also addressed. In addition, automatic evaluation provides a fair basis for comparison and involves no human cost.</Paragraph>
    <Paragraph position="7"> The experimental results show that this parser is a useful tool for further research on large volumes of real texts.</Paragraph>
  </Section>
</Paper>