<?xml version="1.0" standalone="yes"?> <Paper uid="W95-0112"> <Title>Automatically Acquiring Conceptual Patterns Without an Annotated Corpus</Title> <Section position="6" start_page="159" end_page="160" type="concl"> <SectionTitle> 5 Discussion </SectionTitle> <Paragraph position="0"> AutoSlog-TS demonstrates that conceptual patterns for information extraction can be acquired automatically from only a preclassified text corpus, thereby obviating the need for an annotated training corpus. Generating annotated corpora is time-consuming and sometimes difficult, though the payoffs are often significant. General-purpose text annotations, such as part-of-speech tags and noun-phrase bracketing, are costly to obtain but have wide applicability and have been used successfully to develop statistical NLP systems (e.g., [Church, 1989; Weischedel et al., 1993]).</Paragraph> <Paragraph position="1"> Domain-specific text annotations, however, require a domain expert and have much narrower applicability.</Paragraph> <Paragraph position="2"> From a practical perspective, it is important to consider the human factor and to try to minimize the amount of time and effort required to build a training corpus. Domain-specific text annotations are expensive to obtain, so our goal has been to eliminate our dependence on them.</Paragraph> <Paragraph position="3"> 15 As we stated in Section 3.1, it took a person only 5 hours to review the 1237 concept nodes produced by AutoSlog [Riloff, 1993].</Paragraph> <Paragraph position="4"> 16 The connected words represent phrases in CIRCUS' lexicon.</Paragraph> <Paragraph position="5"> We have shown that a coarser level of manual effort is sufficient for certain tasks. We have shown how a preclassified training corpus can be combined with statistical techniques to create conceptual patterns automatically.
We believe that it is much easier for a person to separate a set of texts into two piles (the relevant texts and the irrelevant texts) than to generate detailed text annotations for a domain. Furthermore, the classifications are general in nature, so various types of systems can make use of them. AutoSlog-TS suggests promising directions for future research in developing dictionaries automatically using only preclassified corpora without detailed text annotations.</Paragraph> </Section> </Paper>