<?xml version="1.0" standalone="yes"?> <Paper uid="A83-1020"> <Title>AUTOMATIC ANALYSIS OF DESCRIPTIVE TEXTS</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> I INTRODUCTION </SectionTitle> <Paragraph position="0"> A lot of useful information, covering many subject areas, is presently available in printed form in catalogues, directories and guides. Good examples are plants in &quot;Collins Pocket Guide to Wild Flowers&quot;, aeroplanes in &quot;Jane's All the World's Aircraft&quot; and people in &quot;Who's Who&quot;. Because this information is represented in a stylised form, it is amenable to machine processing to abstract salient details concerning the entity being described. The research described here is part of a long term project to develop a system which can &quot;read&quot; descriptive text and so become an expert on the material which has been read.</Paragraph> <Paragraph position="1"> The first stage of this research is to establish that it is indeed possible to abstract useful information from descriptive text and we have chosen as a typical example a text consisting of descriptions of wild plants. Our system reads this text and generates a formal canonical plant description. Ultimately this will be input to a knowledge-based system which will then be able to answer questions on wild plants.</Paragraph> <Paragraph position="2"> The paper gives a limited overview of the recent work in text analysis in order to establish a context for the approach we adopt. An outline of the operation of the system is then made. The analysis of our text proceeds in four separate stages and these are considered in conjunction with a sample text. The first stage attaches to each word in the text attributes which are held in either a keyword list or the system dictionary. 
This expanded text is then split up using conjunctions, punctuation marks and the keywords in the text to assign each segment of the text to a particular part of the plant. The third stage gathers up the descriptions for a particular part and abstracts properties from them. The final operation formats the output as required.</Paragraph> <Paragraph position="3"> We then look at the more detailed operation of the system in terms of specific parts of interest. This covers the dictionary, skeleton structures, text splitting, text analysis and the limited word guessing attempted by the system.</Paragraph> <Paragraph position="4"> Future developments are then considered, in particular the possibility of generalising the system to handle other topics. The actual implementation of the system and the use of PROLOG are examined and we conclude with some notes on the current utility of our system.</Paragraph> </Section> </Paper>