<?xml version="1.0" standalone="yes"?> <Paper uid="C92-4200"> <Title>KNOWLEDGE EXTRACTION FROM TEXTS BY SINTESI</Title> <Section position="6" start_page="0" end_page="0" type="concl"> <SectionTitle> 5. CONCLUSIONS </SectionTitle> <Paragraph position="0"> This paper presents a system to extract knowledge from domain-oriented ill-formed Italian inputs.</Paragraph> <Paragraph position="1"> The purpose of the paper was to demonstrate how the system guarantees efficiency, robustness and accuracy. SINTESI is currently used to extract knowledge from technical diagnostic texts on car faults. From a linguistic point of view it is able to extract knowledge from sentences involving noun phrases, verb phrases and prepositional phrases; the sentences may contain conjunctions, a limited set of garden paths and some kinds of subordinate clauses. The system has two modes of operation: an on-line mode, in which each new text is analysed in real time and the extracted knowledge is approved or rejected by the user, and an off-line mode to process the 40,000 texts that are already in a database. The extracted knowledge is used to generate search keys for the database, for statistical purposes and to build a knowledge base on faults. One of the goals of the system is portability across applications in the same domain. So far SINTESI has been tested on about 1000 technical texts; about 85% of the information was extracted correctly. Many problems came from forms that are not currently supported, unknown objects or words, and complex garden paths. The system is able to process about 150 texts per hour running on a VAX 6510. It was developed using the Nexpert Object tool and the C language; it now runs in a DEC-VMS environment.</Paragraph> <Paragraph position="2"> In the near future we will extend SINTESI to cover most of the linguistic forms that are still not covered. 
A method to cope extensively with implicit knowledge is under development.</Paragraph> <Paragraph position="3"> Until now the system has been tested by only a few users, but it will be used by dozens of people at a rate of about 5,000 texts per year.</Paragraph> </Section> </Paper>