File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/98/p98-1038_intro.xml
Size: 4,002 bytes
Last Modified: 2025-10-06 14:06:33
<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1038"> <Title>PAT-Trees with the Deletion Function as the Learning Device for Linguistic Patterns</Title> <Section position="2" start_page="0" end_page="244" type="intro"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> Human beings remember useful and important information and gradually forget old and unimportant information in order to accommodate new information. Under the constraint of memory capacity, it is important to have a learning mechanism that uses memory to store and retrieve information efficiently and flexibly without losing important information. We do not know exactly how human memory functions, but creating computers with similar competence is one of the most important problems being studied. We are especially interested in computer learning of linguistic patterns without the problem of running out of memory.</Paragraph> <Paragraph position="1"> To implement such a learning device, we need a data structure that can: a) accept and store the on-line input of character/word patterns, b) efficiently access and retrieve stored patterns, c) accept unlimited amounts of data while retaining the most important as well as the most recent input patterns. To meet these needs, the PAT-tree data structure was considered as a starting point. The original design of the PAT-tree can be traced back to 1968, when Morrison [Morrison, 68] proposed a data structure called the &quot;Practical Algorithm to Retrieve Information Coded in Alphanumeric&quot; (PATRICIA). It is a variation of the binary search tree that uses the binary representation of keys. In 1987, Gonnet [Gonnet, 87] introduced semi-infinite strings and stored them in PATRICIA trees. A PATRICIA tree constructed over all the possible semi-infinite strings of a text is then called a PAT-tree. 
Many kinds of search can be performed easily on a PAT-tree, such as prefix searching, range searching, and longest repetition searching. In 1996, Hung [Hung, 96] modified the PAT-tree to fit the needs of Chinese processing, using finite strings instead of semi-infinite strings. Since finite strings, unlike semi-infinite strings, are not unique in a text, frequency counts are stored in the tree nodes. In addition to its searching functions, the modified PAT-tree gives easy access to the frequency of any prefix sub-string.</Paragraph> <Paragraph position="2"> Hence, statistical evaluations between sub-strings, such as probabilities, conditional probabilities, and mutual information, can be computed.</Paragraph> <Paragraph position="3"> It is easy to insert new elements into PAT-trees, but memory constraints have made them unable to accept unlimited amounts of information, limiting their potential use as learning devices. In reality, only important or representative data should be retained. Old and unimportant data can be replaced by new data.</Paragraph> <Paragraph position="4"> Thus, in addition to the functions of the original PAT-tree, a deletion mechanism was implemented, which allows memory to be released for storing the most recent inputs when the original memory is exhausted. With this mechanism, the enhanced PAT-tree is able to accept unlimited amounts of information. Once evaluation functions for data importance are obtained, the PAT-tree will have the potential to be an on-line learning device. We review the original PAT-tree and its properties in section 2. In section 3, we describe the PAT-tree with deletion in detail. In section 4, we give the results obtained when different deletion criteria were tested to see how they performed on learning word bi-gram collocations under different sizes of memory. 
Some other possible applications and a simple conclusion are given in the last section.</Paragraph> </Section> </Paper>