<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1038">
<Title>PAT-Trees with the Deletion Function as the Learning Device for Linguistic Patterns</Title>
<Section position="4" start_page="248" end_page="249" type="concl">
<SectionTitle> 5. Conclusion </SectionTitle>
<Paragraph position="0"> The most appealing features of the PAT-tree with deletion are its efficient pattern search and its on-line learning property, which give it the potential to be a good on-line training tool. With the rapid growth of the WWW, the supply of electronic text is almost unlimited and provides on-line training data for natural language processing. The following are a few possible applications of PAT-trees with deletion.</Paragraph>
<Paragraph position="1"> a) Learning high-frequency patterns from an unlimited stream of input patterns. The patterns might be character or word n-grams, or collocations. In this way, new words can be extracted and a language model of variable-length n-grams can be trained.</Paragraph>
<Paragraph position="2"> b) The most recently inserted patterns are retained in the PAT-tree for a while, as if it had a short-term memory. Therefore, the language model can be adjusted on-line to adapt to the current input text.</Paragraph>
<Paragraph position="3"> c) Multiple PAT-trees can be applied to learn the characteristic patterns of different domains or of texts in different styles. These patterns can then serve as signatures for the automatic classification of texts.</Paragraph>
<Paragraph position="4"> The deletion mechanism alleviates the memory limitation to some extent. The performance of the learning process also depends on good evaluation criteria, and different applications require different criteria. Therefore, in the current PAT-tree system, the evaluation function is left open for the user to design.</Paragraph>
<Paragraph position="5"> Suffix search can be done by constructing a PAT-tree that contains the reversed text. Wildcard search can be done by traversing subtrees: when a wildcard is encountered, an indefinite number of decision bits must be skipped.</Paragraph>
<Paragraph position="6"> To cope with the limited capacity of core memory, secondary memory might be required. To speed up memory access, a PAT-tree can be split into a PAT-forest. At any time, only the top-level subtree and the currently demanded lower-level PAT-tree reside in core memory; lower-level PAT-trees are swapped in and out on demand.</Paragraph>
</Section>
</Paper>
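<!--
A minimal Python sketch of the suffix and wildcard searches mentioned in the paragraph on search extensions, under stated assumptions: a plain character trie with frequency counts stands in for the PAT-tree, which in the paper branches on individual decision bits and compresses unary paths. The names `Node`, `PatternTrie`, `prefix_search`, `suffix_search`, and `wildcard_search` are hypothetical. Suffix search stores every pattern reversed, so a suffix query becomes a prefix query. In the wildcard search, `*` lets the traversal descend through any number of trie levels before matching resumes, the character-level analogue of skipping an indefinite number of decision bits. Deletion and the user-defined evaluation function are not modeled.

```python
class Node:
    """Trie node; count > 0 means a stored pattern ends here."""

    def __init__(self):
        self.children = {}  # character -> Node
        self.count = 0      # how often the pattern ending here was inserted


class PatternTrie:
    """Hypothetical character-level stand-in for a PAT-tree (not the paper's code).

    If reverse=True, keys are stored reversed, which turns a suffix
    query into a prefix query on the stored keys.
    """

    def __init__(self, reverse=False):
        self.root = Node()
        self.reverse = reverse

    def _key(self, s):
        # Map between a pattern and its stored key (reversed if needed).
        return s[::-1] if self.reverse else s

    def insert(self, pattern):
        node = self.root
        for ch in self._key(pattern):
            node = node.children.setdefault(ch, Node())
        node.count += 1

    def _collect(self, node, path, out):
        # Depth-first traversal of a subtree; records every stored pattern.
        if node.count:
            out[self._key(path)] = node.count
        for ch, child in node.children.items():
            self._collect(child, path + ch, out)

    def prefix_search(self, prefix):
        """Patterns whose stored key starts with prefix, with their counts."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return {}
        out = {}
        self._collect(node, prefix, out)
        return out

    def suffix_search(self, suffix):
        """Patterns ending with suffix; requires a trie built on reversed keys."""
        if not self.reverse:
            raise ValueError("suffix search needs a trie built on reversed text")
        return self.prefix_search(suffix[::-1])

    def wildcard_search(self, query):
        """Match a query in which '*' stands for any run of characters."""
        out = {}
        self._match(self.root, self._key(query), 0, "", out)
        return out

    def _match(self, node, q, i, path, out):
        if i == len(q):
            if node.count:
                out[self._key(path)] = node.count
            return
        if q[i] == "*":
            # Either the wildcard matches nothing more here ...
            self._match(node, q, i + 1, path, out)
            # ... or it absorbs one more character, descending into every child.
            for ch, child in node.children.items():
                self._match(child, q, i, path + ch, out)
            return
        child = node.children.get(q[i])
        if child is not None:
            self._match(child, q, i + 1, path + q[i], out)


if __name__ == "__main__":
    words = ["learning", "learner", "earning", "tree", "free", "tree"]
    fwd = PatternTrie()              # forward keys: prefix and wildcard search
    rev = PatternTrie(reverse=True)  # reversed keys: suffix search
    for w in words:
        fwd.insert(w)
        rev.insert(w)
    print(sorted(fwd.prefix_search("learn")))          # ['learner', 'learning']
    print(sorted(rev.suffix_search("ree").items()))    # [('free', 1), ('tree', 2)]
    print(sorted(fwd.wildcard_search("*arn*")))        # ['earning', 'learner', 'learning']
```

Design note: the sketch keeps a forward and a reversed trie side by side, so suffix search costs a second index. A leading `*` forces the wildcard search to explore every child of every visited node, which is why such queries touch many subtrees.
-->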