<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1804"> <Title>Using Masks, Suffix Array-based Data Structures and Multidimensional Arrays to Compute Positional Ngram Statistics from Corpora</Title> <Section position="9" start_page="18" end_page="18" type="concl"> <SectionTitle> 6 Conclusion </SectionTitle> <Paragraph position="0"> In this paper, we have described an implementation that computes positional ngram statistics based on masks, suffix array-based data structures and multidimensional arrays. Our C++ solution takes 8.59 minutes to compute both frequency and Mutual Expectation for a 1,092,723-word corpus on an Intel Pentium III 900 MHz for a seven-word window context. Our architecture exhibits O(h(F) N log N) time complexity. To some extent, this work responds to the conclusion of Kit and Wilks (1998), who claim that &quot;[...] a utility for extracting discontinuous co-occurrences of corpus tokens, of any distance from each other, can be implemented based on this program [The</Paragraph> </Section> </Paper>