Candidate Terms Extracted Using a Set of Part-of-Speech Patterns
Candidate terms extracted using a set of part-of-speech patterns. The files below are available for download.
Index of: CANDIDATE_TERM/
| Size (bytes): | Name: | Description: | |
| 12.799.467 | _ALL_CANDID_TERM_BY_POS.ZIP | This file contains all candidate terms extracted using the devised part-of-speech patterns. Each line of the file represents the following information: | |
| 12.132.390 | _ALL_CANDID_TERM_BY_POS_DOCUMENT_INDEX.ZIP | An inverted index file that maps terms to documents in the corpus. Each line of the file records a single occurrence of a term as a TERM_ID followed by a DOCUMENT_ID (tab separated). Note that DOCUMENT_ID is the integer id assigned to each document in the SEPID_CORPUS. | |
| 15.097.774 | _ALL_CANDID_TERM_BY_POS_SECTION_INDEX.ZIP | Similar to the above, but for sections: an inverted index file that maps terms to sections in the corpus. Each line of the file records a single occurrence of a term as a TERM_ID followed by a SECTION_ID (tab separated). | |
| 19.778.365 | _ALL_CANDID_TERM_BY_POS_PARAGRAPH_INDEX.ZIP | Similar to the above, but for paragraphs: each line is a TERM_ID followed by a PARAGRAPH_ID from SEPID_CORPUS (tab separated). | |
| 48.039.732 | _ALL_CANDID_TERM_BY_POS_SENTENCE_INDEX.ZIP | Similar to the above, but for sentences. Each line is a TERM_ID followed by a SENTENCE_ID, followed by the START and END positions of the term. START and END are token numbers within the sentence. | |
| 365 | POS_SEQUENCE_FILTER | The part-of-speech tag sequence patterns employed for the extraction of candidate terms. | |
| <DIR> | CANDID_TERM_BY_POS_SENTENCE_INDEX/ | In this folder, the (candidate-term-id, sentence-id) indices (i.e. those in _all_candid_term_by_pos_sentence_index.zip) are grouped by the year of publication of the source documents. The first two characters of each filename give the year of publication. For instance, the file "84_candid_term_by_pos_pattern_sentence_index.zip" contains all sentence-to-term-id mappings for sentences from publications of the year 1984. Together with the additional index files provided in SEPID_CORPUS, these files can be used to organize candidate terms in chronological order. There are currently 34 files, representing publications from 65 (i.e. 1965) to 06 (i.e. 2006). | |
Directory contains 124.293.185 Bytes in 7 Files
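The index formats described above can be read with a few lines of Python. The following is a minimal sketch, not a reference loader: the function and class names are hypothetical, ids are kept as strings because only DOCUMENT_ID is stated to be an integer, and the sentence index is assumed to be whitespace separated since the description does not name its delimiter. The sample rows are made up for illustration and are not taken from the data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class TermOccurrence(NamedTuple):
    """One row of the sentence index (hypothetical container)."""
    term_id: str
    sentence_id: str
    start: int  # token number at which the term starts
    end: int    # token number at which the term ends


def build_postings(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Group unit ids (document, section or paragraph) by TERM_ID.

    Each input line is expected to be 'TERM_ID<TAB>UNIT_ID'.
    Lines that do not have exactly two tab-separated fields are skipped.
    """
    postings: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) != 2:
            continue
        term_id, unit_id = parts
        postings[term_id].add(unit_id)
    return dict(postings)


def parse_sentence_index(lines: Iterable[str]) -> List[TermOccurrence]:
    """Parse rows of the form TERM_ID SENTENCE_ID START END.

    Assumption: fields are separated by whitespace (tab or space).
    Rows without exactly four fields are skipped.
    """
    occurrences: List[TermOccurrence] = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue
        term_id, sentence_id, start, end = fields
        occurrences.append(
            TermOccurrence(term_id, sentence_id, int(start), int(end))
        )
    return occurrences


if __name__ == "__main__":
    # Made-up sample rows, only to show the call pattern.
    print(build_postings(["17\t3\n", "17\t8\n", "42\t3\n"]))
    print(parse_sentence_index(["17\t120\t2\t4\n"]))
```

For the real files, the same functions can be fed the lines of a file opened from the unpacked archive. Whether START and END are zero-based or inclusive is not stated on this page and should be checked against the corpus.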
Index of: CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX/
To download all these files in one zip file click here.
| Size (bytes): | Name: | Description: | |
| 2.720.866 | 00_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 2000. | |
| 1.413.694 | 01_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 2001. | |
| 2.062.648 | 02_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 2002. | |
| 2.396.856 | 03_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 2003. | |
| 4.404.305 | 04_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 2004. | |
| 2.505.827 | 05_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 2005. | |
| 4.884.960 | 06_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 2006. | |
| 108.336 | 65_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 1965. | |
| 105.363 | 67_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 1967. | |
| 217.444 | 69_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 1969. | |
| 157.564 | 73_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 1973. | |
| 146.083 | 75_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 1975. | |
| 193.892 | 78_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 1978. | |
| 1.060.563 | 79_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 1979. | |
| 592.510 | 80_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 1980. | |
| 243.011 | 81_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 1981. | |
| 532.555 | 82_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 1982. | |
| 532.554 | 83_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 1983. | |
| 534.551 | 84_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 1984. | |
| 585.908 | 85_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 1985. | |
| 1.072.015 | 86_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 1986. | |
| 656.015 | 87_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 1987. | |
| 1.415.295 | 88_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 1988. | |
| 991.309 | 89_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 1989. | |
| 1.555.268 | 90_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 1990. | |
| 1.378.971 | 91_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 1991. | |
| 2.248.958 | 92_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 1992. | |
| 1.738.044 | 93_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 1993. | |
| 2.510.786 | 94_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 1994. | |
| 964.317 | 95_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 1995. | |
| 2.095.609 | 96_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 1996. | |
| 2.034.305 | 97_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 1997. | |
| 2.642.241 | 98_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 1998. | |
| 1.481.054 | 99_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP | Term-Sentence indices from articles published in year 1999. | |
Directory contains 48.183.677 Bytes in 34 Files
Total: 172.476.862 Bytes in 41 Files
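Because the per-year files are named with a two-digit year prefix, sorting them by name does not give chronological order (00 to 06 would sort before 65). A minimal sketch of recovering the four-digit year follows. The function names and the pivot value are assumptions made for illustration: the listing only covers 1965 to 2006, so any prefix below 50 is treated here as belonging to the 2000s.

```python
import re
from typing import Iterable, List

# Assumption: prefixes below this pivot are 20xx, all others are 19xx.
# The listing spans 65..99 and 00..06, so any pivot from 07 to 65 would work.
CENTURY_PIVOT = 50

_YEAR_PREFIX = re.compile(r"^(\d{2})_")


def publication_year(filename: str) -> int:
    """Return the four-digit publication year encoded in a filename prefix."""
    match = _YEAR_PREFIX.match(filename)
    if match is None:
        raise ValueError(f"no two-digit year prefix in {filename!r}")
    two_digit = int(match.group(1))
    century = 2000 if two_digit < CENTURY_PIVOT else 1900
    return century + two_digit


def chronological(filenames: Iterable[str]) -> List[str]:
    """Sort per-year index files from oldest to newest publication year."""
    return sorted(filenames, key=publication_year)


if __name__ == "__main__":
    names = [
        "06_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP",
        "84_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP",
        "65_CANDID_TERM_BY_POS_PATTERN_SENTENCE_INDEX.ZIP",
    ]
    for name in chronological(names):
        print(publication_year(name), name)
```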
This page last edited on 06 October 2025.
