<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1100"> <Title>Text Segmentation Using Reiteration and Collocation</Title> <Section position="3" start_page="614" end_page="614" type="intro"> <SectionTitle> 2 Identifying Lexical Cohesion </SectionTitle> <Paragraph position="0"> To automatically detect lexical cohesion ties between pairwise words, three linguistic features were considered: word repetition, collocation and relation weights. The first two features represent lexical cohesion relations directly: word repetition is a component of the lexical cohesion class of reiteration, and collocation is a lexical cohesion class in its entirety. The remaining types of lexical cohesion considered are synonym and superordinate (the cohesive effect of general word was not included). These types can be identified using relation weights (Jobbins and Evett, 1998).</Paragraph> <Paragraph position="1"> Word repetition: Word repetition ties in lexical cohesion are identified by same-word matches and by matches on inflections derived from the same stem.</Paragraph> <Paragraph position="2"> An inflected word was reduced to its stem by look-up in a lexicon (Keenan and Evett, 1989) whose records pair each inflection with its stem word (e.g. &quot;orange oranges&quot;).</Paragraph> <Paragraph position="3"> Collocation: Collocations were extracted from a seven-million-word sample of the Longman English Language Corpus using the association ratio (Church and Hanks, 1990) and written to a lexicon. Collocations were then located automatically in a text by looking up pairwise words in this lexicon.</Paragraph> <Paragraph position="4"> Figure 1 shows the record for the headword orange followed by its collocates.
For example, the pairwise words orange and peel form a collocation.</Paragraph> <Paragraph position="5"> [Figure 1: orange free green lemon peel red] Relation Weights: Relation weights quantify the strength of the semantic relation between words based on the lexical organisation of Roget's Thesaurus (RT) (Jobbins and Evett, 1995). A thesaurus is a collection of synonym groups, so synonym relations are captured, and the hierarchical structure of RT implies that superordinate relations are also captured. An alphabetically ordered index of RT was generated, referred to as the Thesaurus Lexicon (TLex). Relation weights for pairwise words are calculated according to which of four possible connections in TLex the pair satisfies; one or more may hold.</Paragraph> </Section> </Paper>