<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1810"> <Title>Quantitative Portraits of Lexical Elements</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Texts and lexica </SectionTitle> <Paragraph position="0"> Automatic term weighting starts from texts (documents). The sphere to which the resulting weights are attributed, however, can differ. Figure 1 shows the linguistic spheres of lexica and texts (Kageura, 2002); both the lexical side and the textual side comprise a concrete data sphere and an abstract sphere.</Paragraph> <Paragraph position="1"> Within this scheme, three types of relations between lexica and texts can be identified: concrete terms attributed to concrete texts, concrete terms corresponding to discourse, and abstract lexica corresponding to abstract discourse. We will show below that the three major types of automatic term weighting methods correspond to these three types of relations.</Paragraph> <Paragraph position="2"> [Figure 1: Linguistic spheres of lexica and texts. Textual side: a set of actual texts (targets of IR); textual sphere / theoretical sphere of discourse. Lexical side: terms as attributes of a concrete set of documents; lexicological sphere / theoretical sphere of lexica.]</Paragraph> </Section> </Paper>