<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-2004">
<Title>Temporal Context: Applications and Implications for Computational Linguistics</Title>
<Section position="2" start_page="0" end_page="0" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> Tasks in computational linguistics (CL) normally focus on the content of a document while paying little attention to the context in which it was produced. The work described in this paper considers the importance of temporal context. We show that knowing one small piece of information, a document's publication date, can be beneficial for a variety of CL tasks, some familiar and some novel.</Paragraph>
<Paragraph position="1"> The field of historical linguistics attempts to categorize changes at all levels of language use, typically relying on data that span centuries (Hock, 1991). The recent availability of very large textual corpora allows for the examination of changes that take place across shorter time periods. In particular, we focus on lexical change across decades in corpora of academic publications and show that the changes can be fairly dramatic during a relatively short period of time.</Paragraph>
<Paragraph position="2"> As a preview, consider Table 1, which lists the top five unigrams that best distinguished the field of computational linguistics at different points in time, as derived from the ACL proceedings (footnote 1) using the odds ratio measure (see Section 3). One can quickly glean that the field has become increasingly empirical over time.</Paragraph>
<Paragraph position="3">
1979-84    1985-90    1991-96      1997-02
system     phrase     discourse    word
natural    plan       tree         corpus
language   structure  algorithm    training
knowledge  logical    unification  model
database   interpret  plan         data
Table 1: ... time periods, as measured by the odds ratio
With respect to academic publications, the very nature of the enterprise forces the language used within a discipline to change.
An author's word choice is shaped by the preceding literature, as she must say something novel while placing her contribution in the context of what has already been said. This begets neologisms, new word senses, and other types of changes.</Paragraph>
<Paragraph position="4"> This paper is organized as follows: In Section 2, we introduce temporal term weighting, a technique that implicitly encodes time into keyword weights to enhance information retrieval. Section 3 describes the technique of temporal feature modification, which exploits temporal information to improve the text categorization task. Section 4 introduces several types of lexical changes and a potential application in document clustering.</Paragraph>
<Paragraph position="5"> 1 The details of each corpus used in this paper can be found in the appendix.</Paragraph>
</Section>
</Paper>