<?xml version="1.0" standalone="yes"?> <Paper uid="P05-3009"> <Title>The Linguist's Search Engine: An Overview</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The idea for the Linguist's Search Engine originated in a simple frustration shared by many people who study language: the fact that so much of the argumentation in linguistic theory is based on subjective judgments. Who among us has not, in some talk or class, heard an argument based on a &quot;starred&quot; (deemed-ungrammatical) example, and whispered to someone nearby, &quot;Did that sound OK to you?&quot; because we thought it sounded fine? As Bard et al. (1996) put it, each linguistic judgment is a &quot;small and imperfect experiment&quot;. Schütze (1996) and Cowart (1997) provide detailed discussion of instability and unreliability in such informal methods, which can lead to biased or even misleading results.</Paragraph> <Paragraph position="1"> Recent work on linguistics methodology draws on the perception literature in psychology to provide principled methods for eliciting gradient, rather than discrete, linguistic judgments (Sorace and Keller, 2005). In addition, at least as far back as Rich Pito's 1992 tgrep, distributed with the Penn Treebank, computationally sophisticated linguists have had the option of looking at naturally occurring data rather than relying on constructed sentences and introspective judgments (e.g., Christ, 1994; Corley et al., 2001; Blaheta, 2002; Kehoe and Renouf, 2002; König and Lezius, 2002; Fletcher, 2002; Kilgarriff, 2003).</Paragraph> <Paragraph position="2"> Unfortunately, many linguists are unwilling to invest in psycholinguistic methods, or in the computational skills necessary for working with corpus search tools. 
A variety of people interested in language have moved in the direction of using Web search engines such as Google as a source of naturally occurring data, but conventional search engines do not provide the mechanisms needed to perform many of the simplest linguistically informed searches (e.g., seeking instances of a particular verb used only intransitively).</Paragraph> <Paragraph position="3"> The Linguist's Search Engine (LSE) was designed to provide the broadest possible range of users with an intuitive, linguistically sophisticated but user-friendly way to search the Web for naturally occurring data. Section 2 lays out the LSE's basic interface concepts via several illustrative examples. Section 3 discusses its architecture and implementation. Section 4 discusses the current status of the LSE and recent developments.</Paragraph> </Section> </Paper>