<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1101">
  <Title>Linguistic Distances</Title>
  <Section position="5" start_page="2" end_page="2" type="metho">
    <SectionTitle>
3 Syntax
</SectionTitle>
    <Paragraph position="0"> Although there is less interest in similarity at the syntactic level among linguistic theorists, there is still one important area of theoretical research in which it could play an important role, and several interdisciplinary studies in which similarity and/or distance is absolutely crucial. Syntactic TYPOLOGY is an area of linguistic theory which seeks to identify syntactic features that tend to be associated with one another in all languages (Comrie, 1989; Croft, 2001). The fundamental vision is that some sorts of languages may be typologically more similar to one another than would first appear.</Paragraph>
    <Paragraph position="1"> Further, there are two interdisciplinary fields of linguistic study in which similarity and/or distance plays a major role, including similarity at the syntactic level (without, however, focusing exclusively on syntax). LANGUAGE CONTACT studies seek to identify the elements of one language that have been adopted by a second in situations where two or more languages are used in the same community (Thomason and Kaufman, 1988; van Coetsem, 1988). Naturally, these elements may be non-syntactic, but syntactic CONTAMINATION is a central concept: it is recognized in contaminated varieties that have become more similar to the languages which are the source of the contamination. Essentially the same phenomenon is studied in SECOND-LANGUAGE LEARNING, in which syntactic patterns from a dominant, usually first, language are imposed on a second. Here the focus is on the psychology of the individual language user as opposed to the collective habits of the language community.</Paragraph>
    <Paragraph position="2"> Nerbonne and Wiersma (this volume) collect frequency distributions of part-of-speech (POS) trigrams and explore simple measures of distance between these distributions. They assess statistical significance using permutation tests, which raise tricky issues of normalization between the frequency distributions.</Paragraph>
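A minimal Python sketch of this kind of comparison follows. It assumes each corpus is a list of sentences already tagged as POS sequences, uses an L1 distance between relative trigram frequencies (dividing by each corpus's trigram total is one simple answer to the normalization problem, assumed here rather than taken from Nerbonne and Wiersma), and estimates significance by repeatedly reshuffling sentences between the two corpora. The function names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical input format: each sentence is a list of POS tags.
Sentence = list[str]


def trigram_counts(corpus: list[Sentence]) -> Counter:
    """Count POS trigrams over all sentences in a corpus."""
    counts: Counter = Counter()
    for tags in corpus:
        for i in range(len(tags) - 2):
            counts[tuple(tags[i:i + 3])] += 1
    return counts


def distance(a: list[Sentence], b: list[Sentence]) -> float:
    """L1 distance between relative trigram frequencies (an assumed measure).

    Dividing by each corpus's total trigram count keeps corpora of
    different sizes comparable.
    """
    ca, cb = trigram_counts(a), trigram_counts(b)
    na, nb = sum(ca.values()) or 1, sum(cb.values()) or 1
    return sum(abs(ca[t] / na - cb[t] / nb) for t in set(ca) | set(cb))


def permutation_test(a: list[Sentence], b: list[Sentence],
                     trials: int = 1000, seed: int = 0) -> float:
    """Approximate p-value: how often a random re-division of the pooled
    sentences is at least as distant as the observed division."""
    rng = random.Random(seed)
    observed = distance(a, b)
    pooled = a + b  # a new list, so the inputs are never shuffled
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        if distance(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # Add-one smoothing so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)


if __name__ == "__main__":
    corpus_a = [["DET", "NOUN", "VERB", "DET", "NOUN"]] * 20
    corpus_b = [["PRON", "VERB", "ADV"]] * 20
    print(distance(corpus_a, corpus_b))
    print(permutation_test(corpus_a, corpus_b, trials=200))
```

Because the test recomputes the distance on each random split, the resulting p-value says how unusual the observed distance is relative to random re-divisions of the same pooled sentences.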
    <Paragraph position="3"> Homola &amp; Kuboň (this volume) join Nerbonne and Wiersma in advocating a surface-oriented measure of syntactic difference, but base their measure on dependency trees, a more abstract level of analysis than POS tags. They then propose an analogue of edit distance to gauge the degree of difference: the difference between two trees is the sum of the costs of the tree-editing operations needed to transform one tree into the other (Noetzel and Selkow, 1999).</Paragraph>
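The tree-editing analogue can be sketched with the classic forest recursion for ordered trees. This Python sketch assumes unit costs for deleting, inserting and relabelling a node and encodes trees as nested tuples; it is a generic tree edit distance for illustration, not the cost scheme Homola and Kuboň define, and the helper names are hypothetical. Memoization keeps it usable for sentence-sized trees, though a more efficient dynamic-programming formulation would be preferable for large ones.

```python
from functools import lru_cache

# A tree is (label, children); a forest is a tuple of trees.
Tree = tuple


def node(label: str, *children: Tree) -> Tree:
    """Convenience constructor for small, hypothetical dependency trees."""
    return (label, tuple(children))


@lru_cache(maxsize=None)
def forest_distance(f: tuple, g: tuple) -> int:
    """Unit-cost ordered edit distance between two forests.

    Recurses on the rightmost root of each forest: delete it, insert it,
    or map the two roots onto each other (relabelling if labels differ).
    """
    if not f and not g:
        return 0
    if not g:
        _, kids = f[-1]
        return forest_distance(f[:-1] + kids, g) + 1  # delete root of f
    if not f:
        _, kids = g[-1]
        return forest_distance(f, g[:-1] + kids) + 1  # insert root of g
    (lf, kf), (lg, kg) = f[-1], g[-1]
    return min(
        forest_distance(f[:-1] + kf, g) + 1,  # delete rightmost root of f
        forest_distance(f, g[:-1] + kg) + 1,  # insert rightmost root of g
        forest_distance(kf, kg)               # map the two roots' subtrees,
        + forest_distance(f[:-1], g[:-1])     # the remaining forests,
        + (0 if lf == lg else 1),             # and pay for any relabelling
    )


def tree_distance(t1: Tree, t2: Tree) -> int:
    """Edit distance between two trees, viewed as one-tree forests."""
    return forest_distance((t1,), (t2,))


if __name__ == "__main__":
    t1 = node("saw", node("I"), node("dog", node("the")))
    t2 = node("saw", node("I"), node("cat", node("a")))
    print(tree_distance(t1, t2))  # prints 2
```

Here the trees for "I saw the dog" and "I saw a cat" have the same shape and differ in two labels, so under unit costs their distance is 2 (two relabellings).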
    <Paragraph position="4"> Emms (this volume) concentrates on applications of the notion of 'tree similarity', in particular to identifying text that is syntactically similar to a question and may therefore be expected to constitute an answer to it. He shows that the tree-distance measure outperforms sequence-distance measures, at least if lexical information is also emphasized.</Paragraph>
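For contrast, the sequence-distance side of such a comparison can be sketched as token-level Levenshtein distance used to rank candidate answer sentences against a question. The unit costs, whitespace tokenization and function names are assumptions for illustration; this is only a baseline of the general kind Emms compares against, not his system.

```python
def levenshtein(a: list[str], b: list[str]) -> int:
    """Token-level edit distance with unit insert/delete/substitute costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free if tokens match)
            ))
        prev = curr
    return prev[-1]


def rank_candidates(question: str, candidates: list[str]) -> list[str]:
    """Order candidate sentences by increasing distance to the question."""
    q = question.lower().split()
    return sorted(candidates, key=lambda s: levenshtein(q, s.lower().split()))


if __name__ == "__main__":
    print(rank_candidates("who wrote the hobbit",
                          ["the weather was fine today",
                           "tolkien wrote the hobbit"]))
```

A purely sequential measure like this sees only token order, which is the limitation a tree distance is meant to overcome.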
    <Paragraph position="5"> K&amp;quot;ubler (this volume) uses the similarity measure in memory-based learning to parse. This is a surprising approach, since memory-based techniques are normally used in classification tasks where the target is one of a small number of potential classifications. In parsing, the targets may be arbitrarily complex, so a key step is select an initial structure in a memory-based way, and then to adapt it further. In this paper K&amp;quot;ubler first applies chunking to the sentence to be parsed and selects an initial parse based on chunk similarity.</Paragraph>
  </Section>
</Paper>