<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2602">
  <Title>Constraint Satisfaction Inference: Non-probabilistic Global Inference for Sequence Labelling</Title>
  <Section position="2" start_page="0" end_page="9" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In machine learning for natural language processing, many diverse tasks somehow involve processing of sequentially-structured data. For example, syntactic chunking, grapheme-to-phoneme conversion, and named-entity recognition are all usually reformulated as sequence labelling tasks: a task-specific global unit, such as a sentence or a word, is divided into atomic sub-parts, e.g. word or letters, each of which is separately assigned a label. The concatenation of those labels forms the eventual output for the global unit.</Paragraph>
    <Paragraph position="1"> More formally, we can define a sequence labelling task as a tuple (x,y,lscript). The goal is to map an input vector x = &lt;x1,x2,...,xn&gt; of tokens to an output sequence y = &lt;y1,y2,...,yn&gt; of labels.</Paragraph>
    <Paragraph position="2"> The possible labels for each token are specified by a finite set lscript, that is, yi [?] lscript,[?]i.</Paragraph>
    <Paragraph position="3"> In most real-world sequence labelling tasks, the values of the output labels are sequentially correlated. For machine learning approaches to sequence labelling this implies that classifying each token separately without considering the labels assigned to other tokens in the sequence may lead to sub-optimal performance. Ideally, the complex mapping of the entire input sequence to its corresponding output sequence is considered one classification case; the classifier then has access to all information stored in the sequence. In practise, however, both input and output sequences are far too sparse for such classifications to be performed reliably.</Paragraph>
    <Paragraph position="4"> A popular approach to circumvent the issues raised above is what we will refer to as the classification and inference approach, covering techniques such as hidden markov models and conditional random fields (Lafferty et al., 2001). Rather than having a token-level classifier make local decisions independently of the rest of the sequence, the approach introduces an inference procedure, operating on the level of the sequence, using class likelihoods estimated by the classifier to optimise the likelihood of the entire output sequence.</Paragraph>
    <Paragraph position="5"> A crucial property of most of the classification and inference techniques in use today is that the classifier used at the token level must be able to estimate the likelihood for each potential class label. This is in contrast with the more common view of a classifier having to predict just one class label for an instance which is deemed most optimal. Maximum-entropy models, whichareused in manyclassification and inference techniques, have this property; they model the conditional class distribution. In general, this is the case for all probabilistic classification methods. However, many general-purpose machine learning techniques are  not probabilistic. In order to design inference procedures for those techniques, other principles than probabilistic ones have to be used.</Paragraph>
    <Paragraph position="6"> In this paper, we propose a non-probabilistic inference procedure that improves performance of a memory-based learner on a wide range of natural-language sequence processing tasks. We start from a technique introduced recently by Van den Bosch and Daelemans (2005), and reinterpret it as an instance of the classification and inference approach. Moreover, the token-level inference procedure proposed in the original work is replaced by a new procedure based on principles of constraint satisfaction that does take into account the entire sequential context.</Paragraph>
    <Paragraph position="7"> The remainder of this paper is structured as follows. Section 2 introduces the theoretical background and starting point of the work presented in thispaper: thetrigram method, andmemory-based learning. Next, the new constraint-satisfaction-based inference procedure for class trigrams is presented in Section 3. Experimental comparisons of a non-sequence-aware baseline classifier, the original trigram method, andthenewclassification and inference approach on a number of sequence labelling tasks are presented in Section 4 and discussed in Section 5. Finally, our work is compared and contrasted with some related approaches in Section 6, and conclusions are drawn in Section 7.</Paragraph>
  </Section>
class="xml-element"></Paper>