<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-3234">
  <Title>Trained Named Entity Recognition Using Distributional Clusters</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 BWI
</SectionTitle>
    <Paragraph position="0"> BWI decomposes the problem of recognizing field instances into two Boolean classification problems: recognizing field-initial and field-terminal tokens.</Paragraph>
    <Paragraph position="1"> Given a target field, a separate classifier is learned for each of these problems, and the distribution of field lengths is modeled as a frequency histogram.</Paragraph>
    <Paragraph position="2"> At application time, tokens that test positive for initial are paired with those testing positive for terminal. If the length of a candidate instance, as defined by such a pair, is determined to have non-zero likelihood using the length histogram, a prediction is returned.</Paragraph>
    <Paragraph position="3"> Each of the three parts of a full prediction--initial boundary, terminal boundary, and length--is assigned a real-valued confidence. The confidence of a boundary detection is its strength as determined by AdaBoost, while that of the length assessment is the empirical length probability, which is determined using the length histogram. The confidence of the full prediction is the product of these three individual confidence scores. In the event that overlapping predictions are found in this way (a rare event, empirically), the predictions with lower confidence are discarded.</Paragraph>
    <Paragraph position="4"> In this section, we sketch those aspects of BWI relevant to the current application. More details are available in the paper in which BWI was defined (Freitag and Kushmerick, 2000).</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 Boosting
</SectionTitle>
      <Paragraph position="0"> BWI uses generalized AdaBoost to produce each boundary classifier (Schapire and Singer, 1998).</Paragraph>
      <Paragraph position="1"> Boosting is a procedure for improving the performance of a &amp;quot;weak learner&amp;quot; by repeatedly applying it to a training set, at each step modifying example weights to emphasize those examples on which the learner has done poorly in previous steps. The output is a weighted collection of weak learner hypotheses. Classification involves having the individual hypotheses &amp;quot;vote,&amp;quot; with strengths proportional to their weights, and summing overlapping votes.</Paragraph>
      <Paragraph position="2"> Although this is the first application of BWI to NER, boosting has previously been shown to work well on this problem. Differing from BWI in the details of the application, two recent papers nevertheless demonstrate the effectiveness of the boosting  ments.</Paragraph>
      <Paragraph position="3"> paradigm for NER in several languages (Carreras et al., 2002; Wu et al., 2002), one of them achieving the best overall performance in a comparison of several systems (Sang, 2002).</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Boundary Detectors
</SectionTitle>
      <Paragraph position="0"> The output of a single invocation of the weak learner in BWI is always an individual pattern, called a boundary detector. A detector has two parts, one to match the text leading up to a boundary, the other for trailing text. Each part is a list of zero or more elements. In order for a boundary to match a detector, the tokens preceding the boundary (or following it) must match the corresponding elements in sequence. For example, the detector [ms .][jones] matches boundaries preceded by the (case-normalized) two-token sequence &amp;quot;ms .&amp;quot; and followed by the single token &amp;quot;jones&amp;quot;. Detectors are grown iteratively, beginning with an empty detector and repeatedly adding the element that best increases the ability of the current detector to discriminate true boundaries from false ones, using a cost function sensitive to the example weighting. A look-ahead parameter allows this decision to be based on several additional context tokens. The process terminates when no extensions yield a higher score than the current detector.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Wildcards
</SectionTitle>
      <Paragraph position="0"> The elements of the detector [ms .][jones]are literal elements, which match tokens using case-normalized string comparison. More interesting elements can be introduced by defining token wildcards. Each wildcard defines some Boolean function over the space of tokens.</Paragraph>
      <Paragraph position="1"> Table 2 lists the baseline wildcards. Using wild-cards from this list, the example detector can be generalized to match a much broader range of boundaries (e.g., [ms &lt;Any&gt;][&lt;Cap&gt;]). By defining new wildcards, we can inject useful domain knowledge into the inference process, potentially improving the performance of the resulting extractor. For example, we might define a wildcard called &amp;quot;Honorific&amp;quot; that matches any of &amp;quot;ms&amp;quot;, &amp;quot;mr&amp;quot;, &amp;quot;mrs&amp;quot;, and &amp;quot;dr&amp;quot;.</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.4 Boundary Wildcards
</SectionTitle>
      <Paragraph position="0"> In the original formulation of BWI, boundaries are identified without reference to the location of the opposing boundary. However, we might expect that the end of a name, say, would be easier to identify if we know where it begins. We can build detectors that exploit this knowledge by introducing a special wildcard (called Begin) that matches the beginnings of names.</Paragraph>
      <Paragraph position="1"> In these experiments, therefore, we modify boundary detection in the following way. Instead of two detector lists, we learn four--the two lists as in the original formulation (call them a0a2a1a4a3a6a5a7 and a0a9a8a11a10a13a12a15a14 ), and two more lists (a0a16a1a4a3a6a5a7a18a17 and a0a9a8a11a10a13a12a15a14a19a17 ). In generating the latter two lists, we give the learner access to these special wildcards (e.g., the wildcard End in generating a0a20a1a4a3a21a5a7a18a17 ).</Paragraph>
      <Paragraph position="2"> At extraction time, a0a22a1a4a3a21a5a7 and a0a20a8a11a10a13a12a15a14 are first used to detect boundaries, as before. These detections are then used to determine which tokens match the &amp;quot;special&amp;quot; wildcards used by a0 a1a4a3a21a5a7a23a17 and a0 a8a11a10a13a12a15a14a19a17 . Then, instead of pairing a0 a1a4a3a6a5a7 predictions with those of a0a22a8a11a10a13a12a15a14 , they are paired with those made by a0a20a8a11a10a13a12a15a14a19a17 (and a0a20a1a4a3a21a5a7a23a17 with a0a20a8a11a10a13a12a24a14 ). In informal experiments, we found that this procedure tended to increase F1 performance by several points on a range of tasks. We adopt it uniformly in the experiments reported here.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>