<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-1047">
  <Title>Finite-State Phrase Parsing by Rule Sequences</Title>
  <Section position="1" start_page="0" end_page="278" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> We present a novel approach to parsing phrase grammars based on Eric Brill's notion of rule sequences. The basic framework we describe has somewhat less power than a finite-state machine, and yet achieves high accuracy on standard phrase parsing tasks. The rule language is simple, which makes it easy to write rules. Further, this simplicity enables the automatic acquisition of phraseparsing rules through an error-reduction strategy. This paper explores an approach to syntactic analysis that is unconventional in several respects. To begin with, we are concerned not so much with the traditional goal of analyzing the comprehensive structure of complete sentences, as much as with assigning partial structure to parts of sentences. The fragment of interest here is demonstrably a subset of the regular sets, and while these languages are traditionally analyzed with finite-state automata, our approach relies instead on the rule sequence architecture defined by Eric Brill.</Paragraph>
    <Paragraph position="1"> Why restrict ourselves to the finite-state case? Some linguistic phenomena are easier to model with regular sets than context-free grammars. Proper names are a case in point, since their syntactic distribution partially overlaps that of noun phra~ses in general; as this overlap is only partial, name analysis within a full context-free grammar is cumbersome, and some approaches have taken to include finite-state name parsers as a front-end to a principal context-free parsing stage (Jacobs et al.</Paragraph>
    <Paragraph position="2"> I99i). Proper names are of further interest, since their identifi cation is independently motivated as valuable to both information retrieval and extraction (Sundheim ~996). Further, several promising recent approaches to information extraction rely on little more than finite-state machines to perform the entire extraction analysis (Appelt et al. I993 , Grishman I995).</Paragraph>
    <Paragraph position="3"> Why approach this problem with rule sequences? In this paper we maka the case that rule sequences succeed at this task through their simplicity and speed. Most important, they support mixed-mode acquisition: the rules are both easy for an engineer to write and easy to learn automatically.</Paragraph>
    <Paragraph position="4"> Rule sequences As part of our work in information extraction, we have been extensively exploring the use of rule sequences.</Paragraph>
    <Paragraph position="5"> Our information extraction prototype, Alembic, is in fact based on a pipeline of rule sequence processors that run the gamut from part-of-speech tagging, to phrase identification, to sentence parsing, to inference (Aberdeeen et al. I995). In each case, the underlying method is identical. Processing takes place by sequentially relabeling the corpus under consideration.</Paragraph>
    <Paragraph position="6"> Each sequential step is driven by a rule that attempts to patch residual errors left in place in the preceding steps. The patching process as a whole is itself preceded by an initial labeling phase that provides an approximate labeling as a starting point for rule application.</Paragraph>
    <Paragraph position="7"> This patching architecture, illustrated in Fig. 1, was codified by Eric Brill, who first exploited it for part-of-speech tagging (Brill I993). In the part-of-speech application, initial labeling is provided by lexicon lookup: lexemes are initially tagged with the most common part of speech assigned to them in a training corpus. This initial labeling is refined by two sets of transformations. Morphological transformations relabel the initial (default) tagging of those words that failed to be found in the lexicon. The morphological rules arc followed by contextual transformations: these rules inspect lexica\[ context to relabel lexemes that are ambiguous with respect to part-of-speech. In effect, the morphological transformations patch errors that were due to gaps in the lexicon, and the contextual rules patch errors that were due to the initial assignment of a lexeme's most common tag.</Paragraph>
    <Paragraph position="8"> Phrase identification: some examples Sequencing, patching, and simplicity, the hallmarks of Brill's part-of-speech tagger, are also characteristic of our phrase parser. In our approach, phrases are initially built around word sequences that meet certain lexical or part-of-speech criteria. The sequenced phrase-finding rules then grow the boundaries of phrases or set their label, according to a repertory of simple lexical and contextual tests. For example, the following rule assigns a label of oa(; to an unlabeled phrase just in case the phrase is ended by the word &amp;quot;Inc.&amp;quot;</Paragraph>
    <Paragraph position="10"> right-wd-1 lexeme &amp;quot;inc.&amp;quot; ; rightmost word in the ; phrase is &amp;quot;inc.&amp;quot; labebaction ORG) ; change the phrase's label, ; but not its boundaries Now, consider the following partially labelled string: &lt;none&gt;Donald F. DeScenza&lt;/none&gt;, analyst with  ) The SGML markup delimits phrases whose boundaries were identified by the initial phrase-finding pass. Of these phrases, the second successfully triggers the example rule, yielding the following relabeled string. &lt;none&gt;Donald F. PeScenza&lt;/none&gt;, analyst with &lt;org&gt;Nomura Securities Inc.&lt;/org&gt; The rule, which seems both as obvious as walking and as fool-proof comes from the name-findinig processor we developed for our participation in the 6 m Message Understanding Conference (MtJC-6). As it turns out, though, the rule is in fact not error-proof, and causes both errors of omission (i.e. recall errors) and commission (i.e. precision errors). Consider the case of &amp;quot;Volkswagen of America Inc.&amp;quot; Because the initial phrase labeling is only approximate, the string is broken into two sub-phr~es separated by &amp;quot;of&amp;quot;. &lt;none&gt;golkswagen&lt;/none&gt; of &lt;none&gt;America Inc,&lt;/none&gt; The example rule designates the partial phrase &amp;quot;America Inc.&amp;quot; as an out;, a precision error because of its partiality, ,and fails to produce an otto label spanning the entire string (a recall error).</Paragraph>
    <Paragraph position="11"> &lt;none&gt;golkswagen&lt;lnone&gt; of &lt;org&gt;America Inc.&lt;/org&gt; This problem is patched by a subsequent name-finding rule, namely the following.</Paragraph>
    <Paragraph position="12">  ; this is an organization ; is the leftmost lexeme ;in the phrase on a list ; of country words? ; to the left of the ; phrase is the word &amp;quot;og' ; tothe left of that is an ; unlabelled phrase ; merge the entire left ; contextinto the OIZG, ; phrase and all  The first two clauses of the rule are antecedents that look for phrases such as &amp;quot;America inc.&amp;quot; The next two clauses are further antecedents that look to the left of the phrase for contextual patterns of form &amp;quot;&lt;non~&gt;,. ,&lt;/none&gt; of&amp;quot;.</Paragraph>
    <Paragraph position="13"> The final two clauses incorporate the left context wholesale into the triggering phrase, yielding: &lt;org&gt;golkswagen of America Inc.&lt;/org&gt; This rule effectively patches tile errors caused by its predecessor in the rule sequence, and simultaneously eliminates both a recall and a precision error. The phrase finder With these examples as background, we may now turn our attention to the technical details of the phrase finding process. As noted above, this process occurs in two main steps, an initial labeling pass followed by the application of a rule sequence.</Paragraph>
    <Paragraph position="14"> Initial phrase labeling The initial labeling process seeds the phrase-finder with candidate phrases. These candidate phrases need not be any more than approximations, in partictdar, it is not necessary for these candidates to have wholly accurate boundaries, as their left and right edges can be adjusted later by means of patching rules. It is also not neccssatT for these candidates to be unfragmented, as fragments can be reassembled later, just as with &amp;quot;Volkswagen of America Inc.&amp;quot; Further, applications that require multiple types of phrase labels, need not choose such a label during the initial phrase-finding pass. What is important is that the initial phrase identification Fred the cores of phrases reliably, even if complete phrases arc not identified. That is, it must partially align some kind of candidate phrase ~ for every phrase (~ that is actually present in the input. Extending a concept from information retrieval, this amounts to maximizing what we might call initial recall, i.e., lit= I (1) I I / I (i) I, where (IJ is the set of actual phrases in a test set, K is the set of candidate phrases generated by the initial phrasing passs, and cI) I is tile set of those (D &lt; q~ that arc partially aligned with some 1( c K.</Paragraph>
    <Paragraph position="15"> The general strategy we have adopted for finding initial phrase seeds is to look for either runs of lcxcmes in a fixed word list or runs of lexemcs that have been tagged a certain way by our part-of-speech tagger. 1)iffercnt instantiations of this general strategy for initial phrase labeling naturally arise for different phrase-finding tasks. For example, on the classic &amp;quot;proper names&amp;quot; task in mixed-case text, we havc achieved good results starting from runs of lexemes tagged with Nm, or m'~ps, the Penn Treebank proper noun tags. This strategy achieves the desired high initial recall R I , as these tags are well-correlated with bona fide proper nanles ~md are reliably produced in mixed-case text by our part-of-speech tagger. This strategy does not yield quite as good initial precision (i.e., it yields false positives) for a number of rcasons, such as the fragmentation problcms noted above, e.g.,</Paragraph>
    <Paragraph position="17"> Once again, though, these initial precision errors arc readily addressed by patching rules.</Paragraph>
    <Paragraph position="18">  Test one place (resp. two places) to the left of the phrase Test one place (resp. two places) to the right of the phrase Test first (resp. second) word of phrase Test last (resp. next-to-last) word of phrase Test each word of phrase in succession. Succeeds if any word in the phrase passes the test.</Paragraph>
    <Paragraph position="19"> Test entire string spanned by phrase Test phrase's label Sets the label of the phrase Modify the phrase's !eft or right boundaries Table h Repertory of unary rule clauses.</Paragraph>
    <Paragraph position="20"> Phrase-finding rules A phrase-finding rule in our framework is made up of several clauses. The corc of the rule consists of clauses that test thc lexical context around a candidatc phrase 1&lt; or that test lcxcmcs spanned by 1(. The repertory of these test loci is given in &amp;quot;Fable 1. At any given locus, a test may either search for a particular lcxcmc, match a lexeme against a closed word list, match a part of speech, or match a phrase of a given type. Most rules also test the label of thc candidate phrase 1(.</Paragraph>
    <Paragraph position="21"> The unary contextual tests in the table may also bc combincd to form binary or ternary tests. For example, combining I,EVT-C'IXW-I and i~mrr-cwxa'-z clauses yields a rule that tests for the left bigram contcxt. This was done in the ore defragmentation rule described earlier. A rule also contains at least one action clause, either a clause that sets the label of the phrase, or one that modifies the boundaries of the phrase. Finally, some rule actions actually introduce new phrases that embed the candidate mad its test context; this allows one to build non-recursive parse trees.</Paragraph>
    <Paragraph position="22"> Phrase rule interpreter The phrase rule interpreter implements the rule language in a straightforward way. Given a document to be analyzed, it proceeds through a rule sequence one rule r at a time, and attempts to apply r to every phrase in every sentence in the document. The interpreter first attempts to match the test label of r to the label of the candidate phrase. If this test succeeds, then the interpreter attempts to satisfy the rule's contextual tests in the context of the candidate. If these test succeed, then the rule's bounds and label actions are executed. Beyond this, the only real complexity arises with phrase-finding tasks that require one to maintain a temporary lexicon. The clearest such example is proper name identification. Indeed, short name forms (e.g., &amp;quot;Detroit Diesel&amp;quot;) can sometimes only be identified correctly once their component terms have been found as part of the complete naxne (e.g., &amp;quot;Detroit Diesel Corp.&amp;quot;). The converse is also true, as short forms of person names (e.g., &amp;quot;Mr. Olatunji&amp;quot;) can help identify fitll nanm forms ( e.g., &amp;quot;Babatunde Olatunji&amp;quot;). The interprcter maintains a temporary lexicon on a document-by-document basis. Every time the interpreter changes the label of a phrase $, pairs of form &lt;Z, &amp;quot;c&gt; are added to the lexicon, where ~ is a lcxcmc in ~, and &amp;quot;c is the label with which (~ is tagged. This lexicon is then exploited to form the associations between short and long proper name forms (through an extension to the rule repertory defined above).</Paragraph>
    <Paragraph position="23"> Correspondence to the regular sets It is straightforward to prove that this approach recognizes a subset of the regular sets, so we will only sketch the outline of such a proof here. The proof proceeds inductivcly by constructing a finite state machinc bt that accepts exactly those strings which receive a certain label in the phrase-finding process under a given rule sequence Z. We consider each rule p in Z in order, and correspondingly elaborate the machine so as to reproduce the rule's effect.</Paragraph>
    <Paragraph position="24"> To begin with, consider that the initial phrase labeling proceeds by building phrases around lexemes 0~ 1 ..... fz n in a designated word list or by finding runs of certain parts of speech ~t 1 ..... 7Zm. The machine that reproduces this initial labeling is thus pl/rq ..... p n/n1 pl/nm ..... p n/nm Pl/nl ..... p n/rq As usual, node labeled &amp;quot;S&amp;quot; is thc start state, and any node drawn with two circles is ,an accepting state. The Pi/~i arc labels stand for all lcxemes in the lexicon that may be labeled with the part of speech gJ' The induction step in the construction procccds from ~l.bl , the machine built to reproduce Z up l~hrough rule l\] bl in the sequence, and adds additional states and arcs so as to reproduce Z up through ruh'. p i.</Paragraph>
    <Paragraph position="25"> For example, say Pi tests for the presence of a lexeme to the left of a phrase and e~tends the phrase's lxaundaries to include )v. We extend the machine bt to  encode this rule by replacing ~'s current start state S with a new one S', and adding a ~, transition from S' to the former start state S. Thus</Paragraph>
    <Paragraph position="27"> For a rule I~ that tests whether a phrase contains a certain lcxcme ~'i, wc construct an &amp;quot;acccptor&amp;quot; machinc that accepts any string with )~i in its midst. CoCO Noting that the regular sets are closed trader intersection, wc them proceed to build the machine that &amp;quot;intersects&amp;quot; the acccptor with bli.</Paragraph>
    <Paragraph position="28"> Other rule patterns arc handled with constructions of a similar flavor--space considerations preclude their description hcre. Note, howcw:r, that extending the fl:amework with a temporary lexicon makcs it transfinite-state, lqnally, as with all semi-parsers, the machines we construct in this way must actually be interpreted as transducers, not just acceptors.</Paragraph>
    <Paragraph position="29"> Learning rule sequences automatically Our experience with writing rule sequences by lt,-md in this approach has been very positive. &amp;quot;\['he rule patterns thcmselves are simple, and the fact that they arc sequenced localizes their effccts mid reduccs the scope of their interactions. These hand-engineering advantages are also conferred upon learning programs that attcmpt to acquire these rules atttomatica\[ly.</Paragraph>
    <Paragraph position="30"> The approach we have taken towards discovering phrase rule sequences automatically is a maximum error-reduction scheme for selecting the next rule in a sequence. This approach originated with Brill's work on part-of-speech tagging and bracketing (Brill i993).</Paragraph>
    <Paragraph position="31"> Brill's rule learning algorithm &amp;quot;\['he search for a rule sequence in a given training corpus begins hy first applying the initial labeling function, just as would be the case in running a complete sequence. Following this, the learning procedurc needs to consider every rule that can possibly apply at this juncture, which itself is a function of the rule schema laaaguage. For each such applicable rule *; the learner considers the possible improvement in phrase labeling conferred by r in the current state. The rule that most reduces the residual error in the training data is selected as the next rule in the sequence.</Paragraph>
    <Paragraph position="32"> This generate-and-test cycle is contimmd until a stopping criterion is reached, which is usually taken as the point where performance improvement falls below a threshold, or ceases altogether. Other a\[ternativcs include setting a strict limit on the number of rules learned, or cross-testing the performance improvement of a rule on a corpus distinct from the training set.</Paragraph>
    <Paragraph position="33"> The rule search space The language of phrase rules supports a large number of possible rules that the phrase rule learner might need to consider at any one time. Take one of our smallcr training sets, in which there arc ~9I sentences consisting of 6,8IZ word tokens, with z,o77 unique word types.</Paragraph>
    <Paragraph position="34"> (ionsidcring only lexical rules (those that look for particular words), this means that there are as many as I8,693 possibh', unary lexical rules (%077 x 9 rule schemata), mad IZ,941,787 binat T lexical rules (?.,o77 z x 3 simple bigram rule schemata) in the search space.</Paragraph>
    <Paragraph position="35"> However, by inverting the process, and tabulating only those lexical contexts that actually appear in the training texts, this search spacc is reduced to z,:.I 9 unal T lcxical rules and 854 binary lexical rules.</Paragraph>
    <Paragraph position="36"> There are two substantively different kinds of rules to acquire: rules that only change the label of a phrase, and those that change the boundary of a phrase. The latter prcsent a problem \[:or accurately estimating the improvement of a rule, since sometimes the boundary realignment necessary to fix a phrase problem exceeds the amount by which a single rule can move a boundary--namely, two lexemcs. For thcse phrascs to be fixed there will have to be more than one rule to nudge the appropriate phrase botmdaries over. We handle this through a heuristic scoring ftmction that estimates the wtluc of moving a boundary in such cases.</Paragraph>
    <Paragraph position="37"> Error estimation methods A rule that fixes a problem in some cases might well introduce errors in some other cases. This kind of over-generalization can occur early in the learning process, as new rules need only improve over an approximate initial labcting. The extent to which a candidate rule is rewarded for its specificity and penalized for its over-generalization can have a strong effect on the final performance of the rule sequences discovered.</Paragraph>
    <Paragraph position="38"> We explored the use of three different types of scoring metrics for use in selecting the &amp;quot;best&amp;quot; of the competing rules to add to the sequence. Initially we made use of a simple arithmetic difference metric, y- s, wimrc y (for yield) is the number of additional correct phrase labelings that would be introduced if a rule were to be added to the rule sequence, and s (for sacrifice) is the number of new mistaken labelings that would bc introduced by the addition of the rule. '\['his is Brill's original metric, but note that it does not differentiate between rules whose overall improvement is identical, but whose rate of over-generalization is not. For example, a rule whose yield is IOO and sacrifice is 7 deg is treated as equally valuable as one whose yield is only 3 deg but which introduces uo overgeneralization at all (sacrifice = o). This can lead to the selection of lowprecision rules, and while small numbers of precision errors may be patched, wholesale precision problems make subsequent improvement more difficult.</Paragraph>
    <Paragraph position="39">  (Training on i495 sentences from the MUc-6 named entities task). The next measure we investigated was one advocated by Dunning (I993) which uses a log likelihood measure for estimating the significance of rare events in small populations. This measure did not improve predsion or recall in the learned sequences.</Paragraph>
    <Paragraph position="40"> The third scoring measure we investigated was the F-measure (VanRijsbergen 1979), which was introduced in information retrieval to compute a weighted combination of recall and precision. The F-measure is also used extensively in evaluating information extraction systems at MUG (Chinchor I995). It is defined as:</Paragraph>
    <Paragraph position="42"> This measure is conservative in the sense that its value is closer to precision, p, or recall, R, depending on which is lower. By manipulating the ~ paraaneter one is able to control for the relative importance of recall or precision. Preliminary exploration shows that a ~ of 0.8 seems to boost precision with no significant loss in the long-term recall or F-measure of the rule sequences.</Paragraph>
    <Paragraph position="43"> Table z summariz~es the contributions of these three error measures towards learning rule sequences for the MUC-6 named entities task (for task details, see below).</Paragraph>
    <Section position="1" start_page="277" end_page="278" type="sub_section">
      <SectionTitle>
Evaluation
</SectionTitle>
      <Paragraph position="0"> We have applied this rule sequence approach to a variety of realistic tasks. These largely arose as part of our information extraction efforts, and have been either directly or indirecdy evaluated in the context of two evaluation conferences: MUC-6 and Mffl' (for Multi-lingual Entity Tagging). In this paper, we will primarily report on evaluation conducted in the context of the MuC-6 named entities task (Sundheim I995). 1 The named entities task attempts to measure the ability to identify the basic building blocks of most newswire analysis applications, e.g., named entities such as persons, organizations, and geographical locations.</Paragraph>
      <Paragraph position="1"> Also measured is the identification of some numeric expressions (money and percentiles), dates, and times.</Paragraph>
      <Paragraph position="2"> This task has become a classic application for finite-state pre-parsers, and indeed our work was in part motivated by the success that has been achieved by such systems in past information extraction evaluations.</Paragraph>
      <Paragraph position="3"> We have applied a variety of techniques towards this task. The easy cases of dates mid times are identified by a separate pre-processor, leaving numeric expressions 1We have also measured performance on several syntactic constructs, (e.g., the so-called noun group), and on semantic subgrammars, (e.o&lt;, person-title-organization appositions). (also easy) and &amp;quot;proper names&amp;quot; (the interesting hard part) to be treated by the rule sequence processor.</Paragraph>
      <Paragraph position="4"> Hand-crafted Rules We first approached this task as an engineering problem, and wrote a rule sequence by hand to identify these named entities. The rule sequence comprises I45 named-entity rules, Iz rules for expressions of money and percentiles, and 6I rules for geographical complements (as in &amp;quot;Hyundai of Canada&amp;quot;). In addition, the rules refer to a few morphological predicates and some short word lists--one such list, for example lists words designating business subsidiaries, e.g., &amp;quot;unit&amp;quot;. The initial phrase labeling for the proper name cases is implemented by accumulating runs of NNP- and NNeStagged lexemes. A similar strategy is used for number expressions, using numeric tags.</Paragraph>
      <Paragraph position="5"> The performance of our hand-crafted rule sequence is summarized in Table 3, below, which gives component scores on the Mt3c-6 blind test set. The most interesting measures are those for the difficult proper name cases. Our performance here is high, especially for person names. Our lowest score is on organizational names, but note that the system lacks any extensive organization name list. Aside from ten hard-wired names, all names are found from first principles. On the easy numeric expressions, performancc is ahnost perfect--precision appears poor for percentiles, but this is due to an artifact of the testing procedure. 2 Machine-crafted Rules To evaluate the performance of our learning algorithm, we attempted to reproduce substantially the same environment as is used for the hand-crafted rules. The learner had access to the same predefined word lists, including the less-than-perfect TU'S'tmR gazetteer.</Paragraph>
      <Paragraph position="6"> Further, we only acquired rules for the hardest cases, namely the person, organization, and location phrases.</Paragraph>
      <Paragraph position="7"> We cut offrule acquisition after the iooth rule.</Paragraph>
      <Paragraph position="8"> The results for this acquired rule set are surprisingly encouraging. As Table 3 shows, these rules achieved higher recall on the very hardest phrase type (organization) than their hand-crafted counterparts, albeit at a cost in precision. Overall, however, the machine-crafted rules still lag behind. When we incorporated them into our information extraction 2Our performance vis-a-vis other MUC-6 participants placed us in the top third of participating systems. Except for the absolute highest performer, all these top-tercile systems were statistically not distinguishable from each other.</Paragraph>
      <Paragraph position="9">  system, the machinc-learned rules achieved an overall named cntitics F-score of 85.2, compared to the 91.2 achieved by the hand-crafted rttlcs, it should be noted, however, that the system loaded with these machine-crafted rules still outpcrfimned about a third of systems participating in the MUc-6 evaluation.</Paragraph>
      <Paragraph position="10"> Multilingual evaluation (MH') After the Muc-6 evahtation, the namcd entity task was extended in various ways to make it more applicable cross-linguistically. Predictably, this was followed by a new round of evaluations: Mv:r. The target languages in tltis case were Spanish, Chinese, and Japanese. We applied our approach m all three.</Paragraph>
      <Paragraph position="11"> The Mt{'l' cvahtation rcquircd actual system performance resuhs to be kept strictly ,-monymotts, which precludes our reporting here any scores as specific as we have cited for English. What wc may legitimately report, however, is that wc have effectively reproduced or bettered our hand-engineered English results in the Spanish mid Japanese t~ks, despite having no native speakers of either language (and only the most rudimentary reading sldlls in Kanji). In both cases, we were d~le to exploit part-of-speech tagging and some existing word lists fbr person names and locations.</Paragraph>
      <Paragraph position="12"> For Chinese, although we had available a word segmentcr, we had neither part-o6speech tagger, nor word lists, nor even the elementary reading skills we had for Japanese. As a result, we had to rely ahnost entirely on the learning procedure to acquire any rule sequences. 1)cspitc thcse impediments, wc cmnc dose to reproducing our results with thc English machinclcarned named entidcs rule sequcncc.</Paragraph>
      <Paragraph position="13">  What is most encouraging about this approach is how well it performs on so many dimensions. We have only reported here on nature-finding tasks, but early invcstigations in other areas arc encouraging as well. With rule sequences that parse noun groups, for instance, we hope to reproduce the utility of other rulc-scqucnce approaches to text chunking (Ramshaw &amp; Marcus I995). We are also excited by the promise of the learning proccdure, not just because it learns good rules, but dso because the rules it learns can be freely intermixed with hand-cngineered rules. This mixed-mode acquisition is unique among natural language learning proccdurcs, mid we put it to good use in building our multilingual name-tagging sequences.</Paragraph>
      <Paragraph position="14"> l)espitc rcsuhs that comparc favorably to those of more mature systems, this work is still in its infancy.</Paragraph>
      <Paragraph position="15"> We still have much to explore, especially with the learning procedure, lndccd, while the lcamcr induces /'tile sequences that pcrfi~rm well in tim aggrcgatc, individual rules clearly show their mechanical genesis.</Paragraph>
      <Paragraph position="16"> For instm~cc, whcn the learner must break tics between identically-scoring rule candidates, it often does so in lhlguistically clumsy ways. At times, the learner may acquire a good contextual pattern, but may bc unable to extend it to closcly-related cases that would occur naturally m a linguist.</Paragraph>
      <Paragraph position="17"> We belicve thcsc problems arc solvable in the ncar~ term, and wc have partial solutions in place already. As our tcclmiques mature, this validates not only ottr particular approach Io phrase-finding, but the whole field of language processing through rule sequences.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>