File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/00/w00-0718_metho.xml
Size: 5,595 bytes
Last Modified: 2025-10-06 14:07:23
<?xml version="1.0" standalone="yes"?> <Paper uid="W00-0718"> <Title>ALLiS: a Symbolic Learning System for Natural Language Learning Hervé Déjean Seminar für Sprachwissenschaft</Title>
<Section position="5" start_page="96" end_page="97" type="metho"> <SectionTitle> 5 The Refinement </SectionTitle>
<Paragraph position="0"> Once this initial grammar is built, we compare it against the bracketed corpus and apply the refinement step. The general theory refinement algorithm given by (Abecker and Schmid, 1996) is:
- find revision points in the theory;
- create possible revisions;
- choose the best revision;
- revise the theory;
repeated until no revision improves the theory. The next sections show how ALLiS performs these operations.</Paragraph>
<Section position="1" start_page="96" end_page="96" type="sub_section"> <SectionTitle> 5.1 Revision Points </SectionTitle>
<Paragraph position="0"> Revision points correspond to errors generated by the initial grammar. In example (2), the word operating does not belong to the NP, since the tag VBG is categorised as O (outside NP).</Paragraph>
<Paragraph position="1"> This is thus a revision point. During the refinement, ALLiS finds all occurrences of a tag whose categorisation in the training corpus does not correspond to the categorisation provided by the initial grammar. Once revision points are identified, ALLiS has two kinds of operators at its disposal for fixing errors: specialisation and generalisation. We use only a basic implementation of these operators, but it is nevertheless enough to obtain results comparable to those of other systems (Table 5).</Paragraph> </Section>
<Section position="2" start_page="96" end_page="96" type="sub_section"> <SectionTitle> 5.2 The Specialisation </SectionTitle>
<Paragraph position="0"> The specialisation relies on two operations: contextualisation and lexicalisation. Contextualisation consists of specifying contexts in which a rule categorises an element with high accuracy. Table 1 provides examples of contexts in which the tag VBG occurs inside an NP, and which thus fix the revision point of example (2). Lexicalisation consists of replacing a tag by a specific word (Table 3); it can be considered as the replacement of a variable by a constant. Some words in some contexts have a behaviour which cannot be detected at the tag level. While contextualisation is rather corpus-independent, lexicalisation generates rules which depend on the type of the training corpus. More details about these two operations can be found in (Déjean, 2000b).</Paragraph> </Section>
<Section position="3" start_page="96" end_page="96" type="sub_section"> <SectionTitle> 5.3 The Generalisation </SectionTitle>
<Paragraph position="0"> After specialisation, some structures are still not recognised. When revision points cannot be fixed using only local contexts, a generalisation (by relaxing constraints) in the definition of the structure can improve parsing. A structure is composed of a nucleus and optional adjuncts (Section 3). Such a structure cannot recognise all the sequences categorised as NP in the training corpus: the unrecognised sequences are composed of elements without a nucleus. In example (3), the sequence the reawakening composes an NP although it is tagged AL AL by ALLiS.</Paragraph>
<Paragraph position="1"> (3) [the_DT reawakening_VBG] of_IN [the_DT abortion-rights_NNS movement_NN]
Generalisation consists of accepting some sequences of elements which do not correspond to a whole structure (S → AL* N AR* | AL+ | AR+). The technique we use for this generalisation is simply the deletion of the element N in the rule describing a structure; a minimal sketch of this relaxed matching is given after this subsection. More generally, this step allows the correct parsing of sequences in which ellipses occur. The most frequent partial structures correspond to the sequences DT JJ, DT VBG, and DT.</Paragraph> </Section>
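As an illustration of this relaxation, the following sketch (our own hypothetical code, not part of ALLiS) checks a POS-tag sequence against the strict structure rule S → AL* N AR* and, when generalisation is enabled, also against the nucleus-less variants S → AL+ | AR+. The tag-to-category map is an assumed stand-in for the learned grammar.

```python
import re

# Hypothetical tag -> category map standing in for the learned grammar
# (l = left adjunct AL, n = nucleus N, r = right adjunct AR, o = outside).
# After contextualisation, a tag such as VBG may be categorised as AL
# inside an NP, as in example (3).
CATEGORY = {"DT": "l", "JJ": "l", "VBG": "l", "NN": "n", "NNS": "n", "IN": "o"}

STRICT = re.compile(r"^l*nr*$")      # S -> AL* N AR*
RELAXED = re.compile(r"^(l+|r+)$")   # nucleus deleted: S -> AL+ | AR+

def recognised_as_np(tags, generalised=True):
    """Check whether a POS-tag sequence is accepted as an NP structure."""
    cats = "".join(CATEGORY.get(t, "o") for t in tags)
    if STRICT.match(cats):
        return True
    return generalised and bool(RELAXED.match(cats))

print(recognised_as_np(["DT", "JJ", "NN"]))                # True: AL AL N
print(recognised_as_np(["DT", "VBG"], generalised=False))  # False: no nucleus
print(recognised_as_np(["DT", "VBG"]))                     # True once relaxed
```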
<Section position="4" start_page="96" end_page="97" type="sub_section"> <SectionTitle> 5.4 The Selection of Rules </SectionTitle>
<Paragraph position="0"> During the operations of specialisation and generalisation, rules are generated in order to improve the initial grammar. But the combination of lexicalisation and contextualisation can yield rules which are redundant: in Table 4, the last two rules are learned although the first one is sufficient.</Paragraph>
<Paragraph position="1"> The purpose of this step is to reduce the number of rules ALLiS generates. The number of rules could in fact already be reduced during the specialisation step, but a simpler way is to select rules after specialisation and generalisation according to heuristics.</Paragraph>
<Paragraph position="2"> The heuristic we use consists of first selecting the most frequent rules and then, among them, those having the richest (longest) context, since several rules can be tied under the criterion of frequency alone; a sketch of this ranking closes the section. In our case (learning linguistic structures), this heuristic provides good results, but a more efficient algorithm might consist of parsing the corpus with the candidate rules and selecting the most frequent rules providing the best parse.</Paragraph>
<Paragraph position="3"> We can note that these superfluous rules do not generally produce wrong analyses, even if some are not linguistically motivated. The fact that we try to obtain the minimal revised theory is computationally interesting, since reducing the number of rules eases parsing.</Paragraph>
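The following is a minimal sketch of this selection heuristic (illustrative code, not ALLiS's actual implementation): candidate rules are ranked by corpus frequency, with ties broken by context length. The Rule record and the example candidates are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    pattern: str       # e.g. a categorisation such as "VBG -> AL"
    context: tuple     # surrounding tags required by the rule
    frequency: int     # occurrences supporting the rule in the corpus

def select(rules, keep=1):
    """Keep the most frequent rules; break ties by richest (longest) context."""
    ranked = sorted(rules,
                    key=lambda r: (r.frequency, len(r.context)),
                    reverse=True)
    return ranked[:keep]

candidates = [
    Rule("VBG -> AL", ("DT",), 120),
    Rule("VBG -> AL", ("DT", "JJ"), 120),  # same frequency, richer context
    Rule("VBG -> AL", ("IN",), 7),
]
print(select(candidates)[0])  # the (DT, JJ) context wins the frequency tie
```
</Section> </Section> </Paper>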