<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-0718">
  <Title>ALLiS: a Symbolic Learning System for Natural Language Learning Hervé Déjean Seminar für Sprachwissenschaft</Title>
  <Section position="4" start_page="0" end_page="96" type="intro">
    <SectionTitle>
3 ALLiS
</SectionTitle>
    <Paragraph position="0"> ALLiS (Architecture for Learning Linguistic Structures) (Déjean, 2000a) is a symbolic machine learning system which generates categorisation rules from a tagged and bracketed corpus. These categorisation rules allow (partial) parsing. Unlike the rules of (Brill, 1993), they cannot be used directly to parse a text.</Paragraph>
    <Paragraph position="1"> ALLiS uses an internal formalism in order to represent the grammar rules it has learned.</Paragraph>
    <Paragraph position="2"> This internal representation (Table 1) allows different systems to be used to parse the structures. Each system requires a conversion of these rules into its own formalism. The use of an &quot;intermediary&quot; formalism separates two different problems: the generation of (linguistic) rules and their use.</Paragraph>
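    <Paragraph> The separation between learned rules and the formalism of a particular parser can be illustrated with a minimal sketch. It assumes a rule records a tag, one left-context tag, whether that context lies inside the structure, and the category assigned. The class name, field names and both output formats are hypothetical; they are not the ALLiS file format or the syntax of any real parser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextRule:
    """Hypothetical internal rule: categorise `tag` given one left-context tag."""
    tag: str              # tag being categorised, e.g. "VBG"
    left_context: str     # tag required immediately to the left, e.g. "PRP$"
    context_inside: bool  # True if the context tag lies inside the structure
    category: str         # category assigned when the rule applies, e.g. "AL"


def to_text(rule):
    """Illustrative backend 1: a plain-English reading of the rule."""
    where = "inside" if rule.context_inside else "outside"
    return (f"{rule.tag} after {rule.left_context} ({where} the structure) "
            f"is categorised as {rule.category}")


def to_pattern(rule):
    """Illustrative backend 2: a tuple that a toy matcher could consume."""
    return ((rule.left_context, rule.context_inside), rule.tag, rule.category)


# One learned rule, rendered independently for two invented targets.
rule = ContextRule("VBG", "PRP$", True, "AL")
backends = {"text": to_text, "pattern": to_pattern}
rendered = {name: convert(rule) for name, convert in backends.items()}
```

Adding a further target in this sketch only requires another converter function; the learned rule itself is left unchanged.</Paragraph>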
    <Section position="1" start_page="95" end_page="96" type="sub_section">
      <SectionTitle>
</SectionTitle>
      <Paragraph position="0"> Unlike Transformation-Based Learning (Brill, 1993), which modifies the training data each time a rule is learned, ALLiS always uses the original training data. In this way we try to separate the problem of learning &quot;linguistic&quot; rules from the problem of parsing (the adequate use of these rules). The generated rules contain enough information (the elements which compose the contexts, the structures of these elements) so that we can correctly generate rules for a specific parser.</Paragraph>
      <Paragraph position="1"> We can note that, although rules have to be ordered during the parse, this order does not depend on the order used during the learning step, but depends on the category of the element.</Paragraph>
      <Paragraph position="2">  tion of the category AL (NP).</Paragraph>
      <Paragraph position="3"> Table 1 shows part of the generated file concerning the categorisation of the tag VBG. The first line has to be read as follows: when the tag VBG occurs after the tag PRP$ (left context) and when the tag PRP$ occurs in the structure (L=l(in)), the tag VBG is categorised as AL (left adjunct: see next section). In order to parse a text, a module automatically converts this formalism into appropriate formalisms which can be used by existing symbolic parsers. Several tools have been tried: the CASS parser (Abney, 1996), XFST (Karttunen et al., 1997) and LT TTT (Glover et al., 1999). The TTT formalism seems to be the most appropriate (rules are easy to generate and the resulting parser is fast). The TTT rule corresponding to the first line of Table 1 is given in Table 2: &lt;RULE name=&quot;AL&quot; targ_sg=&quot;(c) [CAT='AL']&quot;&gt; &lt;REL match=&quot;W[C='PRP$' m_mod='TEST'  The first step is to assign to each tag of the corpus a default category corresponding to its most frequent behaviour regarding the structure we want to learn. The result of this operation is a set of rules which assign a default category to each tag.</Paragraph>
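      <Paragraph> The default-category step can be sketched as a simple frequency count. This is only an illustration under stated assumptions: the input format (pairs of tag and category), the function name and the toy counts are invented for the example, and the sketch ignores context entirely.

```python
from collections import Counter, defaultdict


def default_categories(tagged_corpus):
    """Map each tag to the category it receives most often.

    `tagged_corpus` is an iterable of (tag, category) pairs; this input
    format is an assumption made for the example.
    """
    counts = defaultdict(Counter)
    for tag, category in tagged_corpus:
        counts[tag][category] += 1
    # most_common(1) returns a one-element list [(category, count)].
    return {tag: c.most_common(1)[0][0] for tag, c in counts.items()}


# Toy observations (invented): VBG is mostly outside an NP, NN mostly a nucleus.
corpus = [("VBG", "O"), ("VBG", "O"), ("VBG", "AL"),
          ("NN", "N"), ("NN", "N"), ("DT", "AL")]
defaults = default_categories(corpus)
```

With these toy counts VBG receives O as its default, so the single occurrence of VBG as a left adjunct is the kind of case a default rule gets wrong.</Paragraph>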
      <Paragraph position="4"> In general, the baseline is computed by giving an element its most frequent tag. ALLiS uses an initial grammar which is a little more sophisticated: it uses the same principle, except that the default tag depends on the context. Generally the chunk tagset is composed of three tags: B, I, and O. ALLiS uses a subcategorisation of the I category. It considers that a structure is composed of a nucleus (tag N) with optional left and right adjuncts (AL and AR). These three classes (AL, N, AR) possess an attribute B with the value +/-. Furthermore, an element is considered as AL/AR iff it occurs before/after a nucleus. For this reason, a tag such as JJ can be categorised as AL or O (outside) according to its context. Precision and recall of this initial grammar are around 86%. An example of an NP analysis provided by the initial grammar is:</Paragraph>
      <Paragraph position="6"> The initial grammar categorises the tag VBG as occurring by default outside an NP, which is mainly the case (as in example (1)). But in some cases this default categorisation is wrong (example (2)). Since the default structure is defined as S → [AL* N AR*]+, the phrase &quot;the operating chief&quot; cannot be correctly parsed by the initial grammar. Such an error can be fixed during the refinement step, as explained in the next section.</Paragraph>
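      <Paragraph> The effect of the default categorisation on the phrase "the operating chief" can be traced with a small scanner. It is a sketch only: it reads each bracketed group AL* N AR* as one structure, assumes that the determiner is categorised as AL and the noun as N, and uses a hypothetical function name. It is not the ALLiS parser.

```python
def find_structures(categories):
    """Return (start, end) index pairs of spans matching AL* N AR*."""
    spans, i, n = [], 0, len(categories)
    while n > i:
        start = i
        # Optional left adjuncts.
        while n > i and categories[i] == "AL":
            i += 1
        if n > i and categories[i] == "N":
            # Nucleus found: consume it and any right adjuncts.
            i += 1
            while n > i and categories[i] == "AR":
                i += 1
            spans.append((start, i))
        else:
            # No nucleus follows, so nothing starts here; move on by one.
            i = start + 1
    return spans


words = ["the", "operating", "chief"]
# Default categorisation: VBG ("operating") is outside the structure.
initial = find_structures(["AL", "O", "N"])
# Assumed result of refinement: VBG is recategorised as a left adjunct.
refined = find_structures(["AL", "AL", "N"])
```

Under the default categories only "chief" is recognised as a structure, because the O on "operating" separates the determiner from the nucleus. Once VBG is recategorised as AL, the whole phrase is covered by a single span.</Paragraph>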
    </Section>
  </Section>
</Paper>