File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/00/w00-0741_metho.xml

Size: 7,865 bytes

Last Modified: 2025-10-06 14:07:22

<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-0741">
  <Title>Learning from Parsed Sentences with INTHELEX</Title>
  <Section position="4" start_page="194" end_page="194" type="metho">
    <SectionTitle>
2 A Stratified Parser for Italian Language
</SectionTitle>
    <Paragraph position="0"> This section presents a parser for the Italian language that is based on context-free grammars and designed to handle texts with a simple, standard phrase structure (e.g., foreign commerce texts as opposed to poetry). It consists of 12 parsing levels and 106 production rules, and it uses the longest-match technique, which is well suited to the typical ambiguity of the Italian language. Syntactic lookahead is used to resolve ambiguity and to keep parsing from halting on ungrammatical input.</Paragraph>
    <Paragraph position="1"> The text is segmented into progressively larger syntactic constructs. The subject, the main verb, the direct or indirect object and the clauses referring to them are identified. Nested syntactic constructs at the same abstraction level (e.g., expressions that include a sentence in parentheses) are supported. Plain-text documents are first passed to a lexical analyzer and a noun recognizer (XEROX MULTEXT), whose output, the document text tagged with parts of speech, is fed to the parser. Since Italian grammar is very different from English grammar, some terms have no English equivalent and are therefore left untranslated.</Paragraph>
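The stratified, longest-match strategy described above can be illustrated with a minimal Python sketch. The tag names (`ART`, `ADJ`, `NOUN`, `PREP`, `VERB`), the rule format and the two levels shown are hypothetical assumptions chosen for illustration; they are not the parser's actual 12 levels or 106 production rules, and syntactic lookahead is omitted.

```python
# Minimal sketch of stratified longest-match parsing over POS tags.
# Tag names, rules and levels are hypothetical, not the original grammar.

Rule = tuple[str, tuple[str, ...]]  # (result label, right-hand side)

LEVELS: list[list[Rule]] = [
    # Level 1: noun phrases built from POS tags (hypothetical rules)
    [("NP", ("ART", "ADJ", "NOUN")),
     ("NP", ("ART", "NOUN")),
     ("NP", ("NOUN",))],
    # Level 2: prepositional phrases built on the output of level 1
    [("PP", ("PREP", "NP"))],
]


def apply_level(symbols: list[str], rules: list[Rule]) -> list[str]:
    """Rewrite the sequence left to right, always taking the longest match."""
    out: list[str] = []
    i = 0
    while i != len(symbols):
        best = None
        for label, rhs in rules:
            n = len(rhs)
            if tuple(symbols[i:i + n]) == rhs and (best is None or n > len(best[1])):
                best = (label, rhs)
        if best is None:
            # No rule applies: copy the symbol through so parsing never stops.
            out.append(symbols[i])
            i += 1
        else:
            out.append(best[0])
            i += len(best[1])
    return out


def parse(tags: list[str]) -> list[str]:
    """Run the levels in order; each level sees the output of the previous one."""
    symbols = list(tags)
    for rules in LEVELS:
        symbols = apply_level(symbols, rules)
    return symbols
```

For example, `parse(["ART", "NOUN", "VERB", "PREP", "ART", "ADJ", "NOUN"])` returns `["NP", "VERB", "PP"]`: level 1 builds the two noun phrases, and level 2 attaches the preposition to the second one. Copying unmatched symbols through, rather than failing, loosely mirrors the goal of not stopping on ungrammatical input.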
    <Paragraph position="2"> The parser was validated on a set of 72 sentences drawn from a corpus of articles on foreign commerce available on the Internet, and the results obtained were evaluated with</Paragraph>
    <Paragraph position="4"> respect to precision and recall (reported in Table 1) and to two error-ratio measures:</Paragraph>
    <Paragraph position="6"> (see Table 2).</Paragraph>
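Precision and recall for a parser are commonly computed by comparing the constructs it outputs with a hand-annotated reference. The sketch below assumes that each construct is identified by a (label, start, end) span and counts a prediction as correct only on an exact match; this criterion is an assumption, since the section does not state how constructs were compared, and the two error-ratio measures are not reproduced.

```python
# Sketch: precision and recall of parsed constructs against a gold standard.
# Assumption: a construct is correct only if its label and span match exactly.

Span = tuple[str, int, int]  # (construct label, first token, last token)


def precision_recall(predicted: set[Span], gold: set[Span]) -> tuple[float, float]:
    correct = len(predicted.intersection(gold))
    # Guard against empty sets instead of dividing by zero.
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```

With gold spans `{("NP", 0, 1), ("VP", 2, 2), ("PP", 3, 5)}` and predicted spans `{("NP", 0, 1), ("VP", 2, 2), ("NP", 4, 5)}`, two of the three predictions are correct, so both precision and recall are 2/3.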
  </Section>
  <Section position="5" start_page="194" end_page="197" type="metho">
    <SectionTitle>
3 Information extraction
</SectionTitle>
    <Paragraph position="0"> The grammar above was used to parse Italian texts on foreign commerce downloaded from the Internet. The aim of this preprocessing was to give those texts a structure that could then be translated into the input language of the learning system INTHELEX (Esposito et al., 2000), so that it could learn simple events in that domain.</Paragraph>
    <Paragraph position="1"> INTHELEX (INcremental THEory Learner from EXamples) is a fully incremental, multi-conceptual, closed-loop learning system for the induction of hierarchical theories from examples. Full incrementality removes the need for a previously generated version of the theory, so that learning can start from an empty theory and from the first example; multi-conceptual means that it can learn several concepts simultaneously, possibly related to each other; a closed-loop system is one in which the learned theory is checked against every new example available and, in case of failure, a revision process is activated on it in order to restore the completeness and consistency properties.</Paragraph>
    <Paragraph position="2"> Incremental learning is necessary when either incomplete information is available at the time of initial theory generation, or the nature of the concepts evolves dynamically. The latter situation is the more difficult to handle, since the evolution of the concepts over time must be taken into account. In any case, it is useful to treat learning as a closed-loop process, in which feedback on performance is used to trigger the theory revision phase.</Paragraph>
    <Paragraph position="3"> INTHELEX learns theories from positive and negative examples described in the same language. It adopts a full-memory storage strategy, i.e., it retains all the available examples, so the learned theories are guaranteed to be valid on the whole set of known examples.</Paragraph>
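The closed-loop, full-memory cycle described above can be summarized in a short sketch. It is a deliberately simplified, propositional illustration and not INTHELEX's actual algorithm: examples are sets of features, a clause covers an example when all its features are present, generalization intersects a clause with a new positive (accepted only if no stored negative becomes covered), and specialization drops offending clauses and re-covers the stored positives. The class, method and feature names are hypothetical.

```python
# Simplified propositional sketch of closed-loop incremental learning with
# full memory. Not INTHELEX's real operators; all names are hypothetical.
from dataclasses import dataclass, field

Example = frozenset[str]
Clause = frozenset[str]


@dataclass
class Learner:
    theory: list[Clause] = field(default_factory=list)  # starts empty
    positives: list[Example] = field(default_factory=list)  # full memory
    negatives: list[Example] = field(default_factory=list)

    @staticmethod
    def covers(clause: Clause, example: Example) -> bool:
        return clause.issubset(example)

    def predict(self, example: Example) -> bool:
        return any(self.covers(c, example) for c in self.theory)

    def consistent(self, clause: Clause) -> bool:
        # Checked against every stored negative, thanks to full memory.
        return not any(self.covers(clause, n) for n in self.negatives)

    def learn(self, example: Example, positive: bool) -> None:
        """Closed loop: store the example, then revise only on failure."""
        (self.positives if positive else self.negatives).append(example)
        if positive and not self.predict(example):
            self.generalize(example)  # restore completeness
        elif not positive and self.predict(example):
            self.specialize(example)  # restore consistency

    def generalize(self, example: Example) -> None:
        for i, clause in enumerate(self.theory):
            candidate = clause.intersection(example)
            if self.consistent(candidate):
                self.theory[i] = candidate
                return
        # No safe generalization: add the example itself as a new clause.
        self.theory.append(example)

    def specialize(self, negative: Example) -> None:
        self.theory = [c for c in self.theory if not self.covers(c, negative)]
        # Re-cover stored positives that lost their clause (assumes no
        # negative contains all features of a positive example).
        for pos in self.positives:
            if not self.predict(pos):
                self.generalize(pos)
```

For instance, after two positives sharing `verb:importare` and `affirmative`, the theory holds the single clause `{verb:importare, affirmative}`; a negative that also has those features forces the learner back to two more specific clauses, one per positive.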
    <Paragraph position="4"> In the formal representation of texts, we used the following descriptors:</Paragraph>
    <Paragraph position="6"> where lemma is a meta-predicate. This allows the system to exploit information about word lemmas in generalizations and specializations, and in the recognition of higher-level concepts of which the lemma is an instance.</Paragraph>
    <Paragraph position="7"> Thus, the following Horn clause is an instance of an example:</Paragraph>
    <Paragraph position="9"> ombrello(eg).</Paragraph>
    <Paragraph position="10"> A first experiment aimed at learning the concept of specialization (of someone in some field). The system was run on 40 examples, 24 positive and 16 negative. The resulting theory consisted of 5 clauses, some of which differ in just one literal (e.g., the lemma of the word in the subject). By exploiting the background knowledge that the terms 'impresa' (enterprise), 'società' (company), 'ditta' (firm) and 'agenzia' (agency) are all instances of the concept 'persona giuridica' (legal person), i.e., the clauses:</Paragraph>
    <Paragraph position="12"> the theory becomes more compact, yielding the following rules: specialization(A) :- sent(A,B), subj(B,C), np(C,D), persona_giuridica(D), verb(B,E), specializzare(E), finite(E), affirmative(E), indirect_obj(B,F), pp(F,_).</Paragraph>
    <Paragraph position="13"> specialization(A) :- sent(A,B), subj(B,C), np(C,D), persona_giuridica(D), rel_subj(B,E), verb(E,F), specializzare(F), affirmative(F), pp(E,_), verb(B,_).</Paragraph>
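The effect of this background knowledge can be sketched as a rewriting step: each lemma literal that the background knowledge classifies as a 'persona giuridica' is replaced by that concept, and clauses that become identical are merged. The clause encoding (a clause body as a tuple of literal strings) and the helper names below are hypothetical; INTHELEX performs this through its own generalization machinery, which is not reproduced here.

```python
# Hypothetical sketch: merge clauses that differ only in a lemma that the
# background knowledge classifies under a common concept.
import re

# Background knowledge from the text: these lemmas are instances of persona_giuridica.
IS_A = {
    "impresa": "persona_giuridica",
    "societa": "persona_giuridica",
    "ditta": "persona_giuridica",
    "agenzia": "persona_giuridica",
}

LITERAL = re.compile(r"(\w+)\((.*)\)")  # predicate name and argument list


def generalize_literal(literal: str) -> str:
    """Replace a lemma predicate by its parent concept, keeping the arguments."""
    match = LITERAL.fullmatch(literal)
    if match and match.group(1) in IS_A:
        return f"{IS_A[match.group(1)]}({match.group(2)})"
    return literal


def compress(theory: list[tuple[str, ...]]) -> list[tuple[str, ...]]:
    """Generalize every clause body and drop clauses that become duplicates."""
    merged: list[tuple[str, ...]] = []
    for body in theory:
        new_body = tuple(generalize_literal(lit) for lit in body)
        if new_body not in merged:
            merged.append(new_body)
    return merged
```

Two clause bodies that differ only in `impresa(D)` versus `ditta(D)` collapse into a single body containing `persona_giuridica(D)`, which is the kind of compaction described for the specialization theory.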
    <Paragraph position="14"> Another experiment aimed at learning the concept of "imports". INTHELEX was run starting from the empty theory and was fed a total of 67 examples (39 positive and 28 negative). Note that not all positive examples explicitly use the verb 'importare' (to import): e.g., in the sentence "Società belga, specializzata nella lavorazione del legno, cerca fornitori di legname" (Belgian company, specialized in woodworking, seeks timber suppliers) the imports event is characterized by the noun 'società' (company) as the sentence subject, by the verb 'cercare' (to look for) and by the object including the noun 'fornitore' (provider). We obtained the following theory (where the background knowledge above was again used to merge several rules into one): imports(A) :- sent(A,B), subj(B,C), np(C,D), persona_giuridica(D), verb(B,E), cercare(E), finite(E), affirmative(E), obj(B,F), np(F,G), fornitore(G).</Paragraph>
    <Paragraph position="15"> imports(A) :- sent(A,B), subj(B,C), np(C,D), persona_giuridica(D), societa(D), verb(B,E), cercare(E), finite(E), affirmative(E), obj(B,F), np(F,G), distributore(G).</Paragraph>
    <Paragraph position="16"> imports(A) :- sent(A,B), subj(B,C), np(C,D), persona_giuridica(D), verb(B,E), interessare(E), finite(E), affirmative(E), indirect_obj(B,F), pp(F,G), importazione(G).</Paragraph>
    <Paragraph position="17"> imports(A) :- sent(A,B), subj(B,C), np(C,D), persona_giuridica(D), verb(B,E), acquistare(E), finite(E), affirmative(E).</Paragraph>
    <Paragraph position="18"> imports(A) :- sent(A,B), subj(B,C), np(C,D), persona_giuridica(D), impresa(D), verb(B,E), importare(E), finite(E), affirmative(E).</Paragraph>
    <Paragraph position="19"> For instance, the third clause means: "Text A deals with imports if it contains a sentence whose subject is an NP containing a persona giuridica, whose main verb is interessare (to interest) in a finite, affirmative form, and whose indirect object is a PP containing the word importazione (importation)". Note that background knowledge encoding a richer ontology than the current one would make it possible to merge conceptual descriptors, and hence clauses in the theory, even further. For example, 'fornitore' (provider) and 'distributore' (distributor) could be recognized as instances of a common higher-level concept; the same applies to 'acquistare' (to buy) and 'importare' (to import).</Paragraph>
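Reading a clause such as the third imports rule as a test on a text amounts to checking whether its body can be matched against the facts describing the example, binding each variable consistently (a subsumption check). The sketch below implements that check by backtracking. The fact encoding (tuples of predicate and arguments, capitalized strings as variables, `_` as the anonymous variable) and the constants `t1`, `s1`, `w1`, and so on are hypothetical.

```python
# Hypothetical sketch: does a clause body cover an example?
# Literals are tuples (predicate, arg1, ...). Capitalized strings are
# variables, "_" is the anonymous variable, anything else is a constant.
from typing import Optional

Literal = tuple[str, ...]


def is_var(term: str) -> bool:
    return term == "_" or term[:1].isupper()


def match(literal: Literal, fact: Literal, bindings: dict[str, str]) -> Optional[dict[str, str]]:
    """Try to unify one body literal with one ground fact."""
    if len(literal) != len(fact) or literal[0] != fact[0]:
        return None
    new = dict(bindings)  # copy, so failed branches leave no trace
    for term, value in zip(literal[1:], fact[1:]):
        if term == "_":
            continue
        if is_var(term):
            if new.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None  # constant mismatch
    return new


def covers(body: list[Literal], facts: list[Literal], bindings: Optional[dict[str, str]] = None) -> bool:
    """Backtracking search for one consistent binding of all body literals."""
    bindings = bindings or {}
    if not body:
        return True
    first, rest = body[0], body[1:]
    for fact in facts:
        extended = match(first, fact, bindings)
        if extended is not None and covers(rest, facts, extended):
            return True
    return False
```

Calling `covers(body, facts, {"A": "t1"})` binds the head variable `A` to the text being classified. With the body of the third clause and facts such as `("interessare", "w2")` and `("affirmative", "w2")` describing text `t1`, the check succeeds; removing the `affirmative` fact makes it fail.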
  </Section>
</Paper>