File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/06/w06-2001_evalu.xml

Size: 11,341 bytes

Last Modified: 2025-10-06 13:59:49

<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2001">
  <Title>Multilingual Extension of a Temporal Expression Normalizer using Annotated Corpora</Title>
  <Section position="6" start_page="4" end_page="7" type="evalu">
    <SectionTitle>
4 Evaluation
</SectionTitle>
    <Paragraph position="0"> The automatic extension of the system to Italian (Ita-TERSEO) has been evaluated using I-CAB, which has been divided into two parts: training and test. The training part was first used to extend the system automatically. After this extension, the system was evaluated against both the training and the test corpora. The purpose of this double evaluation was to compare the recall obtained over the training corpus with that obtained over the test corpus.</Paragraph>
    <Paragraph position="1"> An additional evaluation experiment has also been carried out to compare the performance of the automatically developed system with a state-of-the-art system specifically developed for Italian and English, i.e. the Chronos system described in (Negri and Marseglia, 2004).</Paragraph>
    <Paragraph position="2"> In the following sections, more details about I-CAB and the evaluation process are presented, together with the evaluation results.</Paragraph>
    <Section position="1" start_page="4" end_page="5" type="sub_section">
      <SectionTitle>
4.1 The I-CAB Corpus
</SectionTitle>
      <Paragraph position="0"> The evaluation has been performed on the temporal annotations of I-CAB (I-CAB-temp), created as part of the three-year project ONTOTEXT, funded by the Provincia Autonoma di Trento.</Paragraph>
      <Paragraph position="1"> I-CAB consists of 525 news documents taken from the local newspaper L'Adige (http://www.adige.it). The selected news stories belong to four different days (7 and 8 September 2004, and 7 and 8 October 2004) and are grouped into five categories: News Stories, Cultural News, Economic News, Sports News and Local News. The corpus contains around 182,500 words (347 words per file on average).</Paragraph>
      <Paragraph position="2"> The total number of annotated temporal expressions is 4,553; the average length of a temporal expression is 1.9 words.</Paragraph>
      <Paragraph position="3"> The annotation of I-CAB has been carried out adopting the standards developed within the ACE tasks, which allow for a semantically rich and normalized annotation of different types of temporal expressions (for further details on the TIMEX2 annotation standard for English, see (Ferro et al., 2005)).</Paragraph>
      <Paragraph position="4"> The ACE guidelines have been adapted to the specific morpho-syntactic features of Italian, which has a far richer morphology than English. In particular, some changes concerning the extent of the temporal expressions have been introduced. According to the English guidelines, definite and indefinite articles are considered part of the textual realization of an entity, while prepositions are not. As the annotation is word-based, this rule cannot handle Italian articulated prepositions, in which a preposition and a definite article are merged into a single word (e.g. nel = in + il). Within I-CAB, such prepositions have been included as possible constituents of an entity, so that all articles are consistently included.</Paragraph>
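The extent rule above can be illustrated with a minimal sketch. It assumes that a tokenizer has already split off the leading function word (so that an elided form such as nell' is its own token); the preposition lists and the annotated_extent helper are hypothetical illustrations, not part of the I-CAB guidelines or of any released tool.

```python
# Hypothetical sketch of the word-based extent rule for Italian TIMEX2 annotation.
# Assumption: the expression is already tokenized and may start with one function word.

SIMPLE_PREPOSITIONS = {"a", "di", "da", "in", "su", "con", "per", "tra", "fra"}

# Articulated prepositions fuse a simple preposition (a, da, di, in, su)
# with a definite article: al, dello, nella, sull', agli, ...
ARTICULATED_PREPOSITIONS = {
    stem + ending
    for stem in ("a", "da", "de", "ne", "su")
    for ending in ("l", "llo", "lla", "ll'", "i", "gli", "lle")
}


def annotated_extent(tokens: list[str], guideline: str) -> list[str]:
    """Return the tokens that fall inside the TIMEX2 extent.

    guideline: "english" applies the English rule (prepositions excluded);
    "i-cab" keeps articulated prepositions so that the fused article is kept.
    """
    first = tokens[0].lower()
    if first in ARTICULATED_PREPOSITIONS:
        # The English rule drops the whole word, losing the article with it.
        return tokens if guideline == "i-cab" else tokens[1:]
    if first in SIMPLE_PREPOSITIONS:
        # Plain prepositions stay outside the extent under both rules.
        return tokens[1:]
    return tokens


print(annotated_extent(["nel", "settembre", "2004"], "english"))  # ['settembre', '2004']
print(annotated_extent(["nel", "settembre", "2004"], "i-cab"))    # ['nel', 'settembre', '2004']
print(annotated_extent(["a", "settembre"], "i-cab"))              # ['settembre']
```

Under the English rule the article fused into nel falls outside the extent; keeping articulated prepositions restores it, at the cost of also including the preposition.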
      <Paragraph position="5"> An assessment of inter-annotator agreement based on the Dice coefficient has shown that the task is well defined: agreement on the recognition of temporal expressions is 95.5%.</Paragraph>
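For reference, the Dice coefficient between two annotators' sets of marked expressions A and B is 2|A ∩ B| / (|A| + |B|). The sketch below assumes exact span matching on character offsets and uses made-up offsets; the matching criterion actually used for I-CAB is not stated in this section, so treat it as an assumption. If one annotation is taken as the gold standard, this quantity coincides with the balanced F-measure.

```python
# Dice agreement between two annotators, assuming each annotation is a set of
# (start, end) character offsets and that only exact span matches count.

def dice(a: set, b: set) -> float:
    """Return 2 * |intersection| / (|a| + |b|); two empty annotations agree perfectly."""
    if not a and not b:
        return 1.0
    return 2 * len(a.intersection(b)) / (len(a) + len(b))


# Hypothetical annotations of the same document by two annotators.
annotator_1 = {(0, 4), (57, 70), (120, 134)}
annotator_2 = {(0, 4), (57, 70), (150, 156)}

print(f"{dice(annotator_1, annotator_2):.3f}")  # 0.667
```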
    </Section>
    <Section position="2" start_page="5" end_page="7" type="sub_section">
      <SectionTitle>
4.2 Evaluation process
</SectionTitle>
      <Paragraph position="0"> The evaluation of the automatic extension of TERSEO to Italian has been performed in three steps. First, the system was evaluated against both the training and the test corpora, with two main purposes: Determining whether the recall obtained on the training part of the corpus is slightly higher than that obtained on the test part of I-CAB, given that in the TE collection phase of the extension, temporal expressions were extracted from the training part.</Paragraph>
      <Paragraph position="1"> Determining the performance of the automatically extended system without any manual revision of either the Italian translations or the resolution rules automatically associated with the expressions.</Paragraph>
      <Paragraph position="2"> Second, we wanted to verify whether the precision of the system could be improved through a manual revision of the automatically translated temporal expressions. Finally, the system was compared with a state-of-the-art system for Italian in order to estimate the actual potential of the proposed approach. All the evaluation results are presented and compared in the following sections using the same metrics adopted at the TERN2004 conference. In the automatic extension of the system, a total of 1,183 Italian temporal expressions have been stored in the database. As shown in Table 5, these expressions were obtained from the different available resources: ENG→ITA: This group of expressions was obtained by automatically translating into Italian the English temporal expressions stored in the Knowledge DB.</Paragraph>
      <Paragraph position="3"> ESP→ITA: This group of expressions was obtained by automatically translating into Italian the Spanish temporal expressions stored in the Knowledge DB.</Paragraph>
      <Paragraph position="4"> CORPUS: This group of expressions was extracted directly from the training part of I-CAB. Both the training part and the test part of I-CAB have been used for evaluation. The precision (P), recall (R) and F-Measure (F) results are presented in Table 6, which details the system performance on the general recognition task (timex2) and on the different normalization attributes used by the TIMEX2 annotation standard. As expected, recall over the training corpus is slightly higher. The gain is only slight because, although the temporal expressions were extracted from this corpus, errors may have been introduced when their normalization rules were obtained automatically.</Paragraph>
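The three scores in Table 6 can be computed from match counts as in the minimal sketch below. It assumes a simple count-based scorer with a balanced F-measure; the official TERN2004 scorer may treat partial matches and attribute values differently, and the counts in the example are invented.

```python
# Count-based precision, recall and balanced F-measure.
# Assumption: a system TIMEX2 counts as correct only if it matches a gold TIMEX2
# (for an attribute such as VAL, only if the attribute value also matches).

def precision_recall_f(correct: int, system_total: int, gold_total: int) -> tuple[float, float, float]:
    """correct: matched items; system_total: items produced; gold_total: items annotated."""
    precision = correct / system_total if system_total else 0.0
    recall = correct / gold_total if gold_total else 0.0
    f_measure = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f_measure


# Invented counts: 100 expressions returned, 120 in the gold standard, 80 correct.
p, r, f = precision_recall_f(correct=80, system_total=100, gold_total=120)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")  # P=0.80 R=0.67 F=0.73
```

Under the same assumption, feeding the function per-attribute counts (e.g. the number of correct VAL values) yields the attribute-level scores.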
      <Paragraph position="5"> Comparing these results with those obtained by the automatic extension of TERSEO to English on the recognition task (see Table 4), precision (P) is slightly better for English (77% vs. 72%), whereas recall (R) is better for the Italian extension (83% vs. 62%). This is because the Italian extension covers more temporal expressions than the English one: Ita-TERSEO uses not only the temporal expressions translated from the English and Spanish knowledge databases, but also those extracted from the training part of I-CAB. A manual revision of the Italian TEs stored in the Knowledge DB was then carried out in two steps.</Paragraph>
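To see how the two trade-offs balance out, the rounded recognition figures quoted above can be combined into a balanced F-measure. This is only a back-of-the-envelope check assuming an unweighted harmonic mean; because the inputs are rounded, the results may differ slightly from the F values reported in the tables.

```python
# Balanced F-measure (harmonic mean of precision and recall) from rounded figures.

def f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)


reported = {
    "English extension": (0.77, 0.62),  # P, R on the recognition task
    "Italian extension": (0.72, 0.83),
}

for name, (p, r) in reported.items():
    print(f"{name}: F = {f1(p, r):.1%}")
# English extension: F = 68.7%
# Italian extension: F = 77.1%
```

On these rounded figures, the recall gain of the Italian extension more than offsets its lower precision.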
      <Paragraph position="6"> First, the incorrectly translated expressions (from Spanish and English into Italian) were removed from the database: a total of 334 expressions were identified as wrong translations. A second revision followed, in which expressions whose translations contained minor errors were corrected; 213 expressions were modified in this second cycle. Moreover, since pattern constituents in Italian may have different orthographic features (e.g. masculine/feminine, initial vowel/consonant), new patterns had to be introduced to capture such variants. For example, as some Italian month names begin with a vowel (aprile, agosto, ottobre), the temporal expression pattern "nell'-MONTH" was inserted in the Knowledge DB. The total number of expressions stored in the DB after this manual revision is shown in Table 7.</Paragraph>
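One way to realize a pattern such as "nell'-MONTH" is a regular expression that pairs the elided article with vowel-initial month names and the full form nel with the others. The sketch below is hypothetical: it does not reproduce the pattern syntax of TERSEO's Knowledge DB, and it covers only the month-plus-optional-year case.

```python
import re

# Italian month names; three of them start with a vowel.
MONTHS = ["gennaio", "febbraio", "marzo", "aprile", "maggio", "giugno",
          "luglio", "agosto", "settembre", "ottobre", "novembre", "dicembre"]
VOWEL_MONTHS = [m for m in MONTHS if m[0] in "aeiou"]
CONSONANT_MONTHS = [m for m in MONTHS if m[0] not in "aeiou"]

# "nel" + consonant-initial month, or elided "nell'" + vowel-initial month,
# optionally followed by a four-digit year. ASCII and typographic apostrophes are accepted.
MONTH_PATTERN = re.compile(
    r"\b(?:nel\s+(?:%s)|nell['’](?:%s))(?:\s+\d{4})?\b"
    % ("|".join(CONSONANT_MONTHS), "|".join(VOWEL_MONTHS)),
    re.IGNORECASE,
)

text = "L'accordo firmato nell'agosto 2004 sarà rinnovato nel settembre 2005."
print([m.group(0) for m in MONTH_PATTERN.finditer(text)])
# ["nell'agosto 2004", 'nel settembre 2005']
print(MONTH_PATTERN.search("nel agosto"))  # None: the unelided form is rejected
```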
      <Paragraph position="7"> To evaluate the system after this manual revision, the training and the test parts of I-CAB were used again. However, the precision (PREC), recall (REC) and F-Measure results were exactly the same as those presented in Table 6. This is not surprising: the wrong expressions in the knowledge database do not affect the final results of the system, since they are never used for recognition or resolution. They do not appear in real documents, and they are redundant, because the correct expression is also stored in the Knowledge DB.</Paragraph>
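The argument can be illustrated with a toy dictionary-based recognizer: an entry that never occurs in real text can never fire, so removing it cannot change the output. The recognizer, the rule identifiers and the deliberately wrong entry below are all hypothetical; TERSEO's actual matching and resolution machinery is more elaborate.

```python
import re


def recognize(text: str, knowledge_db: dict[str, str]) -> list[tuple[str, str]]:
    """Return (expression, rule_id) for each stored expression occurring as whole words in text."""
    lowered = text.lower()
    return sorted(
        (expression, rule_id)
        for expression, rule_id in knowledge_db.items()
        if re.search(r"\b" + re.escape(expression) + r"\b", lowered)
    )


# Revised Knowledge DB: correct Italian expressions mapped to (hypothetical) resolution rules.
revised_db = {
    "ieri": "R_YESTERDAY",
    "avantieri": "R_DAY_BEFORE_YESTERDAY",
    "la settimana scorsa": "R_LAST_WEEK",
}
# Unrevised DB: same entries plus a word-by-word mistranslation that real texts never contain.
unrevised_db = {**revised_db, "il giorno prima ieri": "R_DAY_BEFORE_YESTERDAY"}

text = "Ieri il consiglio ha approvato il bilancio, rinviato la settimana scorsa."
print(recognize(text, revised_db))
# [('ieri', 'R_YESTERDAY'), ('la settimana scorsa', 'R_LAST_WEEK')]
print(recognize(text, revised_db) == recognize(text, unrevised_db))  # True
```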
      <Paragraph position="8"> Finally, to compare Ita-TERSEO with a state-of-the-art system specifically designed for Italian, we chose Chronos (Negri and Marseglia, 2004), a multilingual system for the recognition and normalization of TEs in Italian and English. Like all the other state-of-the-art systems addressing the recognition/normalization task, Chronos is rule-based. From a design point of view, it shares with TERSEO a rather similar architecture relying on different sets of rules. These are regular expressions that check for specific features of the input text, such as the presence of particular word senses, lemmas, parts of speech, symbols, or strings satisfying specific predicates. Each set of rules deals with a different aspect of the problem. In particular, a set of around 350 rules is designed for TE recognition and recognizes both explicit and implicit TEs with high precision and recall. Other sets of regular expressions, around 700 rules in total, are used in the normalization phase; each set handles a specific TIMEX2 attribute (i.e. VAL, SET, ANCHOR_VAL and ANCHOR_DIR). The results obtained by the Italian version of Chronos on the test part of I-CAB are shown in the last three columns of Table 6.</Paragraph>
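The two-stage organization described above (recognition rules first, then normalization rules grouped by TIMEX2 attribute) can be sketched as follows. Everything in the sketch is a simplified assumption: the rules are plain regular expressions over surface strings rather than checks on lemmas, parts of speech or word senses, only the VAL attribute is implemented, and the document date is assumed to anchor relative expressions. It is not Chronos's actual rule set.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Stage 1: recognition rules (hypothetical examples), each a regular expression.
RECOGNITION_RULES = [
    re.compile(r"\b(?:ieri|oggi|domani)\b", re.IGNORECASE),
    re.compile(r"\bl'anno scorso\b", re.IGNORECASE),
]

DAY_OFFSETS = {"ieri": -1, "oggi": 0, "domani": 1}


# Stage 2: normalization rules, one function per TIMEX2 attribute.
def normalize_val(expression: str, anchor: date) -> Optional[str]:
    """Compute VAL relative to the anchor (assumed here to be the document date)."""
    key = expression.lower()
    if key in DAY_OFFSETS:
        return (anchor + timedelta(days=DAY_OFFSETS[key])).isoformat()
    if key == "l'anno scorso":
        return str(anchor.year - 1)
    return None


NORMALIZATION_RULES = {"VAL": normalize_val}


def annotate(text: str, anchor: date) -> list:
    """Return (offset, expression, attributes) triples, ordered by position."""
    timexes = []
    for rule in RECOGNITION_RULES:
        for match in rule.finditer(text):
            attributes = {}
            for name, normalize in NORMALIZATION_RULES.items():
                value = normalize(match.group(0), anchor)
                if value is not None:
                    attributes[name] = value
            timexes.append((match.start(), match.group(0), attributes))
    return sorted(timexes, key=lambda timex: timex[0])


text = "Ieri il sindaco ha presentato il bilancio; l'anno scorso era in perdita."
for offset, expression, attributes in annotate(text, anchor=date(2004, 9, 7)):
    print(expression, attributes)
# Ieri {'VAL': '2004-09-06'}
# l'anno scorso {'VAL': '2003'}
```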
      <Paragraph position="9"> As expected, the gap between the results obtained by the two systems is considerable. However, the following considerations should be taken into account. First, the two systems differ greatly in the time and effort required to develop them: while the implementation of the manual system took several months, porting TERSEO to Italian is a very fast process that can be accomplished in less than an hour. Second, even if no annotated corpus is available for a new language, the automatic porting procedure we present remains feasible. In fact, most of the TEs for a new language that are stored in the Knowledge DB result from translating the Spanish/English TEs into the target language. In our case, as shown in Table 5, more than 80% of the acquired Italian TEs result from the automatic translation of expressions already stored in the DB. This makes the proposed approach a viable solution for rapidly porting the system to other languages, requiring only an online translator (note that the Altavista Babel Fish translator provides translations from English into 12 target languages). In light of these considerations, the results obtained by Ita-TERSEO are encouraging.</Paragraph>
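The porting procedure argued for above can be summarized in a short sketch: each expression already stored in the Knowledge DB is passed through a translator and the translation inherits the resolution rule of its source, while expressions collected from an annotated corpus, when one exists, are simply added. The translate callable stands in for an online machine translation service; here it is a toy lookup table, and the expressions, rule identifiers and resulting percentage are invented for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Entry = Tuple[str, str]  # (temporal expression, resolution rule id)


def port_knowledge_db(
    source_db: Dict[str, List[Entry]],
    translate: Callable[[str, str], str],
    corpus_entries: Iterable[Entry] = (),
) -> Dict[str, Tuple[str, str]]:
    """Build a target-language DB mapping expression -> (rule id, origin)."""
    ported: Dict[str, Tuple[str, str]] = {}
    for source_lang, entries in source_db.items():
        for expression, rule_id in entries:
            # The translation reuses the language-independent rule of its source.
            target = translate(expression, source_lang).lower()
            ported.setdefault(target, (rule_id, f"{source_lang}->ITA"))
    for expression, rule_id in corpus_entries:
        ported.setdefault(expression.lower(), (rule_id, "CORPUS"))
    return ported


# Toy stand-in for an online translator (hypothetical data).
TOY_MT = {("ENG", "yesterday"): "ieri", ("ENG", "today"): "oggi",
          ("ESP", "ayer"): "ieri", ("ESP", "pasado mañana"): "dopodomani"}

source_db = {
    "ENG": [("yesterday", "R_YESTERDAY"), ("today", "R_TODAY")],
    "ESP": [("ayer", "R_YESTERDAY"), ("pasado mañana", "R_DAY_AFTER_TOMORROW")],
}
corpus = [("l'altro ieri", "R_DAY_BEFORE_YESTERDAY")]

italian_db = port_knowledge_db(source_db, lambda expr, lang: TOY_MT[(lang, expr)], corpus)
translated = sum(1 for _, origin in italian_db.values() if origin != "CORPUS")
print(f"{len(italian_db)} entries, {translated / len(italian_db):.0%} from translation")
# 4 entries, 75% from translation
```

Without an annotated corpus, corpus_entries is simply left empty and every entry comes from translation, which is the situation the second consideration above describes.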
    </Section>
  </Section>
</Paper>