<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-2002">
  <Title>Intelligent patent analysis through the use of a neural network: experiment of multi-viewpoint analysis with the MultiSOM model</Title>
  <Section position="6" start_page="2" end_page="4" type="evalu">
    <SectionTitle>
5. Evaluation
</SectionTitle>
    <Paragraph position="0"> The advantages of the MultiSOM method seem obvious to the domain expert: the original multiple-viewpoint classification approach of MultiSOM tends to reduce the noise that an overall classification approach inevitably generates, while increasing the flexibility and the granularity of the analyses. Moreover, with a global classification method such as WEBSOM, important relationships between some subtopics are hidden in the class profiles and are therefore very difficult to characterize precisely. Using the MultiSOM method, the expert found more than 35 such important relationships. A simple example is given by comparing figure 3 with figure 5.</Paragraph>
    <Paragraph position="1"> Other examples of more elaborate topic relationships, which can only be obtained through the MultiSOM inter-map communication mechanism, are given in the annex of the paper. Finally, the expert argued that the possibility of interactively activating the classes on the maps, positively or negatively, is a great help for tuning an analysis process very precisely. Nevertheless, empirical evaluation by an expert remains insufficient to compare the global approach objectively with the viewpoint-oriented approach. For this purpose, we propose new objective classification quality estimators for both evaluating and optimising the results of the classification and mapping methods, especially when they are applied in the domain of documentary databases. These estimators are described in the next section.</Paragraph>
    <Paragraph position="2"> [Figure caption; panel title: &quot;Profile of topic: Extending oil life&quot;] The figure represents the WEBSOM-like mapping (i.e. without viewpoint management) of the content of the patent abstracts. The right part of the map represents the description (i.e. profile) of the &quot;extending oil life&quot; WEBSOM global topic. Although a strong relationship between the &quot;extending oil life&quot; and &quot;black sludge control&quot; topics has been highlighted by the MultiSOM viewpoint-oriented classification (see the map of figure 3), this relationship has been lost by the WEBSOM-like classification because of the noise of the global classification (the relationship appears neither in the above map nor in the &quot;extending oil life&quot; topic profile).</Paragraph>
    <Section position="1" start_page="2" end_page="3" type="sub_section">
      <SectionTitle>
5.1 Evaluation procedure
</SectionTitle>
      <Paragraph position="0"> Anyone who sets out to compare classification methods faces the problem of choosing reliable classification quality measures.</Paragraph>
      <Paragraph position="1"> The classical evaluation measures for the quality of a classification are based on the intra-class inertia and the inter-class inertia [16][17][25]. With these two measures, a classification is considered good if its intra-class inertia is low compared to its inter-class inertia. However, in the case of a Kohonen classification, as for many other numerical classification methods, these measures are often strongly biased, mainly because the intrinsic dimensions of the class profiles (number of non-zero components in the profiles) are not of the same order of magnitude as the intrinsic dimensions of the data profiles. (Footnote: In the SOM method, a second bias is generated by the class construction process, which tends to maintain the topographic properties of the map by enhancing the similarities between neighboring classes.) This is especially true in the documentary domain, where the number of indexes in the documents is extremely low compared to the dimension of their overall description space.</Paragraph>
      <Paragraph position="2"> A promising way that we have found to highlight more precisely the main characteristics of the classes of the map, and to validate the thematic deductions between the maps, is to couple the MultiSOM model with a symbolic model that uses a Galois lattice conceptual classification of the patents, based on the same viewpoints as those used for building the maps. This approach is described extensively in [31]. A Galois lattice model can also be considered as a pure natural elementary classifier: it groups the data by directly considering their intrinsic properties (i.e. without any preliminary construction of class profiles). Hence, one can derive from its behavior new class quality evaluation factors that can be substituted for the inertia measures when validating the intrinsic properties of the numerical classes. For the sake of user-orientation, our measures are based, in a parallel way, on the recall and precision criteria that are extensively used for evaluating the result quality of information retrieval (IR) systems. In IR [29], the Recall R represents the ratio between the number of relevant documents returned by an IR system for a given query and the total number of relevant documents that should have been found in the documentary database. The Precision P represents the ratio between the number of relevant documents returned by an IR system for a given query and the total number of documents returned for that query. Recall and Precision generally behave in an antagonistic way: as Recall increases, Precision decreases, and conversely. The F function has thus been proposed in order to highlight the best compromise between these two values [35]. It is given by:</Paragraph>
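      <Paragraph position="3"> As an illustration of the IR definitions above, the following is a minimal sketch rather than the authors' implementation. It assumes the balanced form of the F function commonly used in IR, F = 2RP / (R + P); the function name, the document identifiers and the query data are hypothetical.

```python
def recall_precision_f(returned, relevant):
    """Return (recall, precision, F) for a single query.

    Assumption: F is the balanced harmonic mean 2RP / (R + P).
    """
    returned, relevant = set(returned), set(relevant)
    hits = len(returned.intersection(relevant))
    # Recall: relevant documents returned / all relevant documents.
    recall = hits / len(relevant) if relevant else 0.0
    # Precision: relevant documents returned / all documents returned.
    precision = hits / len(returned) if returned else 0.0
    total = recall + precision
    f_value = 2 * recall * precision / total if total else 0.0
    return recall, precision, f_value


# Hypothetical query: 4 documents returned, 3 of them relevant,
# out of 6 relevant documents in the database.
r, p, f = recall_precision_f(
    ["d1", "d2", "d3", "d7"],
    ["d1", "d2", "d3", "d4", "d5", "d6"],
)
print(r, p, f)
```

Under this assumed form, F only approaches 1 when Recall and Precision are both high, which is the compromise behaviour described in the text.</Paragraph>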
      <Paragraph position="4"> Based on the same principles, the Recall and Precision measures that we introduce hereafter evaluate the quality of a classification method by measuring the relevance of its resulting class content in terms of shared properties. (Footnote: The content of a class is represented by the subset of original data that the classification process has associated with it.) In the following descriptions, the class content is assumed to be represented by documents, and the indexes (i.e. the properties) of the documents are assumed to be weighted by values within the range [0,1].</Paragraph>
      <Paragraph position="5"> Consider a set of classes C resulting from a classification method applied to a set of documents D. The Recall measure is expressed as: [...] extracted from the classes of C, which verifies: [...] W represents the weight of the property p for element x.</Paragraph>
      <Paragraph position="6"> As in IR, the F-measure (described by Eq. 1) can be used to combine the Recall and Precision results. Moreover, we have demonstrated in [16] that if both Recall and Precision reach unity, the peculiar set of classes C represents a Galois lattice. Therefore, the combination of these two measures makes it possible to evaluate to what extent a numerical classification model can be assimilated to a Galois lattice natural classifier. The stability of our Quality criteria has also been demonstrated in [16].</Paragraph>
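      <Paragraph position="7"> To make the notion of a Galois lattice classifier more concrete, the sketch below enumerates the formal concepts (groups of documents paired with the properties they all share) of a tiny document-to-property table. It is a brute-force illustration under stated assumptions, not the method of [16] or [31]: the function name and the toy patent data are hypothetical, weights are ignored, and the enumeration is exponential in the number of documents.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate (extent, intent) pairs of a small binary context.

    context maps each document to the set of properties it owns.
    Brute force over document subsets: only suitable for toy data.
    """
    all_props = set().union(*context.values())
    docs = sorted(context)
    concepts = set()
    for size in range(len(docs) + 1):
        for group in combinations(docs, size):
            # Intent: properties shared by every document of the group.
            intent = set(all_props)
            for doc in group:
                intent = intent.intersection(context[doc])
            # Extent: every document owning all properties of the intent.
            extent = {d for d in docs if intent.issubset(context[d])}
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


# Hypothetical toy context: three patents and their index terms.
toy_context = {
    "p1": {"oil", "additive"},
    "p2": {"oil", "engine"},
    "p3": {"oil", "additive", "engine"},
}
concepts = formal_concepts(toy_context)
for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Each concept groups documents solely through the properties they share, without building any class profile beforehand, which is the behaviour the text attributes to a natural classifier.</Paragraph>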
    </Section>
    <Section position="2" start_page="3" end_page="4" type="sub_section">
      <SectionTitle>
5.2 Evaluation results
</SectionTitle>
      <Paragraph position="0"> [...] from 1, the better the classification results. The F value provides a synthesis of the results of R and P.</Paragraph>
      <Paragraph position="1"> Examining the Quality measures of table 2 gives more reliable and stable results, because these measures are independent both of the classification method and of the size of the description space. It highlights the overall superiority of the viewpoint-oriented approach over a global approach with the same number of classes (GlobMin). When the number of classes is strongly increased in the global approach (GlobMax), its quality increases as well, but the advantage of the viewpoint-oriented approach remains clear on average (the Average F-value over all viewpoints is higher than the F-value of GlobMax), with a more reasonable number of classes per map from a user point of view. The specific case of the Title classification should be discussed here. The poor quality of this classification is due both to the index sparseness of this field (footnote: this can be confirmed &quot;a posteriori&quot; by the inertia results for this viewpoint) and to an inappropriate number of classes relative to the size of its associated description space.</Paragraph>
      <Paragraph position="2"> An interesting strategy would then be to use the quality factor Q to find the optimal number of classes for this classification. An imbalance between Recall and Precision (in favour of Recall) can be observed for the worst classifications (GlobMin and Titles). Such an imbalance means that documents with different property sets are grouped in the same classes, which creates a risk of confusion when the user interprets the content of the classes.</Paragraph>
      <Paragraph position="3"> The quality analysis clearly shows that the viewpoint-oriented approach enhances the quality of interpretation of a classification, both by reducing the number of classes that the user has to consult for each viewpoint and by providing the user with classes that are more coherent and exhaustive in terms of content.</Paragraph>
    </Section>
    <Section position="3" start_page="4" end_page="4" type="sub_section">
      <SectionTitle>
5.3 Optimisation of classification results
</SectionTitle>
      <Paragraph position="0"> The quality criteria presented in the previous section can also be used for optimizing the number of classes of each viewpoint map. The goal of this process is to provide the analyst with an optimal quality of interpretation for each individual map associated with a specific viewpoint. For that purpose, maps of different sizes, from 6x6 to 24x24 nodes (classes), are generated for each viewpoint. The principle of our classification optimisation algorithm, which is described in [16], is to search for a break-even point (i.e. intersection point) between Recall and Precision. The map whose quality criteria are nearest to the break-even point is considered the optimal one. The figure subjectively illustrates the difference in accuracy that can be obtained in the analysis by optimizing the map size for a given viewpoint. As shown in figure 6, high-quality maps are usually characterized by more precise topic labels. [Figure 6 caption, beginning missing: ...] map through map extracts: the 11x11 map extract is presented on the left, the 16x16 map extract is presented on the right. In the figure, the focus is on the &quot;machine oil&quot; topic. The comparison highlights, as an example, that the logical surrounding of this topic is more precisely defined in the 16x16 map (optimal quality) than in the 11x11 map (lower quality). Moreover, in the 11x11 map, the topic &quot;machine oil&quot; has drifted into a topic of fuzzier scope named &quot;machine and vehicles&quot;.</Paragraph>
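      <Paragraph position="1"> The break-even selection described above can be sketched as follows. This is a simplified illustration, not the algorithm of [16]: it assumes that &quot;nearest to the break-even point&quot; can be approximated by the smallest absolute gap between Recall and Precision, and the function name and all (Recall, Precision) values are hypothetical, not results from the paper.

```python
def nearest_to_break_even(results):
    """Return the map side whose Recall and Precision are closest.

    results maps a map side (e.g. 16 for a 16x16 map) to a
    (recall, precision) pair. Assumption: distance to the break-even
    point is approximated by abs(recall - precision).
    """
    return min(results, key=lambda side: abs(results[side][0] - results[side][1]))


# Hypothetical quality values for one viewpoint; not taken from the paper.
hypothetical_results = {
    6: (0.82, 0.31),
    11: (0.66, 0.48),
    16: (0.57, 0.55),
    24: (0.41, 0.70),
}
best = nearest_to_break_even(hypothetical_results)
print(best)
```

With these invented values, small maps favour Recall and large maps favour Precision, so the selected size lies where the two curves would cross.</Paragraph>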
    </Section>
  </Section>
</Paper>