<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-1102">
  <Title>Non-directionality and Self-Assessment in an Example-based System Using Genetic Algorithms</Title>
  <Section position="3" start_page="616" end_page="618" type="intro">
    <SectionTitle>
2.1. Data structures
</SectionTitle>
    <Paragraph position="0"> The board data structure [Vauquois and Chappuy 85] was introduced as an answer to the problem of specifying real-size grammars. A board is the association of a text with its corresponding linguistic structure.</Paragraph>
    <Paragraph position="1"> Moreover, constraints express the linguistic validity of the fine-grained correspondences between different parts of the text and of the structure [Boitet and Zaharin 88], [Zaharin and Lepage 92]. As a particular case, projective constituency boards such as Figure 2 verify these constraints.</Paragraph>
    <Paragraph position="2"> Boards would be of little use if they did not allow the description of patterns. Hence, Figure 3 is also a valid board. It is similar to Figure 2, except that portions of the string and the tree have been replaced by variables (prefixed by a $ sign). These variables stand for forests, not only for trees; the point is important.</Paragraph>
    <Paragraph position="3"> [Figures 2 and 3: constituency boards; the surviving node labels include NP, VP, AVP, pron and verb.]</Paragraph>
    <Paragraph position="4"> Because it is always better to look for a unified view of objects, the string part and the tree part are considered to be of the same data type, that of forest. As a matter of fact, a string is a forest with only one level, and a tree is a forest with only one node on the highest level. Now, as forests are the underlying data type, variables stand naturally for subforests. On the string side, treating variables as forests is far more interesting than having them instantiate with one word only.</Paragraph>
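    <Paragraph>The unified forest view can be made concrete with a small sketch. It is only an illustration under assumptions: the names Node, Forest, Board, string_to_forest, tree_to_forest, leaves and is_variable are hypothetical, and the example tree is invented for the demonstration rather than taken from the authors' data base.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a forest: a label plus an ordered forest of children."""
    label: str
    children: List["Node"] = field(default_factory=list)


Forest = List[Node]  # a forest is an ordered sequence of sister trees


def string_to_forest(sentence: str) -> Forest:
    """A string is a forest with only one level: one leaf per word."""
    return [Node(word) for word in sentence.split()]


def tree_to_forest(root: Node) -> Forest:
    """A tree is a forest with only one node on the highest level."""
    return [root]


def leaves(forest: Forest) -> List[str]:
    """Left-to-right leaf labels of a forest."""
    out: List[str] = []
    for node in forest:
        out.extend(leaves(node.children) if node.children else [node.label])
    return out


def is_variable(node: Node) -> bool:
    """Variables carry a $ prefix and stand for subforests."""
    return node.label.startswith("$")


@dataclass
class Board:
    """Association of a string side and a tree side, both stored as forests."""
    string: Forest
    tree: Forest


# Illustrative board; the tree shape is an assumption, not the authors' analysis.
words = string_to_forest("May I help you")
tree = tree_to_forest(
    Node("S", [Node("modal", [Node("May")]),
               Node("NP", [Node("pron", [Node("I")])]),
               Node("VP", [Node("verb", [Node("help")]),
                           Node("NP", [Node("pron", [Node("you")])])])]))
board = Board(words, tree)
print(leaves(board.string) == leaves(board.tree))  # True
```

Because both sides share one type, any operation written for forests applies unchanged to the sentence and to its tree. In this sketch, the equality of the two leaf sequences checks that each leaf corresponds to a word in the same order.</Paragraph>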
    <Paragraph position="5"> An interesting property of the board data structure, and exactly the reason why it was devised, is that, because it is the association of a string (the text) and a (linguistic) tree, it is neutral with respect to the main natural language processing operations: * analysis (input: string, output: tree); * generation (input: tree, output: string).</Paragraph>
    <Paragraph position="6"> Our database of sentences is that of ATR telephone conversations. These dialogues are telephone conversations for a scenario where somebody calls a secretariat to get information about a coming conference he would like to attend. Figure 4 is an excerpt from these dialogues.</Paragraph>
    <Paragraph position="7"> - Hello. - This is the Conference Office. - Could you tell me about the attendance fee for the Conference? If I apply for the Conference now, how much is the attendance fee? - Yes. At present the attendance fee is 35,000 yen per person. If you apply next month, it will be 40,000 yen. Figure 4: An excerpt from the ATR dialogues.</Paragraph>
    <Paragraph position="8"> We kept 10 of these dialogues in English. This represents [count illegible in the source] sentences, of which 130 are different. The linguistic structures corresponding to the previous sentences have been drawn by hand and scrupulously reviewed to ensure consistency. They are syntactic constituency trees and are exactly projective, which means that each leaf in the tree corresponds to a word in the sentence in the same order.</Paragraph>
    <Paragraph position="9"> For illustration, all the trees and sentences in this paper are extracted from our data base of boards. Some representational choices have been made to limit the number of morpho-syntactic categories to 14 (and phrase types to 7) and to keep projectivity in all cases.</Paragraph>
    <Section position="1" start_page="617" end_page="618" type="sub_section">
      <SectionTitle>
2.2 Functions
2.2.1 Fitness = Distance between forests
</SectionTitle>
      <Paragraph position="0"> We define the fitness of an element in a population (set of boards) as the distance to a given input (a board) to the system. In other words, we have to define a distance between boards. A simple idea is to take the sum of the distances between the strings on the one hand, and the trees on the other hand. As strings and trees are forests, a distance on forests is required.</Paragraph>
      <Paragraph position="1"> The definition of a distance on forests is given below, with a, b being nodes, u, u', v, v' being forests and . denoting concatenation of forests.</Paragraph>
      <Paragraph position="4"> It is a direct generalisation of two classical distances on strings [Wagner &amp; Fischer 74] and trees [Selkow 77].</Paragraph>
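      <Paragraph>A distance of this family can be sketched as follows. This is not the authors' formula, which is not reproduced in this copy; it is a minimal sketch assuming unit costs, whole-subtree insertion and deletion in the style of Selkow, and a hypothetical encoding where a node is a pair (label, children) and a forest is a tuple of nodes. The names size, dist and words are illustrative.

```python
from functools import lru_cache

# Assumed encoding: a node is a pair (label, children); a forest is a tuple of nodes.


def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def dist(f, g):
    """Unit-cost edit distance between two forests."""
    if not f:
        return size(g)          # insert everything that remains in g
    if not g:
        return size(f)          # delete everything that remains in f
    (a, u), rest_f = f[0], f[1:]
    (b, v), rest_g = g[0], g[1:]
    return min(
        size(f[:1]) + dist(rest_f, g),                   # delete the tree a(u)
        size(g[:1]) + dist(f, rest_g),                   # insert the tree b(v)
        (a != b) + dist(u, v) + dist(rest_f, rest_g),    # replace a by b, recurse
    )


def words(sentence):
    """A string seen as a one-level forest."""
    return tuple((w, ()) for w in sentence.split())


print(dist(words("Thank you very much"), words("Thank you")))  # 2
```

On one-level forests the recursion reduces to the usual string edit distance, and on single-rooted forests it compares trees top-down, which is the sense in which it generalises both classical distances.</Paragraph>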
      <Paragraph position="5"> Both distances answer the correction problem: what is the minimal number of typing operations needed to transform one object into the other one? In both distances and their generalisation to forests, the typing operations are insertion, deletion and replacement. An extension of the previous distance to forest patterns (i.e. forests containing variables) has been presented in [Lepage et al. 92]. It is no longer a metric, so we call it a proximity score. With this score, the distance between a variable and a constant object is zero by definition. Figure 5 gives an illustration (the unit is a one word or node difference).</Paragraph>
      <Paragraph position="6">  We turn now to crossover. The first question is how boards are selected in a population for crossover.</Paragraph>
      <Paragraph position="7"> It seems reasonable that those individuals with better fitness value should intervene more in the production of the next generation. Along this line, the following simple law gives the probability of a board i with fitness f_i (some reciprocal of distance) to be selected for crossover: p_i = f_i / \sum_j f_j, where the sum runs over all boards j of the population. As for crossover itself, it has to be defined on strings and on trees.</Paragraph>
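      <Paragraph>Fitness-proportional selection of this kind can be sketched in a few lines. The text only says that fitness is some reciprocal of distance, so the choice 1 / (1 + distance) below is an assumption, as are the function names fitness, selection_probabilities and select_parents.

```python
import random


def fitness(distance):
    """Assumed reciprocal: larger when a board is closer to the input."""
    return 1.0 / (1.0 + distance)


def selection_probabilities(distances):
    """Probability of each board: its fitness divided by the total fitness."""
    fits = [fitness(d) for d in distances]
    total = sum(fits)
    return [f / total for f in fits]


def select_parents(population, distances, k=2, rng=random):
    """Draw k boards, with replacement, proportionally to their fitness."""
    weights = selection_probabilities(distances)
    return rng.choices(population, weights=weights, k=k)


# Three boards at distances 0, 1 and 3 from the input.
print(selection_probabilities([0, 1, 3]))
print(select_parents(["board A", "board B", "board C"], [0, 1, 3]))
```

Adding 1 to the distance keeps the fitness finite for a board at distance zero; any other decreasing function of distance would fit the description in the text equally well.</Paragraph>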
      <Paragraph position="8"> On strings, be they chromosomes or sequences of bits, crossover is generally performed as illustrated in Figure 1. We could cross over sentences following this simple principle (see Figure 6).</Paragraph>
      <Paragraph position="9"> [Figure 6, before crossover: "Thank you very much." and "May I help you?"; after crossover: "Thank you help you." and "May I very much."] But we insist on keeping the unity of data structure between strings and trees. So, we translate string crossover into forest terms: it is the exchange of the sister forests of the crossover points. This can be applied directly to trees, see Figure 7. This technique is different from the exchange of subtrees as proposed in [Koza 92]. [Figure 7: trees before and after crossover, with crossover points marked by *.] Now, by keeping projectivity during crossover, only corresponding parts of strings and trees will be exchanged. As a consequence, string crossover will allow exchange of inner substrings. To sum up, a board obtained by crossover will give a partially valid description of a possibly ungrammatical sentence (see Figure 8).</Paragraph>
      <Paragraph position="11"/>
      <Paragraph position="13"> If an input board is given to the system, each board in the data base of examples can be assigned a fitness score: its distance to the input board.</Paragraph>
      <Paragraph position="14"> * When the input is a board where the linguistic tree is unknown (a variable), the output will be the closest board containing the closest sentence with its associated tree. This is a kind of analysis.</Paragraph>
      <Paragraph position="16"> * When the input is a board where the string is unknown (a variable), the output will be the closest board containing the closest tree with its associated string. This is a kind of generation.</Paragraph>
      <Paragraph position="17">  * When the input is a board where both the sentence and the linguistic tree are partially specified (they contain variables), the output will be the closest board containing a complete sentence and its complete associated linguistic structure.</Paragraph>
      <Paragraph position="19"> [Example, input: "$3 help you $4"; output: "May I help you?"]</Paragraph>
      <Paragraph position="20"> We call the last operation non-directional completion. In fact, analysis and generation are only particular cases of this operation. For instance, analysis is non-directional completion for a board with no variable in the string part, and a variable as the tree part.</Paragraph>
      <Paragraph position="21"> For each operation above, the external behaviour of the system may be considered different, although the internal behaviour is exactly the same. In any case, the output is a board, built from pieces of the data base boards, and minimising the distance to the input. It is important to stress the point that the input never enters the data base of boards. It is only used to compute the fitness of each board in the data base in each generation.</Paragraph>
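      <Paragraph>The scoring step shared by analysis, generation and completion can be sketched as a single function. This is a simplified illustration, not the system itself: it only ranks the unmodified data base boards and omits the genetic recombination that builds new boards from pieces. It assumes that a variable may absorb any run of sister trees, including an empty one, at zero cost, and it uses invented flat trees and hypothetical names (prox, complete, ANY).

```python
from functools import lru_cache

# Assumed encoding: a node is a pair (label, children); a forest is a tuple of nodes.
# A label starting with "$" marks a variable standing for a subforest.


def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def prox(p, g):
    """Proximity score of a forest pattern p to a constant forest g."""
    if not p:
        return size(g)
    (a, u), rest_p = p[0], p[1:]
    if a.startswith("$"):
        # The variable absorbs zero or more leading trees of g at no cost.
        return min(prox(rest_p, g[k:]) for k in range(len(g) + 1))
    if not g:
        return size(p[:1]) + prox(rest_p, g)
    (b, v), rest_g = g[0], g[1:]
    return min(
        size(p[:1]) + prox(rest_p, g),                   # delete the tree a(u)
        size(g[:1]) + prox(p, rest_g),                   # insert the tree b(v)
        (a != b) + prox(u, v) + prox(rest_p, rest_g),    # replace a by b, recurse
    )


def words(sentence):
    """A string seen as a one-level forest."""
    return tuple((w, ()) for w in sentence.split())


def complete(input_board, database):
    """Data base board whose string and tree are jointly closest to the input."""
    string_in, tree_in = input_board
    return min(database,
               key=lambda board: prox(string_in, board[0]) + prox(tree_in, board[1]))


ANY = (("$0", ()),)  # a board side that is entirely unknown

# Two boards with invented flat trees (an S node over the words).
database = [
    (words("Thank you very much"), (("S", words("Thank you very much")),)),
    (words("May I help you"), (("S", words("May I help you")),)),
]
best = complete((words("$3 help you $4"), ANY), database)
print(" ".join(label for label, _ in best[0]))  # May I help you
```

Passing ANY as the tree side gives the analysis case, passing it as the string side gives the generation case, and variables on both sides give the general completion case, all through the same call. The input board is only read for scoring and is never added to the data base.</Paragraph>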
      <Paragraph position="22"> Figure 9 summarises the system and its functioning.</Paragraph>
    </Section>
  </Section>
</Paper>