File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/80/p80-1029_abstr.xml

Size: 9,140 bytes

Last Modified: 2025-10-06 13:45:56

<?xml version="1.0" standalone="yes"?>
<Paper uid="P80-1029">
  <Title>STPJkTEGX?. SELECTION FOR AN ATN STNTACT~C PARSER</Title>
  <Section position="1" start_page="0" end_page="52143" type="abstr">
    <SectionTitle>
STPJkTEGX?. SELECTION FOR AN ATN STNTACT~C PARSER
</SectionTitle>
    <Paragraph position="0"> Giacomo Ferrari and Oliviero Stock Istituto dl tingulstica Computazionale - C~TR, Plsa Performance evaluation in the field of natural language processing is generally recognised as being extremely complex. There are, so far, no pre-established criteria to deal x~th this problem.</Paragraph>
    <Paragraph position="1"> I. It is impossible to measure the merits of a grammar, seen as the component of an analyser, in absolute terms. An &amp;quot;ad hoc&amp;quot; grammar, constructed for a limited set of sentences is, without doubt, more efficient in dealing with those particular sentences than a zrammer constructed for a larger set.</Paragraph>
    <Paragraph position="2"> Therefore, the first rudimentary criterion, when evaluating the relation~hlp between a grammar and a set of sentences, should be to establish whether this grammar is capable of analysing these sentences. This is the determination of linguistic coverage, and necessitates the definition of the linguistic phenomena, independently of the linguistic theory which has been adopted to recognise these phenomena.</Paragraph>
    <Paragraph position="3"> 2. In addition to its ability to recognise and coherently describe linguistic phenomena, a grammar should be Judged by its capacity to resolve ambiguity, to bypass irrelevant errors in the text being analysed, and so on. This aspect of a grammar could be regarded as its &amp;quot;robustness&amp;quot; \[P.Nayes, R.Reddy 1979\].</Paragraph>
    <Paragraph position="4"> 3. Examining other aspects of the problem, in the analysis chat we propose we will assume a grammar which is capable of dealing with the texts which we will submit to it.</Paragraph>
    <Paragraph position="5"> Let an ATN grammar tl, vlth n nodes, be of this type. N will be maintained constant for the following discussion.</Paragraph>
    <Paragraph position="6"> BY text we intend a series of sentences, or of utterances by one of the speakers in a dialogue. When analysing such a text, once a constant N has been assumed, it is likely that, in addition to the content (the arglm~ent of the discourse) indications will appear on the grammatical choices made by the author of the text (or the speaker) when expressing himself on that argument (how the argument is expressed).</Paragraph>
    <Paragraph position="7"> When these indications have been adequately quantified, they can be used to correctly select the perceptive strategies (as defined in \[Kaplan 72\]) to be adopted in order to achieve greater efficiency in the analysis of the following part of the text.</Paragraph>
    <Paragraph position="8"> 4. For our experiments we have used ATNSYS \[Stock 76\], and an Italian grammar with n - 50 (127 arcs) \[Cappelli st at.77\]. In this system, search is depth-first and the parser Interacts with a heuristic mechanism which orders the arcs according to a probability evaluation. This probability evaluation is dependent on the path which led to the current node and is also a function of the statistical data accumulated during previous analyses of a &amp;quot;coherent&amp;quot; text.</Paragraph>
    <Paragraph position="9"> The mechanism can be divided into two stages. The first stage consists of the acquisition of statistical data; i.e, the frequency, for each arc exiting from a node, of the passages across that arc, in relation to the arc of arrival: for each arriving arc there are as many counters as there are exiting arcs.</Paragraph>
    <Paragraph position="10"> f {e)-*x. f (b}:y ~f{,):w. f(b),* Fig. 1 In this way, in Fig. I arc 1 has been crossed x times coming from a and y times coming from b. In the second stage, during parsing, in state S, if coming from a and w &gt; x, arc 2 is cried first.</Paragraph>
    <Section position="1" start_page="113" end_page="113" type="sub_section">
      <SectionTitle>
4.1 Thus, a first evaluation of the linguistic choices made is provided by the
</SectionTitle>
      <Paragraph position="0"> set of probability values assocla~ed to each arc. These figures can to some extent describe the &amp;quot;style&amp;quot; of any &amp;quot;coherent&amp;quot; text analysed. (For this one should also take into account the different lin~uistlc significance of each arc.</Paragraph>
      <Paragraph position="1"> In fact a CAT or PUSH arc directly corresponds to a certain linguistic component, while a JUMP or VIRT arc occurs in relation to the technique by which the network has been built, the linguistic theory adopted, and other variables.)</Paragraph>
    </Section>
    <Section position="2" start_page="113" end_page="52143" type="sub_section">
      <SectionTitle>
4.2 The second part of the mechanism, ~he dynamic reordering of the arcs,
</SectionTitle>
      <Paragraph position="0"> coincides with a reordering of the co~prehension strategies. In this way, a matrix can be associated to each node, giving the order of the strategies for each arc in arrival.</Paragraph>
      <Paragraph position="1"> For each text T, there is a set of strategies ~ ordered as describod above. While the analysis of the probability values for distinct texts T and T&amp;quot; can give global indications of their lln~ulstlc characteristics, if we focus on ~he comprehension of the sentence, it is more meaningful to give nvaluatlons in relation to the sets of strategies, ~T and ~ , which are selected.</Paragraph>
      <Paragraph position="2"> Fig. 2 shows , for some nodes, a comparison between the orders of the arcs for the first Ii sentences from two texts, a science fiction nnvel (SFN, upper boxes) and a handbook of food chemistry (FC, lower boxes). The arc numbers are referred ~o the order in the original network. The figures which appear after the - in the heading indicate the number of parses for each sentence. An ec~cy box indicates the same order as that shown in the previous box.</Paragraph>
      <Paragraph position="3"> S b/,S2 1~ 1~</Paragraph>
    </Section>
    <Section position="3" start_page="52143" end_page="52143" type="sub_section">
      <SectionTitle>
5.1 It is to be expected that thls mechanism, in an far as it Intrnduces a
</SectionTitle>
      <Paragraph position="0"> heuristics, will increase the efficiency of the system used for the linguistic analysis. The results of our experiments so far confirm this. This ir~roved efficiency can be measured in three ways: a) locally, in terms of the computational load, due to non-determinism, ~ich is saved in each node. In fact, by some experiments, it is possible to quantify the computational load of each type of arc. The computational load of a node is then a linear combination of these values and one can comgare it with the actual load determined by the sequence of arcs attempted in that point after the reordering.</Paragraph>
      <Paragraph position="1"> b) in terms of an overall reduction in computing time; C) in terms of penetrance, i.e. the ratio between the number of choices which actually lead to a solution and the total number of choices wade.</Paragraph>
    </Section>
    <Section position="4" start_page="52143" end_page="52143" type="sub_section">
      <SectionTitle>
5.2 If T is a text containing r sentences, the average penetrance will be:
</SectionTitle>
      <Paragraph position="0"> o .=..', where ~ stands for each of the sentences in T. If T is analysed using the set of strategies chosen for a different text, T deg, then the penetrance is, on average, no greeter than with~ T *  In our experiments, for instance, the avera~_e oenetrance for the first text (SFN) parsed with its own strategies (~s##) is ~ed,SFN) = 0.52, while parsed with the strategies of the second text (Sty) is ~(5~,SFN) = 0.39. We have attempted to evaluate experlmentallv the relationship between the difference of the average penetrances, which we call dlscrepanc7 and the distance between two sets of strategies. However we think we need more experimentation before formalizing this relationship.</Paragraph>
      <Paragraph position="1"> Returning to our science fiction novel, the discrenanc- using its set of strategies and the one inferred by the food chemistry text is 6. In addition to the definition of a heuristic mechanism which is capable of in~rovinE the efficiency of natural language processing, and which can be evaluated as described above, our research aims at providing a means to chsracterise a text by evaluating the ~ramr~atical choices made by the author while expressing his argument.</Paragraph>
      <Paragraph position="2"> We are also attemptin~ to tako into account the expectations of the listener. In our opinion, the listener's expectations are not limited to the argument of the discourse but are also related to the way in which the argument is expressed; this is the equivalent of the choice of a sdb-grammar \[Kittredge 7~\] We intend to verify the existence of such expectations not only in literature or x~hen listening to long speeches, but also in dialogue.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML