<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-1099">
  <Title>A TOOL FOR COLLECTING DOMAIN DEPENDENT SORTAL CONSTRAINTS FROM CORPORA</Title>
  <Section position="8" start_page="601" end_page="602" type="evalu">
    <SectionTitle>
6 EVALUATION AND RESULTS
</SectionTitle>
    <Paragraph position="0"> Evaluating the porting to a new domain requires measuring how the new sort file contributes to performing the target task within the new domain. This kind of evaluation is difficult because it is hard to separate the contribution of the grammar from the contribution of the sortal constraints. One way to evaluate our tool would be to have a file of "correct" sortal constraints to use as a reference against which to check the ones we generate with our tool. The problem is that this kind of file does not exist for new domains, since obtaining such a file is precisely the purpose of our tool.</Paragraph>
    <Paragraph position="1"> The approach we have chosen was to use the sort file built by hand for the ATIS corpus and to check this 'reference file' against the new sort file we intend to build, using our tool on a corpus of the same domain.</Paragraph>
    <Section position="1" start_page="601" end_page="601" type="sub_section">
      <SectionTitle>
6.1 Building the signature file
</SectionTitle>
      <Paragraph position="0"> For this first experimental exercise with the sort tool, we built the signature file somewhat differently than we would build it for a new application. In order to facilitate evaluating the tool, our goal this time was to come up with a signature file compatible with the reference file built by hand.</Paragraph>
      <Paragraph position="1"> The first step in the experiment was to automatically extract the signatures from the lexicon and the reference sorts file, which contains nearly 2200 sort definitions. Signatures are largely predictable from the grammatical category of a word. For example, most of the verbs (except the auxiliaries) with one argument received a signature identical to the sort definition.</Paragraph>
      <Paragraph position="2"> On the other hand, most of the prepositions received a signature with all their arguments replaced by a variable (since they are domain-specific). In this maiden voyage of the sort acquisition system, the signatures chosen for verbs, adjectives and nouns were made compatible with the sort hierarchy used by the reference sorts file. In porting to a new domain, the lexical signatures would presumably use an automatically generated sort hierarchy, almost entirely flat, with a unique lexical sort for each lexical item.</Paragraph>
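The category-driven derivation of signatures described above can be illustrated with a minimal sketch. It assumes a hypothetical SortRule record and a hypothetical derive_signature helper; the predicate names, sort names and variable naming scheme are invented for illustration and are not taken from the tool itself.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SortRule:
    """A predicate together with the sort of each of its arguments."""
    predicate: str
    arg_sorts: Tuple[str, ...]


def derive_signature(rule: SortRule, category: str) -> SortRule:
    """Derive a lexical signature from a reference sort definition.

    Hypothetical policy mirroring the text: prepositions are treated as
    domain-specific, so every argument becomes a variable; other
    categories keep the reference sort definition unchanged.
    """
    if category == "preposition":
        variables = tuple(f"_V{i}" for i in range(1, len(rule.arg_sorts) + 1))
        return SortRule(rule.predicate, variables)
    return rule


# Toy entries; the predicate and sort names are invented for illustration.
fly = SortRule("fly", ("flight",))
on = SortRule("on", ("flight", "day"))

verb_signature = derive_signature(fly, "verb")
prep_signature = derive_signature(on, "preposition")
print(verb_signature)
print(prep_signature)
```

In this sketch the verb keeps its sort definition as its signature, while the preposition is left unconstrained so that its sorts can be acquired from the corpus.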
      <Paragraph position="3"> In addition to this, some signatures, for logical predicates and predicates introduced in semantic rules, were added by hand. These represent a little more than 15% of the final signature file, which contains a total of 1357 signatures. Half of these signatures are zero-arity predicates, mostly built automatically from the lexicon.</Paragraph>
    </Section>
    <Section position="2" start_page="601" end_page="601" type="sub_section">
      <SectionTitle>
6.2 Parsing MADCOW
</SectionTitle>
      <Paragraph position="0"> The next step of our experiment was to parse a corpus from the ATIS domain using the signature file we have built. For this, we have used the MADCOW corpus [4], which includes 7{24:t sentences of various lengths (from 1 to 36 words) with a large linguistic coverage of this domain. This process was carried out in both modes, LFs and PLFs. The idea was to compare the results in both modes, to check whether the use of parsing preferences was relevant for the extraction of the sort definitions or whether we had to use all the Logical Forms from the parsing.</Paragraph>
      <Paragraph position="1"> The first iteration of parsing MADCOW produced 5917 and 2275 sort rules, respectively, for the LFs and PLFs modes.</Paragraph>
    </Section>
    <Section position="3" start_page="601" end_page="602" type="sub_section">
      <SectionTitle>
6.3 Mapping corpus and reference rules
</SectionTitle>
      <Paragraph position="0"> For this first evaluation, we also used a feature of our tool which can map each sort rule produced by the extraction phase against the rules of a reference sort file. The mapping consists of assigning one of the following categories to each corpus-acquired sort rule:  identical to their sort rule, only sort rules with at least one argument were extracted during the parsing of MADCOW.</Paragraph>
      <Paragraph position="1"> 4 Two sort rules are incomparable when they unify with each other while neither of them subsumes the other.  The first comment concerning these figures is that the percentage of incompatible rules is higher for the LFs than for the PLFs mode (52% vs. 30%, respectively), and the number of 'exact' sorts is more than half for LFs than PLFs. This shows that the use of Preferred Logical Forms for parsing is more efficient in extracting the 'good sorts'.</Paragraph>
      <Paragraph position="2"> However, the figures do not give an exact idea of the completeness and precision of our tool, since there is a large number of rules subsumed by other ones (more than 30% for the LFs mode and almost 50% for the PLFs mode). In fact, some of the corpus rules are subsumed by more general rules in the reference sort file while providing the same coverage as the reference sort rules.</Paragraph>
      <Paragraph position="3"> Therefore, the precision of our tool for the PLFs mode just after the extraction phase can be estimated to lie between 16% (exact rules) and 55% (exact rules plus subsumed rules). This number gets better and more precise very quickly after the first iteration of editing, since the work of the linguist is precisely to remove most of the incompatible and incomparable rules, as well as rules which are either too general or too specific.</Paragraph>
      <Paragraph position="4"> The overgeneration of the tool just after parsing, for the PLFs mode, can be estimated to be at least 30% (the percentage of incorrect rules). After the first iteration of editing, this number decreases very quickly, since low probabilities help the linguist to eliminate rules that are incompatible or incomparable.</Paragraph>
      <Paragraph position="5"> The recall for the PLFs mode after parsing, which is the ratio of the number of 'Exact' corpus rules to the number of reference rules used for the mapping in our evaluation (636 non-zero-arity sort rules), can be estimated to be at least 57%.</Paragraph>
      <Paragraph position="6"> A more precise estimation of the number of 'Exact' rules could be computed by using the sortal hierarchy to generate, for the two sets of rules (corpus and reference), all the rules that can be subsumed, and by performing the mapping only with these rules.</Paragraph>
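The arithmetic behind these estimates can be checked with a short sketch. It uses only figures quoted in this section (2275 PLFs corpus rules, 636 reference rules, 16% exact, 55% exact plus subsumed). The absolute number of exact rules is not stated, so recovering it from the rounded percentage is an assumption of this sketch.

```python
# Figures quoted in this section for the PLFs mode.
corpus_rules = 2275             # sort rules produced by the first parsing iteration
reference_rules = 636           # non-zero-arity reference rules used for the mapping
exact_share = 0.16              # share of corpus rules mapped as exact
exact_or_subsumed_share = 0.55  # exact rules plus subsumed rules

# Assumption: the absolute count of exact rules is not given in the text,
# so it is recovered here from the rounded percentage.
exact_count = round(exact_share * corpus_rules)

precision_low = exact_count / corpus_rules
precision_high = exact_or_subsumed_share
recall = exact_count / reference_rules

print(f"exact rules (estimated): {exact_count}")
print(f"precision between {precision_low:.0%} and {precision_high:.0%}")
print(f"recall: {recall:.0%}")
```

Under these assumptions the estimate is about 364 exact rules, which gives a recall of about 57% against the 636 reference rules, in line with the figure reported above.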
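The mapping of a corpus rule against a reference rule can be sketched as follows. This is a minimal illustration, assuming a tree-shaped sort hierarchy and rules represented as a predicate with a tuple of argument sorts. The hierarchy, the predicate name and the category labels returned by the hypothetical classify function are invented here and may differ from those used by the tool.

```python
from typing import Dict, Optional, Tuple

# Toy sort hierarchy (child -> parent); every name here is invented.
PARENT: Dict[str, Optional[str]] = {
    "top": None,
    "event": "top",
    "flight": "event",
    "place": "top",
    "city": "place",
    "airline": "top",
}

Rule = Tuple[str, Tuple[str, ...]]  # (predicate, argument sorts)


def sort_subsumes(general: str, specific: str) -> bool:
    """True if `general` equals `specific` or is one of its ancestors."""
    node: Optional[str] = specific
    while node is not None:
        if node == general:
            return True
        node = PARENT[node]
    return False


def sorts_unify(a: str, b: str) -> bool:
    """In a tree-shaped hierarchy, two sorts unify iff one subsumes the other."""
    return sort_subsumes(a, b) or sort_subsumes(b, a)


def rule_subsumes(general: Rule, specific: Rule) -> bool:
    """Argument-wise subsumption between rules for the same predicate."""
    return (general[0] == specific[0]
            and len(general[1]) == len(specific[1])
            and all(sort_subsumes(g, s) for g, s in zip(general[1], specific[1])))


def rules_unify(a: Rule, b: Rule) -> bool:
    """Argument-wise unification between rules for the same predicate."""
    return (a[0] == b[0]
            and len(a[1]) == len(b[1])
            and all(sorts_unify(x, y) for x, y in zip(a[1], b[1])))


def classify(corpus: Rule, reference: Rule) -> str:
    """Place a corpus rule relative to one reference rule."""
    if corpus == reference:
        return "exact"
    if rule_subsumes(reference, corpus):
        return "subsumed by reference"
    if rule_subsumes(corpus, reference):
        return "subsumes reference"
    if rules_unify(corpus, reference):
        return "incomparable"  # the rules unify, but neither subsumes the other
    return "incompatible"      # at least one argument pair does not unify


reference = ("fly_to", ("flight", "place"))
for args in [("flight", "place"), ("flight", "city"),
             ("event", "city"), ("airline", "city")]:
    print(args, "->", classify(("fly_to", args), reference))
```

In this toy setting the rule with arguments (event, city) is incomparable with the reference rule: the two rules unify to (flight, city), yet each is more general than the other on one argument.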
    </Section>
  </Section>
</Paper>