<?xml version="1.0" standalone="yes"?>
<Paper uid="E99-1052">
  <Title>Determination of Syntactic Functions in Estonian Constraint Grammar</Title>
  <Section position="4" start_page="291" end_page="291" type="evalu">
    <SectionTitle>
3 Results
</SectionTitle>
    <Paragraph position="0"> To evaluate the performance of the parser, I use two types of corpora. The training corpus is used for formulating rules and for preliminary testing. After testing, I improve the rules so that most errors will be fixed the next time. The benchmark corpus is used only for evaluating the parser. Both types of corpora consist of fiction texts. The training corpus contains 4 texts of 2000 words from different Estonian writers. The benchmark corpus consists of 2000 words. I used these corpora in two experiments. In the first experiment (experiment A), I tested only the syntactic function detecting part of the grammar and assumed that the input text is ideally morphologically analysed and disambiguated; that is, all words are morphologically correct and unambiguous. For this experiment, both corpora were manually morphologically disambiguated. In the second experiment (experiment B), I used the same corpora, but they were disambiguated automatically. In this case, the disambiguator had an error rate of 2% and left 13% of words ambiguous; 1% of words were unknown to the morphological analyser.</Paragraph>
    <Paragraph position="1"> The precision and recall of the ESTCG parser are shown in Table 1.</Paragraph>
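As a rough illustration of how precision and recall are often computed for a parser that may leave several syntactic tags on one word, the sketch below assumes a common Constraint Grammar convention: recall is the share of words whose correct tag survives, and precision is the share of surviving tags that are correct. The text here does not spell out its formulas, so this convention, the function name, the tag labels, and the toy data are all assumptions made for illustration.

```python
def precision_recall(gold, output):
    """Score tag sets against gold tags (assumed CG-style convention).

    gold:   list with the one correct tag for each word.
    output: list with the set of tags the parser left on each word.
    """
    if len(gold) != len(output):
        raise ValueError("gold and output must cover the same words")
    # A word counts as correct when its gold tag is among the surviving tags.
    correct = sum(1 for g, tags in zip(gold, output) if g in tags)
    # Every surviving tag counts toward the precision denominator.
    total_tags = sum(len(tags) for tags in output)
    recall = correct / len(gold) if gold else 0.0
    precision = correct / total_tags if total_tags else 0.0
    return precision, recall


# Hypothetical toy data, not taken from the paper.
gold = ["@SUBJ", "@FMV", "@OBJ", "@ADVL"]
output = [{"@SUBJ"}, {"@FMV"}, {"@OBJ", "@NN"}, {"@NN"}]
precision, recall = precision_recall(gold, output)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In the toy data, three of the four words keep their correct tag and three of the five surviving tags are correct, so leaving a word ambiguous protects recall at the cost of precision.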
    <Paragraph position="2"> The large number of errors in experiment B can be explained by the fact that I wrote the preliminary grammar rules using only manually disambiguated corpora, and the work on correcting the rules using more ambiguous input is still in progress. As I mentioned before, the input was ambiguous and erroneous in this experiment, and this caused an error rate of 3%.</Paragraph>
    <Paragraph position="3"> The errors in the manually disambiguated corpora are mostly caused by ellipsis. Some errors occurred during the determination of apposition, and the third biggest group of errors occurs in sentences where one clause divides the other into two parts.</Paragraph>
    <Paragraph position="4"> In experiment A, 86-88% of words become syntactically unambiguous, and in experiment B, the corresponding numbers are 80-82%. In both experiments, less than 0.5% of words have 5-6 syntactic tags.</Paragraph>
    <Paragraph position="5"> It is very difficult to distinguish adverbial attributes from adverbials. Approximately 6% of analysed words have both labels. This is almost the same problem as PP-attachment in English, but in Estonian it is additionally possible to use both premodifying and postmodifying adverbial attributes. Of course, the PP-attachment problem also exists. Another hard problem is the distinction between genitive attributes and objects. If two or more nouns in the genitive case are situated side by side, then these words usually remain ambiguous, e.g. ... siis vabastab kohus tema vara hooldaja järelevalve alt. / ... then free-SG3 court-NOM he-GEN property-GEN trustee-GEN supervision-GEN from-POSTP / '... then the court frees his property from the supervision of the trustee.'</Paragraph>
  </Section>
</Paper>