<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1507">
  <Title>Categorial Type Logic meets Dependency Grammar to annotate an Italian Corpus</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 PoS tagging for Italian
</SectionTitle>
    <Paragraph position="0"> Before embarking on our first task, we studied the current state of PoS tagging for Italian. Italian is one of the languages for which a set of annotation guidelines has been developed in the context of the EAGLES project (Expert Advisory Group on Language Engineering Standards; Monachini, 1995). Several research groups have also worked on PoS annotation in practice (for example, Torino University, Xerox and Venice University).</Paragraph>
    <Paragraph position="1"> We compared the tag sets used by these groups with Monachini's guidelines. The comparison shows that, although there is general agreement on the main parts of speech to be used, considerable divergence exists in how Italian words are actually classified with respect to these main PoS classes.</Paragraph>
    <Paragraph position="2"> The classes for which differences of opinion are most evident are adjectives, determiners and adverbs.</Paragraph>
    <Paragraph position="3"> For instance, words like molti (tr. many) have been classified as &quot;indefinite determiners&quot; by Monachini, &quot;plural quantifiers&quot; by Xerox, and &quot;indefinite adjectives&quot; by the Venice and Turin groups. This simple example shows that the choice of PoS tags is already influenced by the linguistic theory adopted in the background.</Paragraph>
    <Paragraph position="4"> This theoretical bias will then influence the kind of conclusions one can draw from the annotated corpus.</Paragraph>
    <Paragraph position="5"> Our aim is to derive an empirically founded PoS classification, making no a priori assumptions about the PoS classes to be distinguished. Our background assumptions are minimal and, we hope, uncontroversial: we assume that we have access to head-dependent (H-D) and functor-argument (F-A) relations in our material. We encode the H-D and F-A information into categorial type formulas. These formulas then serve as &quot;labels/tags&quot; from which we obtain the desired empirically founded PoS classification by means of a clustering algorithm. To bootstrap the process of type induction, we transform the TUT corpus into a simplified dependency treebank. The transformation keeps the bare dependency relations but removes the more theory-laden annotation. In Section 4, we describe how we use the simplified dependency treebank for our distributional study of Italian PoS classification. First, we briefly look at H-D and F-A relations as they occur in the TUT corpus and in Categorial Type Logic (CTL).</Paragraph>
  </Section>
</Paper>