<?xml version="1.0" standalone="yes"?> <Paper uid="P06-2018"> <Title>Using Machine-Learning to Assign Function Labels to Parser Output for Spanish</Title> <Section position="4" start_page="136" end_page="136" type="intro"> <SectionTitle> 2 The Spanish Treebank </SectionTitle> <Paragraph position="0"> As input to our LFG annotation algorithm, we use the output of Bikel's parser (Bikel, 2002) trained on the Cast3LB treebank (Civit and Martí, 2004).</Paragraph> <Paragraph position="1"> Cast3LB contains around 3,500 constituency trees (100,000 words) taken from different genres of European and Latin American Spanish. The POS tags used in Cast3LB encode morphological information in addition to part-of-speech information. Because the order of main sentence constituents in Spanish is relatively flexible, Cast3LB uses a flat, multiply-branching structure for the S node. There is no VP node; instead, all complements and adjuncts depending on a verb are sisters to the gv (Verb Group) node containing this verb. An example sentence (with the corresponding f-structure) is shown in Figure 1.</Paragraph> <Paragraph position="2"> Tree nodes are additionally labelled with grammatical function tags. Table 1 provides a list of function tags with short explanations. Civit (2004) provides the Cast3LB function tag guidelines.</Paragraph> <Paragraph position="3"> Function tags carry some of the information that would be encoded in terms of tree configurations in languages with stricter constituent order constraints than Spanish.</Paragraph> </Section> </Paper>