<?xml version="1.0" standalone="yes"?> <Paper uid="A97-2016"> <Title></Title> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> 1 The Annotation Scheme </SectionTitle> <Paragraph position="0"> Several features of the tool have been introduced to suit the requirements imposed by the architecture of the annotation scheme (cf. (Skut et al., 1997)), which can itself be characterised as follows: * Direct representation of the underlying argument structure in terms of unordered trees; * Rudimentary, flat representations; uniform treatment of local and non-local dependencies; * Extensive encoding of linguistic information in grammatical function labels.</Paragraph> <Paragraph position="1"> Thus the format of the annotations differs somewhat from that of treebanks relying on a context-free backbone augmented with trace-filler annotations of non-local dependencies (cf. (Marcus et al., 1994), (Sampson, 1995), (Black et al., 1996)). Nevertheless, such treebanks can also be developed using our tool. To back this claim, the representation of structures from the SUSANNE corpus (cf. (Sampson, 1995)) will be shown in the presentation.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 User Interface </SectionTitle> <Paragraph position="0"> A screen dump of the tool is shown in fig. 1. The largest part of the window contains the graphical representation of the structure being annotated. The nodes and edges are assigned category and grammatical function labels, respectively. The words are numbered and labelled with part-of-speech tags. 
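The annotation format of Section 1 — unordered trees whose nodes carry categories and whose edges carry grammatical-function labels — can be illustrated by the following sketch. All names here are our own illustrative assumptions, not part of the tool.

```python
# Illustrative sketch (assumed names) of the annotation format:
# unordered trees; nodes carry phrasal categories or POS tags,
# edges carry grammatical-function labels.
from dataclasses import dataclass, field

@dataclass
class Node:
    category: str                                   # phrasal category or POS tag
    children: list = field(default_factory=list)    # list of (function_label, Node)

    def attach(self, function: str, child: "Node") -> None:
        """Attach a child under a grammatical-function edge label.
        Child order is irrelevant: the tree encodes argument structure,
        not surface word order, so non-local dependents attach directly."""
        self.children.append((function, child))

# A flat clause: subject, head verb and object attach directly to S.
s = Node("S")
s.attach("SB", Node("NP"))      # subject
s.attach("HD", Node("VVFIN"))   # finite head verb
s.attach("OA", Node("NP"))      # accusative object
```

Because attachment is order-independent, a fronted object is annotated exactly like one in canonical position, which is what makes the flat scheme uniform for local and non-local dependencies.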
Any change to the structure of the sentence being annotated is displayed immediately.</Paragraph> <Paragraph position="1"> Extra effort has been put into the development of a convenient keyboard interface. Menus are supported as a useful way of getting help on commands and labels. Automatic completion and error checking of user input are also supported.</Paragraph> <Paragraph position="2"> Three tagsets have to be defined by the user: part-of-speech tags, phrasal categories and grammatical functions. They are stored together with the corpus, which permits easy modification when needed.</Paragraph> <Paragraph position="3"> The user interface is implemented in Tcl/Tk Version 4.1. The corpus is stored in an SQL database.</Paragraph> </Section> <Section position="4" start_page="0" end_page="27" type="metho"> <SectionTitle> 3 Automation </SectionTitle> <Paragraph position="0"> To increase the efficiency of annotation and to avoid certain types of errors made by the human annotator, manual and automatic annotation are combined in an interactive way. The automatic component of the tool employs a stochastic tagging model induced from previously annotated sentences. Thus the degree of automation increases with the amount of data available.</Paragraph> <Paragraph position="1"> At the current stage of automation, the annotator determines the substructures to be grouped into a new phrase and assigns the new phrase a syntactic category. The assignment of grammatical functions is performed automatically. To do this, we adapted a standard part-of-speech tagging algorithm (the best sequence of grammatical functions is determined for a given sequence of syntactic categories, cf. (Skut et al., 1997)). The annotator supervises the automatic assignment of function tags. To keep the annotator from missing tagging errors, the grammatical function tagger is equipped with a function measuring the reliability of its output. 
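Such a reliability measure can be sketched as a margin between the best and second-best candidate tag. The function name and the concrete thresholds below are illustrative assumptions, not the tool's actual implementation.

```python
# Hypothetical sketch of margin-based confidence triage for the
# grammatical-function tagger; thresholds are illustrative assumptions.

def classify_prediction(tag_probs, reliable_margin=0.5, confirm_margin=0.1):
    """Rank candidate function tags by probability and triage the best
    one by its margin over the runner-up."""
    ranked = sorted(tag_probs.items(), key=lambda kv: kv[1], reverse=True)
    best_tag, best_p = ranked[0]
    second_p = ranked[1][1] if len(ranked) > 1 else 0.0
    margin = best_p - second_p
    if margin >= reliable_margin:
        return best_tag, "reliable"       # assigned without interaction
    if margin >= confirm_margin:
        return best_tag, "less reliable"  # suggested; annotator confirms
    return None, "unreliable"             # annotator decides himself
```

A large margin means the model strongly prefers one tag, so it is assigned silently; a small margin triggers confirmation or a manual decision, which is what lets the annotator catch the tagger's errors without inspecting every assignment.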
On the basis of the difference between the best and second-best assignment, the prediction is classified as belonging to one of the following certainty intervals: Reliable: the most probable tag is assigned; Less reliable: the tagger suggests a function tag, and the annotator is asked to confirm the choice; Unreliable: the annotator has to determine the function himself.</Paragraph> <Paragraph position="2"> The annotator always has the option of altering already assigned tags.</Paragraph> <Paragraph position="3"> The tagger rates 90% of all assignments as reliable. Accuracy for these cases is 97%. Most errors are due to wrong identification of the subject and of different kinds of objects in Ss and VPs. Accuracy for the unreliable 10% of assignments is 75%, i.e., the annotator has to alter the choice in 1 of 4 cases when asked for confirmation. Overall accuracy of the tagger is 95%.</Paragraph> <Paragraph position="4"> In several cases, the tagger has been able to abstract away from annotation errors in the training material, which has proved very helpful in detecting inconsistencies and wrong structures.</Paragraph> <Paragraph position="5"> This first automation step has considerably increased the efficiency of annotation. The average annotation time per sentence improved by 25%.</Paragraph> </Section> </Paper>