File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/94/c94-2205_metho.xml
Size: 8,059 bytes
Last Modified: 2025-10-06 14:13:50
<?xml version="1.0" standalone="yes"?> <Paper uid="C94-2205"> <Title>Portuguese Analysis with Tree Adjoining Grammars</Title> <Section position="2" start_page="0" end_page="7257" type="metho"> <SectionTitle> 2. Tree Adjoining Grammars </SectionTitle> <Paragraph position="0"> Tree Adjoining Grammars were first described by \[JOSHI 75\] as a tree-based system whose basic component is a set of elementary trees. Each tree represents a minimal linguistic structure and is a domain of locality. A TAG comprises two kinds of elementary trees: initial trees, which are complete structures with pre-terminals on the leaves; and auxiliary trees, which must have exactly one leaf node with the same syntactic category as the root node.</Paragraph> <Paragraph position="1"> The elementary trees localize dependencies such as agreement and subcategorization, and must have at least one terminal node.</Paragraph> <Paragraph position="2"> Sentences of the language defined by a TAG are derived by composing an initial tree with other elementary trees through two operations: substitution and adjunction. Substitution, as shown in Fig 1, inserts an initial tree (or a tree derived from an initial tree) at the corresponding leaf node of an elementary tree.</Paragraph> <Paragraph position="3"> Adjunction, as shown in Fig 2, inserts an auxiliary tree at the corresponding node of an elementary or derived tree.</Paragraph> <Paragraph position="5"> The adjunction operation can be recursive, so an auxiliary tree can itself receive adjunction. Adjunction allows a complete structure to be inserted at a node of another complete structure.</Paragraph> <Paragraph position="6"> Adjunction makes TAGs slightly more powerful than Context-Free Grammars (CFG), placing them in the class of grammars called Mildly Context-Sensitive Grammars \[JOSHI 85\]. This operation preserves the dependencies among structures of the sentence that may be unboundedly far apart.</Paragraph> <Paragraph position="7"> 3. 
Portuguese analysis with TAGs Several research groups are working with Tree Adjoining Grammars. There are descriptions of grammars for French \[ABEILLE 91\] and English \[SCHABES 88\], and a study for German \[RAMBOW 92\], among other languages.</Paragraph> <Paragraph position="8"> Many studies on the analysis of Portuguese are under way in Brazil and Portugal, using different formalisms. This research focuses on specific areas such as lexical analysis \[COURTIN 89\], database queries in natural language \[BIGOLIN 93\], semantic analysis \[FREITAS 93\] \[LUZ 93\], etc.</Paragraph> <Paragraph position="9"> The TAG formalism has aspects that help the syntactic analysis of Portuguese, for example, the possibility of expressing unbounded dependencies, such as agreement, among nodes. João, que fala português, estuda informática. (João, who speaks Portuguese, studies computer science.) We are working on a grammar to describe Portuguese, and we are developing a syntactical analyzer for this grammar. One of the problems we faced was the absence of a description of the most common structures of our language, something like a &quot;fundamental Portuguese&quot;, so we selected the subset to work with ourselves.</Paragraph> <Paragraph position="10"> We decided on a large subset, which includes active and passive voice, relative and interrogative clauses, auxiliary and support verbs, and clitic pronouns.</Paragraph> <Paragraph position="11"> The syntactical categories included are verbs, nouns, pronouns, adjectives, adverbs, articles and prepositions. Each category has associated syntactical traits such as concrete, abstract, number, gender, person, mood, voice, ...</Paragraph> <Paragraph position="12"> The grammar is organized according to the formalism, using initial trees and auxiliary trees to describe surface structures of the Portuguese language. This study was based on Portuguese normative grammars \[ROCHA LIMA 92\] and generative grammars \[LOBATO 86\]. 
It is important to note that each node of a tree has traits used for unification, and can have dependency traits relating it to nodes at an unbounded distance. These dependency traits are preserved under adjunction.</Paragraph> <Paragraph position="13"> The first version of the syntactical analyzer, based on TAGs, includes the acquisition of elementary trees, the input of the sentence to be analyzed, the construction of a solution tree (by adjunction and substitution), and the unification of the input sentence with the solution tree. Note that the analyzer must return all the derived trees for the given input sentence. The elementary trees are expected to contain information about the hierarchy of the nodes, the type of the tree (relative, interrogative, ...), the operations that can be performed on each node, and the traits to be unified. The input sentence of the syntactical analyzer comes from a morphological analyzer that splits the sentence into components such as words or expressions and associates a set of traits with each of them. The derived tree is constructed by adjunction and substitution operations over elementary trees. Unification compares the traits of the input sentence with the traits described in the TAG trees, producing the resulting trees.</Paragraph> <Paragraph position="14"> Including semantic traits will allow us to upgrade this analyzer into a semantic-syntactic analyzer, bringing the evaluation of semantic traits forward into syntactical analysis and thus reducing the number of resulting trees.</Paragraph> </Section> <Section position="3" start_page="7257" end_page="7257" type="metho"> <SectionTitle> 4. Final remarks </SectionTitle> <Paragraph position="0"> Within a project that aims to develop tools for processing Portuguese at the morphological, syntactic and semantic levels, we started with the morphological level and arrived at an implementation of a robust lexical-morphological analyzer based on trie trees \[STRUBE DE LIMA 93\]. 
As a next step, we approached the syntactical level, looking for a formalism adequate for the Portuguese language. A large subset of the language was outlined, which should give rise to an experimental implementation of algorithms and data structures for parsing Portuguese.</Paragraph> <Paragraph position="1"> This seems to be the first study using Tree Adjoining Grammars for the Portuguese language. Our contribution consists of the description of a large subset of the language, the construction of trees that represent syntactic structures of Portuguese, and the development of a parser according to the formalism.</Paragraph> <Paragraph position="2"> We described around 300 initial trees to cover the outlined subset, and developed a bottom-up LR parser that works efficiently. We are now studying complementary data structures, such as a syntactical dictionary, in order to improve the parser. This dictionary would help in constructing the solution tree by quickly finding the trees that can be used for a word. We are also adapting the output of the morphological analyzer to a model that fits the input of the syntactical analyzer developed.</Paragraph> <Paragraph position="3"> So far, the Tree Adjoining Grammar formalism seems to offer aspects that benefit a robust treatment of the Portuguese language. New trees can be acquired easily, and semantic traits can be described together with the syntactical ones.</Paragraph> <Paragraph position="4"> Bibliography \[ABEILLE 91\] ABEILLE, Anne. &quot;Une Grammaire Lexicalisée d'Arbres Adjoints pour le Français: Application à l'analyse automatique&quot;. Thèse de Doctorat de linguistique. Université Paris 7, LADL, Janvier, 1991.</Paragraph> </Section> <Section position="4" start_page="7257" end_page="7257" type="metho"> <SectionTitle> \[BIGOLIN 93\] </SectionTitle> <Paragraph position="0"> BIGOLIN, N. e CASTILHO, J. M. 
&quot;Ferramenta de auxílio para a tradução de linguagens de especificação no desenvolvimento de sistemas de banco de dados&quot;. Simpósio Brasileiro de Banco de Dados, Campina Grande, 1993.</Paragraph> </Section> </Paper>