<?xml version="1.0" standalone="yes"?> <Paper uid="H94-1099"> <Title>Project Goals</Title> <Section position="2" start_page="0" end_page="0" type="abstr"> <SectionTitle> 3 years. </SectionTitle> <Paragraph position="0"> Co-Principal Investigators: Mark Liberman and Mitchell Marcus </Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Parsing Model </SectionTitle> <Paragraph position="0"> Traditionally, parsing relies on a grammar to determine a set of parse trees for a sentence, and typically uses a scoring mechanism, based on either rule preference or a probabilistic model, to determine a preferred parse. In this conventional approach, a linguist must specify the basic constituents, the rules for combining basic constituents into larger ones, and the detailed conditions under which these rules may be used.</Paragraph> <Paragraph position="1"> Instead of using a grammar, we rely on a probabilistic model, p(T|W), for the probability that a parse tree, T, is a parse for sentence W. We use data from the Treebank, with appropriate statistical modeling techniques, to capture implicitly the plethora of linguistic details necessary to parse most sentences correctly. In our model of parsing, we associate with any parse tree a set of bottom-up derivations; each derivation describes a particular order in which the parse tree is constructed. Our parsing model assigns a probability to a derivation, denoted by p(d|W). The probability of a parse tree is the sum of the probabilities of all derivations leading to that parse tree. The probability of a derivation is a product of probabilities, one for each step of the derivation.
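The decomposition just described can be sketched numerically (a minimal illustration under assumed inputs, not the authors' implementation: `derivations` is a hypothetical list holding, for each bottom-up derivation of T, the list of its per-step probabilities):

```python
from math import prod

def tree_probability(derivations):
    # p(T|W): sum over the derivations d of T of p(d|W),
    # where p(d|W) is the product of the per-step probabilities.
    return sum(prod(step_probs) for step_probs in derivations)

# Two toy derivations of the same tree, each with two steps:
# p(T|W) = 0.5*0.4 + 0.2*0.1 = 0.22
```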
These steps are of three types: a tagging step, where we want the probability of tagging a word with a tag in the context of the derivation up to that point.</Paragraph> <Paragraph position="2"> a labeling step, where we want the probability of assigning a nonterminal label to a node in the derivation.</Paragraph> <Paragraph position="3"> an extension step, where we want to determine the probability that a labeled node is extended, for example, to the left or right (i.e., to combine with the preceding or following constituents).</Paragraph> <Paragraph position="4"> The probability of a step is determined by a decision tree appropriate to the type of the step. The three decision trees examine the derivation up to that point to determine the probability of any particular step.</Paragraph> <Paragraph position="5"> The parsing models were trained on 28,000 sentences from the Computer Manuals domain and tested on 1,100 unseen sentences of 1 to 25 words. On this test set, the parser produced the correct parse, i.e., a parse that matched the treebank parse exactly, for 38% of the sentences. Ignoring part-of-speech tagging errors, it produced the correct parse tree for 47% of the sentences.</Paragraph> <Paragraph position="6"> Plans for the Coming Year We plan to continue working with our new parser by completing the following tasks: * implement a set of detailed questions to capture information about conjunction, prepositional attachment, etc. * improve the speed of the search strategy of the parser.</Paragraph> </Section> </Section> </Paper>