<?xml version="1.0" standalone="yes"?> <Paper uid="P96-1052"> <Title>References</Title> <Section position="5" start_page="363" end_page="364" type="metho"> <SectionTitle> (4) [ADVP → PP , NP] </SectionTitle> <Paragraph position="0"> Therefore all the patterns were checked against the original corpus to recover the original sentences.</Paragraph> <Paragraph position="1"> The sentences for patterns with low incidence, and for those whose correctness was questionable, were carefully examined to determine whether the content of the sentence justified the rule-pattern in question.</Paragraph> <Paragraph position="2"> For example, the NP = NP : VP rule-pattern was removed, since all the verb phrases occurring in this pattern were imperatives, which can legitimately act as sentences (5). Therefore instances of this rule application were covered by the NP = NP : S rule-pattern. A detailed account of the removal of idiosyncratic, incorrect and exceptional rule-patterns, with justifications, is reported in (Jones, 1996).</Paragraph> <Paragraph position="3"> (5) [...] the show's distributor, Viacom Inc, is giving an ultimatum: either sign new long-term commitments to buy future episodes or risk losing &quot;Cosby&quot; to a competitor.</Paragraph> <Paragraph position="4"> After this further pruning procedure, the number of rule-patterns was reduced to just 79, more than half of which related to the comma.
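The pruning step described above combines an automatic count with manual judgement. As a minimal sketch, assuming each extracted rule-pattern is stored as a plain string (such as "NP = NP : VP") together with the sentence it came from, the hypothetical function prune_rule_patterns below does three things. It counts occurrences, re-labels a pattern that another pattern subsumes (as with the imperative-VP case), and returns the low-frequency patterns with their sentences for inspection rather than deleting them. The string format, threshold and toy data are illustrative assumptions, not details of the study.

```python
from collections import Counter


def prune_rule_patterns(observations, min_count, subsumed_by):
    """Fold subsumed rule-patterns and flag rare ones for manual review.

    observations: list of (pattern, sentence) pairs, one per occurrence.
    min_count: patterns seen fewer than this many times are flagged.
    subsumed_by: maps a pattern to the pattern that covers it.
    Returns (counts, flagged): a Counter of patterns after folding, and a
    dict from each rare pattern to its source sentences.
    """
    def canonical(pattern):
        # A subsumed pattern is counted under the pattern that covers it.
        return subsumed_by.get(pattern, pattern)

    counts = Counter(canonical(p) for p, _ in observations)

    # Rare patterns are not deleted here: their sentences are returned so
    # that a human can decide whether the pattern is justified.
    flagged = {
        pat: [s for p, s in observations if canonical(p) == pat]
        for pat, n in counts.items()
        if min_count > n
    }
    return counts, flagged


# Toy data (hypothetical): the imperative-VP pattern folds into NP = NP : S.
obs = [
    ("NP = NP : S", "s1"),
    ("NP = NP : VP", "s2"),
    ("NP = NP : S", "s3"),
    ("ADVP = PP , NP", "s4"),
]
counts, flagged = prune_rule_patterns(
    obs, min_count=2, subsumed_by={"NP = NP : VP": "NP = NP : S"}
)
print(counts["NP = NP : S"])  # 3
print(flagged)                # {'ADVP = PP , NP': ['s4']}
```

Returning the flagged sentences, instead of dropping rare patterns outright, mirrors the point that low incidence alone was a reason to examine a pattern, not to discard it.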
It was now possible to postulate some generalisations about the use of the various punctuation marks from this reduced set of rule-patterns.</Paragraph> <Paragraph position="5"> These generalised punctuation rules, described in more detail in (Jones, 1996), are given below for colons (6), semicolons (7), full-stops (8), dashes (9, 10), commas (11), basic quotation (12) and stress-markers (13-15).</Paragraph> <Paragraph position="6">
(6) X = X : {NP | S | ADJP}   X:{NP, S}
(7) S = S ; S   S:{NP, S, VP, PP}
(8) T = * .</Paragraph> <Paragraph position="7">
(9) D = D -- D --   D:{NP, S, VP, PP, ADJP}
(10) E = E -- {NP | S | VP | PP} --   E:{NP, S}
(11) C = C , *   C:{NP, S, VP, PP, ADJP, ADVP}
     C = * , C
(12) Q = &quot; Q &quot;   Q:*
(13) Z = Z ?   Z:*
(14) Y = Y !   Y:*
(15) W = W ...   W:*
3 A Theoretical Approach
The theoretical starting point is that punctuation seems to occur at a phrasal level, i.e. it comes immediately before or after a phrasal-level constituent (e.g. a noun phrase). However, this definition is rather general, so the problem needs to be examined more precisely.</Paragraph> <Paragraph position="8"> One possibility is that punctuation can occur adjacent to any complex structure, but this admits unwanted occurrences such as (16). At the other extreme, punctuation could be allowed only adjacent to maximal-level phrases (e.g. NP, VP), but this rules out correct cases like (17).
(16) The, new toy ...</Paragraph> <Paragraph position="9"> (17) He does, surprisingly, like fish.</Paragraph> <Paragraph position="10"> Clearly we need something stricter than the first approach but more relaxed than the second. The notion of headedness seems to be involved, so we can postulate that only non-head structures can have punctuation attached. This system still does not rule out examples like (18), however, so further refinement is necessary. The answer seems to lie in comparing the bar levels of the head daughter and mother categories under X-bar theory (Jackendoff, 1977).
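This bar-level comparison can be made concrete with a small check over a single local tree: a mother, its daughters, and the index of the head daughter. The following is a minimal sketch, and several of its details are illustrative assumptions: the Cat encoding with an integer bar level, the bar levels given to Det and ADVP, the simplified analyses of (17) and (18), and the function name punctuable_daughters. It allows punctuation only on non-head daughters, and only when mother and head daughter share a bar level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cat:
    """An X-bar category: a lexical class plus a bar level (0, 1 or 2)."""
    label: str
    bar: int


def punctuable_daughters(mother: Cat, daughters: list[Cat], head_index: int) -> list[int]:
    """Return indices of the daughters that may carry attached punctuation.

    Condition 1: the head daughter never carries punctuation.
    Condition 2: mother and head daughter must have the same bar level,
    whatever that level is; otherwise no daughter may be punctuated.
    """
    head = daughters[head_index]
    if mother.bar != head.bar:
        return []
    return [i for i in range(len(daughters)) if i != head_index]


# N2 -> Det N1: bar levels differ, so the comma after "the" in (18) is ruled out.
print(punctuable_daughters(Cat("N", 2), [Cat("Det", 0), Cat("N", 1)], head_index=1))  # []

# V1 -> ADVP V1 (a simplified analysis of (17)): same bar level, so the
# ADVP "surprisingly" may be punctuated. "ADV" at bar 2 stands for ADVP.
print(punctuable_daughters(Cat("V", 1), [Cat("ADV", 2), Cat("V", 1)], head_index=1))  # [0]
```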
Attachment of punctuation to the non-head daughter seems to be legal only when mother and head daughter are of the same bar level (and indeed more often than not they are identical categories), regardless of what that bar level is.</Paragraph> <Paragraph position="11"> (18) the, big, man
From this theoretical approach it appears that punctuation could be described as adjunctive (i.e. the phrases to which punctuation is attached serve an adjunctive function). Furthermore, conjunctive uses of punctuation (19, 20), conventionally regarded as distinct from other, more grammatical uses (the adjunctive ones), can also be handled by the theoretical principles formed here.</Paragraph> <Paragraph position="12"> (19) dogs, cats, fish and mice
(20) most, or many, examples ...</Paragraph> </Section> <Section position="6" start_page="364" end_page="365" type="metho"> <SectionTitle> 4 Testing -- Work in Progress </SectionTitle> <Paragraph position="0"> The next stage of this research is to test the results of both of these approaches to see whether they work, and also to compare their results. Since the results of the two studies do not seem incompatible, it should prove possible to combine them, and it will be interesting to see whether the combined approach gives different results from either approach used individually. It will also be useful to compare the results with those of studies that have a less formal basis for their treatments of punctuation, e.g. (Briscoe and Carroll, 1995).</Paragraph> <Paragraph position="1"> For this reason, the best way to test these approaches to punctuation's role in syntax is to incorporate them into otherwise identical grammars and study the coverage of the grammars in parsing and the quality and accuracy of the parses.
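Coverage, the first of these measures, is straightforward to tabulate once each grammar has been run over the same sentences. The sketch below assumes a hypothetical per-sentence record holding only the number of analyses returned. It does not model the GDE's actual output, and it does not measure parse quality or accuracy, which would require correct reference analyses. The grammar names and figures in the usage lines are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ParseResult:
    sentence_id: str
    n_parses: int  # 0 means the grammar found no analysis for the sentence


def coverage(results: list[ParseResult]) -> float:
    """Fraction of sentences that received at least one parse."""
    if not results:
        return 0.0
    return sum(1 for r in results if r.n_parses > 0) / len(results)


def compare(grammars: dict[str, list[ParseResult]]) -> dict[str, tuple[float, float]]:
    """Per grammar: (coverage, mean number of parses over parsed sentences).

    All grammars are assumed to have been run over the same sentence set,
    so the figures are directly comparable.
    """
    summary = {}
    for name, results in grammars.items():
        counts = [r.n_parses for r in results if r.n_parses > 0]
        mean_parses = sum(counts) / len(counts) if counts else 0.0
        summary[name] = (coverage(results), mean_parses)
    return summary


# Invented toy runs over the same two sentences.
runs = {
    "baseline": [ParseResult("s1", 4), ParseResult("s2", 0)],
    "with_punct": [ParseResult("s1", 2), ParseResult("s2", 3)],
}
print(compare(runs))  # {'baseline': (0.5, 4.0), 'with_punct': (1.0, 2.5)}
```

Keeping the grammars otherwise identical is what makes such a side-by-side table meaningful: any difference in coverage or ambiguity can then be attributed to the punctuation rules.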
For ease of comparison with other studies, the best parsing framework to use will be the Alvey Tools' Grammar Development Environment (GDE) (Carroll et al., 1991), which allows for rapid prototyping and easy analysis of parses. The corpus of sentences to run the grammars over should ideally be large, and consist mainly of real text from external sources. To avoid dealing with idiosyncratic tagging of words and over-complicated sentences, we shall follow Briscoe and Carroll (1995) rather than Jones (1994), using 35,000 prepared sentences from the Susanne corpus instead of the Spoken English Corpus.</Paragraph> </Section> </Paper>