<?xml version="1.0" standalone="yes"?> <Paper uid="H89-2012"> <Title>Parsing, Word Associations and Typical Predicate-Argument Relations 1</Title> <Section position="5" start_page="79" end_page="80" type="concl"> <SectionTitle> 9. Conclusion </SectionTitle> <Paragraph position="0"> In any natural language there are restrictions on what words can appear together in the same construction, and in particular, on what can be arguments of what predicates. It is common practice in linguistics to classify words not only on the basis of their meanings but also on the basis of their co-occurrence with other words. Running through the whole Firthian tradition, for example, is the theme that &quot;You shall know a word by the company it keeps&quot; (Firth, 1957).</Paragraph> <Paragraph position="1"> &quot;On the one hand, bank co-occurs with words and expressions such as money, notes, loan, account, investment, clerk, official, manager, robbery, vaults, working in a, its actions, First National, of England, and so forth. On the other hand, we find bank co-occurring with river, swim, boat, east (and of course West and South, which have acquired special meanings of their own), on top of the, and of the Rhine.&quot; (Hanks 1987, p. 127) Harris (1968) makes this &quot;distributional hypothesis&quot; central to his linguistic theory. His claim is that: &quot;the meaning of entities, and the meaning of grammatical relations among them, is related to the restriction of combinations of these entities relative to other entities,&quot; (Harris 1968:12). Granting that there must be some relationship between distribution and meaning, the exact nature of such a relationship to our received notions of meaning is nevertheless not without its complications. For example, there are some purely collocational restrictions in English that seem to enforce no semantic distinction. 
Thus, one can roast chicken and peanuts in an oven, but typically fish and beans are baked rather than roasted: this fact seems to be a quirk of the history of English. Polysemy provides a second kind of complication. A sentence can be parsed and a sentence can be commuted, but these are two distinct senses of the word sentence; we should not be misled into positing a class of things that can be both parsed and commuted.</Paragraph> <Paragraph position="2"> 3. Much of the work on language modeling for speech recognition has tended to concentrate on search questions. Should we still be using Bates' island driving approach (Bates 1975), or should we try something newer such as Tomita's so-called generalized LR(k) parser (Tomita 1986)? We suggest that the discussion should concentrate more on describing the facts, and less on how they are enforced.</Paragraph> <Paragraph position="3"> Given these complicating factors, it is by no means obvious that the distribution of words will directly provide a useful semantic classification, at least in the absence of considerable human intervention. The work that has been done based on Harris' distributional hypothesis (most notably, the work of the associates of the Linguistic String Project (see for example, Hirschman, Grishman, and Sager 1975)) unfortunately does not provide a direct answer, since the corpora used have been small (tens of thousands of words rather than millions) and the analysis has typically involved considerable intervention by the researchers. However, with much larger corpora (10-100 million words) and robust parsers and taggers, the early results reported here and elsewhere appear extremely promising.</Paragraph> </Section> </Paper>