<?xml version="1.0" standalone="yes"?> <Paper uid="H93-1044"> <Title>Session 8: Statistical Language Modeling</Title> <Section position="2" start_page="0" end_page="225" type="metho"> <SectionTitle> 2. Part of Speech Tagging </SectionTitle> <Paragraph position="0"> The first paper in this session, by Matsukawa, Miller and Weischedel, describes a cascade of several components, sandwiching a novel algorithm between the output of an existing black-box segmentation and POS labelling systern for Japanese, JUMAN, and the POST HMM POS tagger. The middle algorithm uses what the authors call example-based correction to change some of JUMAN's initial word segmentation and to add alternative POS tags from which POST can then make a final selection.</Paragraph> <Paragraph position="1"> (Japanese text is printed without spaces; determining where one word stops and another starts is a crucial problem in Japanese text processing.) The example-based correction method, closely related to a method presented by Brill at this workshop last year, uses a very small amount of training data to learn a set of symbolic transformation rules which augment or change the output of JUMAN in particular deterministic contexts.</Paragraph> <Paragraph position="2"> 3. Gralnrnar Induction and Probabilistic</Paragraph> <Section position="1" start_page="0" end_page="225" type="sub_section"> <SectionTitle> Parsing </SectionTitle> <Paragraph position="0"> Most current methods for probabilistic parsing either estimate grammar rule probabilities directly from an annotated corpus or else use Baker's Inside/Outside algorithm (often in combination with some annotation) to estimate the parameters from an unannotated corpus.</Paragraph> <Paragraph position="1"> The 2/0 algorithm, however, maximizes the wrong objective function for purposes of recovering the expected grammatical structure for a given sentence; the 2/0 algorithm finds the model that maximizes the likelihood of the observed sentence strings without reference to the grammatical structure assigned to that string by the estimated gramnaar. Often, however, probabilistic parsing is used to derive a tree structure for use with a semantic analysis component based upon syntax directed translation; for this translation to work effectively, the details of the parse tree must be appropriate for tree-based semantic composition techniques. Current techniques are also inapplicable to the recently developed class of chunk parsers, parsers which use finite-state techniques to parse the non-recursive structures of the language, and then use another technique, usually related to dependency parsing, to connect these chunks together. Two papers in this session can be viewed as addressing one or both of these issues. The paper by Abney presents a new measure for evaluating parser performance tied directly to grammatical structure, and suggests ways in which such a measure can be used for chunk parsing. Brill presents a new technique for parsing which extends the symbolic POS tagger he presented last year. Surprisingly, this simple technique performs as well as the best recent results using the I/O algorithm, using a very simple technique to learn less than two hundred purely symbolic rules which deterministically parse new input.</Paragraph> <Paragraph position="2"> 4. Lexical semantics: Sense class determination The remaining papers in this session address three separate areas of lexical semantics. 
The first is sense class determination: determining, for example, whether a particular use of the word &quot;newspaper&quot; refers to the physical entity that sits by your front door in the morning or the corporate entity that publishes it, or whether a particular use of &quot;line&quot; means a product line, a queue, a line of text, a fishing line, etc. Several papers in this session address the question of how well automatic statistical techniques can discriminate between alternative word senses, and how much information such techniques must use. The paper by Leacock, Miller and Voorhees tests three different techniques for sense class determination: Bayesian decision theory, neural networks, and content vectors. These experiments show that the three techniques are statistically indistinguishable, each resolving between three different uses of &quot;line&quot; with an accuracy of about 76%, and between six different uses with an accuracy of about 73%. These techniques use an extended context of about 100 words around the target word; Yarowsky's paper presents a new technique which uses only five words on either side of the target word, but can provide roughly comparable results by itself. This new method might well be combined with one of these earlier techniques to provide improved performance over either technique individually.</Paragraph> <Paragraph position="3"> 5. Lexical semantics: adjectival scales A second area of lexical semantics focuses on the semantics of adjectives that determine linguistic scales. For example, one set of adjectives lies on the linguistic scale from hot through warm and cool to cold, while another set lies on the scale that goes from huge through big to little to tiny. Many adjectives can be characterized as picking out a point or range on some such scale. These scales play a role in human language understanding because of a phenomenon called scalar implicature, which underlies the fact that if someone asks whether Tokyo is a big city, a better answer than &quot;yes&quot; is &quot;Well, no; it's actually quite huge&quot;. By the law of scalar implicature, one cannot felicitously assent to an assertion about a midpoint on a scale, even if it is logically true, if an assertion about an extremum is also logically true. McKeown and Hatzivassiloglou take a first step toward using statistical techniques to automatically determine where adjectives fall along such scales by presenting a method which automatically clusters adjectives into groups closely related to such scales.</Paragraph> </Section> </Section> <Section position="3" start_page="225" end_page="226" type="metho"> <SectionTitle> 6. Lexical semantics: Selectional Restrictions </SectionTitle> <Paragraph position="0"> Another key aspect of lexical semantics is the determination of the selectional constraints of verbs: determining, for each sense of a given verb, what kinds of entities can serve as its subject and what kinds can serve as its objects. For example, for one meaning of open, the thing opened is most likely to be an entrance; for another meaning, a mouth; for another, a container; for another, a discourse. One key barrier to determining such selectional constraints automatically is a serious problem with sparse data; in a large corpus, a given verb is likely to occur with any particular noun as object in only a handful of instances.
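To make this sparse-data barrier concrete before turning to the two solutions described next, here is a minimal sketch in Python. The counts and the noun-to-class mapping are invented for illustration (the mapping merely stands in for a taxonomy such as WordNet's noun is-a network), and the simple log-ratio score is only a generic stand-in for the information-theoretic measures used in the papers below, not the actual formulation of any of them.

```python
from collections import defaultdict
from math import log2

# Hypothetical (verb, object-noun) counts from a small corpus: no single
# pair is seen more than a handful of times -- the sparse-data problem.
pair_counts = {
    ("open", "door"): 4, ("open", "gate"): 1, ("open", "window"): 2,
    ("open", "jar"): 1, ("open", "mouth"): 1, ("open", "debate"): 1,
    ("eat", "bread"): 3, ("eat", "apple"): 2, ("eat", "soup"): 1,
}

# Hand-picked noun -> class mapping, standing in for classes drawn from a
# taxonomy such as WordNet's noun is-a network.
noun_class = {
    "door": "entrance", "gate": "entrance", "window": "entrance",
    "jar": "container", "mouth": "body_part", "debate": "discourse",
    "bread": "food", "apple": "food", "soup": "food",
}

def class_preferences(verb):
    """Score object classes for a verb by comparing P(class | verb) with P(class).

    A positive log-ratio means the verb takes objects from that class more
    often than chance would predict; pooling counts over classes is what makes
    the estimate usable even though individual noun counts are tiny.
    """
    verb_total = sum(c for (v, _), c in pair_counts.items() if v == verb)
    all_total = sum(pair_counts.values())

    p_class_given_verb = defaultdict(float)
    p_class = defaultdict(float)
    for (v, noun), c in pair_counts.items():
        cls = noun_class[noun]
        p_class[cls] += c / all_total
        if v == verb:
            p_class_given_verb[cls] += c / verb_total

    return {cls: log2(p / p_class[cls]) for cls, p in p_class_given_verb.items()}

print(class_preferences("open"))
# "entrance" scores well above chance as an object class for "open", even
# though each individual (open, noun) pair was observed only a few times.
```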
Two papers in this session automatically derive selectional restrictions, each with a different solution to this particular form of the sparse data problem. The paper by Resnik utilizes an information-theoretic technique to automatically determine such selectional restrictions; this information is then used to resolve a number of syntactic ambiguities that any parser must deal with. Resnik uses the noun is-a network within Miller's WordNet to provide sufficiently large classes to obtain reliable results.</Paragraph> <Paragraph position="1"> Grishman and Sterling attack the problem of sparse data by using co-occurrence smoothing on a set of fully automatically generated selectional constraints.</Paragraph> <Paragraph position="2"> In a final paper on lexical semantics, Matsukawa presents a new method of determining word associations in Japanese text. Such word associations are useful in dealing with parsing ambiguities and should also prove useful for Japanese word segmentation.</Paragraph> </Section> </Paper>