<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-2019">
  <Title>Morphological Analysis of The Spontaneous Speech Corpus</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 A Morpheme Model
</SectionTitle>
    <Paragraph position="0"> This section describes a model which estimates how likely a string is to be a morpheme. We implemented this model within an M.E. framework. Given a tokenized test corpus, the problem of Japanese morphological analysis can be reduced to the problem of assigning one of two tags to each string in a sentence. A string is tagged with a 1 or a 0 to indicate whether or not it is a morpheme. When a string is a morpheme, a grammatical attribute is assigned to it. The 1 tag is thus divided into the number, n, of grammatical attributes assigned to morphemes, and the problem is to assign an attribute (from 0 to n) to every string in a given sentence. The (n + 1) tags form the space of &amp;quot;futures&amp;quot; in the M.E. formulation of our problem of morphological analysis. The M.E. model enables the computation of P(f|h) for any future f from the space of possible futures, F, and for every history, h, from the space of possible histories, H. The computation of P(f|h) in any M.E. model depends on a set of &amp;quot;features&amp;quot; which would be helpful in making a prediction about the future. Like most current M.E. models in computational linguistics, our model is restricted to features which are binary functions of the history and future. For instance, one of our features is</Paragraph>
    <Paragraph position="2"> Here &amp;quot;has(h,x)&amp;quot; is a binary function that returns true if the history h has feature x. In our experiments, we focused on such information as whether or not a string is found in a dictionary, the length of the string, what types of characters are used in the string, and what part-of-speech the adjacent morpheme is.</Paragraph>
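The binary features described above can be sketched as follows. This is an illustrative Python sketch, not the authors' code: the `has` helper, the feature name, and the history keys are hypothetical stand-ins for the dictionary, length, character-type, and adjacent-POS information the paper mentions.

```python
def has(history, key):
    """Return True if the history records the given property.

    `history` is a hypothetical dict describing the candidate string
    and its context; the real system's history representation is not
    specified at this level of detail.
    """
    return bool(history.get(key, False))

def feature_dict_and_noun(history, future):
    """A binary feature: fires (returns 1) only when the candidate
    string appears in the dictionary as a noun AND the future tag
    assigns it the noun attribute."""
    return 1 if has(history, "in_dictionary_as_noun") and future == "noun" else 0

# Toy history for a two-character string found in the dictionary.
h = {"in_dictionary_as_noun": True, "length": 2}
print(feature_dict_and_noun(h, "noun"))  # 1: both conditions hold
print(feature_dict_and_noun(h, "verb"))  # 0: future does not match
```

Because every feature is a 0/1 function of (history, future), the model's score for a candidate tag is determined entirely by which features fire.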
    <Paragraph position="3"> Given a set of features and some training data, the M.E. estimation process produces a model, which is represented as follows (Berger et al., 1996)</Paragraph>
    <Paragraph position="5"> We define a model which estimates the likelihood that a given string is a morpheme and has the grammatical attribute i (1 ≤ i ≤ n) as a morpheme model. This model is represented by Eq. (2), in which f can be one of the (n + 1) tags from 0 to n.</Paragraph>
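The conditional distribution P(f|h) in such a log-linear (maximum-entropy) model can be sketched as below. This is a minimal sketch under the standard Berger-style formulation, with toy hand-set weights rather than weights estimated by the M.E. training procedure; the feature functions and history keys are hypothetical.

```python
import math

def p_future_given_history(history, future, futures, features, weights):
    """P(f|h) = exp(sum_j w_j * g_j(h, f)) / Z(h), the standard
    log-linear form of a maximum-entropy model."""
    def unnormalized(f):
        return math.exp(sum(w * g(history, f) for g, w in zip(features, weights)))
    z = sum(unnormalized(f) for f in futures)  # normalizer Z(h)
    return unnormalized(future) / z

# Two toy binary features over futures 0..n (0 = "not a morpheme",
# 1..n = grammatical attributes). Weights are illustrative only.
features = [
    lambda h, f: 1 if f != 0 and h["in_dict"] else 0,   # dictionary string tagged as morpheme
    lambda h, f: 1 if f == 0 and h["length"] > 6 else 0, # long string tagged as non-morpheme
]
weights = [1.5, 0.8]
futures = [0, 1, 2]  # n = 2 attributes in this toy setting

h = {"in_dict": True, "length": 2}
probs = {f: p_future_given_history(h, f, futures, features, weights) for f in futures}
assert abs(sum(probs.values()) - 1.0) < 1e-9  # P(.|h) is a distribution
```

Note that the probabilities for all (n + 1) futures share one normalizer Z(h), so the model directly compares "morpheme with attribute i" against "not a morpheme" for the same string.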
    <Paragraph position="6"> A given sentence is divided into morphemes, and a grammatical attribute is assigned to each morpheme, so as to maximize the sentence probability estimated by our morpheme model. Sentence probability is defined as the product of the probabilities estimated for a particular division of morphemes in the sentence. We use the Viterbi algorithm to find the optimal set of morphemes in a sentence.</Paragraph>
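The decoding step above can be sketched as a dynamic program over split positions: choose the segmentation whose per-morpheme probabilities have the largest product. This is a hedged sketch, not the authors' implementation; the `score` function is a toy stand-in for the morpheme model P(f|h), and the lexicon is invented for illustration.

```python
def viterbi_segment(sentence, score):
    """Return the segmentation of `sentence` maximizing the product of
    score(morpheme) over its morphemes, via Viterbi-style DP.

    score(substring) -> probability that the substring is a morpheme
    (a stand-in for the morpheme model's estimate).
    """
    n = len(sentence)
    best = [0.0] * (n + 1)  # best[i]: best probability of sentence[:i]
    back = [0] * (n + 1)    # back[i]: split point achieving best[i]
    best[0] = 1.0           # empty prefix has probability 1
    for i in range(1, n + 1):
        for j in range(i):
            p = best[j] * score(sentence[j:i])
            if p > best[i]:
                best[i], back[i] = p, j
    # Recover the segmentation by walking the backpointers.
    morphemes, i = [], n
    while i > 0:
        morphemes.append(sentence[back[i]:i])
        i = back[i]
    return list(reversed(morphemes))

# Toy scorer: a tiny hypothetical lexicon of morpheme probabilities.
lexicon = {"ab": 0.9, "c": 0.8, "abc": 0.5, "a": 0.3, "b": 0.3}
print(viterbi_segment("abc", lambda s: lexicon.get(s, 1e-6)))  # ['ab', 'c']
```

Here the product 0.9 × 0.8 = 0.72 for ["ab", "c"] beats the single morpheme "abc" at 0.5, so the DP picks the two-morpheme division; the real system would also carry the grammatical attribute of each morpheme through the same search.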
  </Section>
</Paper>