<?xml version="1.0" standalone="yes"?>
<Paper uid="C96-1033">
  <Title>FeasPar - A Feature Structure Parser Learning to Parse Spoken Language</Title>
  <Section position="4" start_page="188" end_page="188" type="metho">
    <SectionTitle>
2 Feature Structures
</SectionTitle>
    <Paragraph position="0"> Feature structures (Gazdar et al., 1985; Pollard and Sag, 1987) are used as the output formalism for FeasPar. Their core syntactic properties and terminology are: 1. A feature structure is a set of zero, one, or several feature pairs.</Paragraph>
    <Paragraph position="1"> 2. A feature pair, e.g. (frame *clarify), consists of a feature, e.g. frame or topic, and a feature value.</Paragraph>
    <Paragraph position="2"> 3. A feature value is either: (a) an atomic value, e.g. *clarify, or (b) a complex value. 4. A complex value is a feature structure.</Paragraph>
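    As an illustration of these definitions, a feature structure can be rendered as a nested mapping from features to values; the following is a minimal Python sketch (not part of the original paper) of a structure in the spirit of the examples discussed in Section 3:

        # Sketch only: a feature structure as a nested dict.
        # Atomic values are strings (e.g. "*clarify"); complex values are
        # dicts, i.e. feature structures themselves (definition 4 above).
        feature_structure = {
            "frame": "*clarify",        # feature pair (frame *clarify)
            "topic": {                  # complex value: itself a feature structure
                "frame": "*simple-time",
                "day-of-week": "monday",
            },
        }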
  </Section>
  <Section position="5" start_page="188" end_page="189" type="metho">
    <SectionTitle>
3 The Chunk'n'Label Principle
</SectionTitle>
    <Paragraph position="0"> In contrast to the standard feature structure definition of Section 2, an alternative viewpoint is to look at a feature structure as a tree (this assumes that structure sharing is not possible; see Section 3.1.2), where sets of feature pairs with atomic values make up the branches, and the branches are connected with relations.</Paragraph>
    <Paragraph position="1"> Atomic feature pairs belonging to the same branch have the same relation to all other branches. Further, when comparing a sentence with its feature structure, it appears that there is a correspondence between fragments of the feature structure and specific chunks of the sentence. In the example feature structure of Figure 1 (for the sentence fragment &amp;quot;... by monday i assume you mean monday the twenty seventh&amp;quot;), the following observations about feature pairs and relations apply: * feature pairs: the pairs (frame *simple-time), (day-of-week monday) and (day 27) correspond to the chunk &amp;quot;monday the twenty seventh&amp;quot;, with (day 27) corresponding to the sub-chunk &amp;quot;the twenty seventh&amp;quot;. * relations: the complex value of the feature topic corresponds to the chunk &amp;quot;by monday&amp;quot;, and the complex value of the feature clarified corresponds to &amp;quot;you mean monday the twenty seventh&amp;quot;.</Paragraph>
    <Paragraph position="2"> Manually aligning the sentence with fragments of the feature structure gives a structure as shown in Figure 2. A few comments apply to this figure: * The sentence is hierarchically split into chunks.</Paragraph>
    <Paragraph position="3"> * Feature pairs are listed with their corresponding chunk.</Paragraph>
    <Paragraph position="4"> * Relations are shown in square brackets, and express how a chunk relates to its parent chunk. Relations may contain more than one element. This allows several nesting levels. Once the information in Figure 2 has been obtained, producing a feature structure is straightforward, using the algorithm of Figure 3. Summing up, we can define this procedure as the chunk'n'label principle of parsing:  1. Split the incoming sentence into hierarchical chunks.</Paragraph>
    <Paragraph position="5"> 2. Label each chunk with feature pairs and feature relations.</Paragraph>
    <Paragraph position="6"> 3. Convert this into a feature structure, using the algorithm of Figure 3.</Paragraph>
    <Paragraph position="8"> [Figure 3: algorithm for converting a labeled chunk parse into a feature structure] </Paragraph>
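    Since the algorithm of Figure 3 is not reproduced in this extract, the following is a minimal sketch of such a conversion in Python; the chunk representation (fields "relation", "features", "children") is an assumption for illustration, not the paper's data structure:

        # Sketch of the chunk parse -> feature structure conversion (cf. Figure 3).
        def chunk_to_fs(chunk, fs=None, path=()):
            """Insert the feature pairs of every chunk at the nesting position
            given by the accumulated relation elements (a relation may contain
            more than one element, giving several nesting levels at once)."""
            if fs is None:
                fs = {}
            path = path + tuple(chunk.get("relation", ()))   # e.g. ("topic",)
            node = fs
            for rel in path:                    # walk or create the nested structure
                node = node.setdefault(rel, {})
            node.update(chunk.get("features", {}))           # add this chunk's feature pairs
            for child in chunk.get("children", ()):
                chunk_to_fs(child, fs, path)
            return fs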
    <Section position="1" start_page="189" end_page="189" type="sub_section">
      <SectionTitle>
3.1 Theoretical Limitations
</SectionTitle>
      <Paragraph position="0"> The chunk'n'label principle has a few theoretical limitations compared with the feature structure formalisms commonly used in unification-based parsing, e.g. (Gazdar et al., 1985).</Paragraph>
      <Paragraph position="1">  With the chunk'n'label principle, the feature structure has a maximum nesting depth. One could expect this maximum nesting depth to cause limitations. However, these limitations are only theoretical, because very deep nesting is hardly needed in practice for spoken language. Due to the ability to model relations containing more than one element, no nesting depth problems occurred while modeling over 600 sentences from the English Spontaneous Scheduling Task (ESST).</Paragraph>
      <Paragraph position="2">  Many unification formalisms allow feature values to be shared. The chunk'n'label principle does not incorporate any mechanism for this. However, all work with ESST and ILT empirically showed that there is no need for structure sharing. This observation suggests that for semantic analysis, structure sharing is statistically insignificant, even though it is theoretically possible.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="189" end_page="191" type="metho">
    <SectionTitle>
4 Baseline Parser
</SectionTitle>
    <Paragraph position="0"> The chunk'n'label principle is the basis for the design and implementation of the FeasPar parser.</Paragraph>
    <Paragraph position="1"> FeasPar uses neural networks to learn to produce chunk parses. It has two modes: learn mode and run mode. In learn mode, manually modeled chunk parses are split into several separate training sets, one per neural network. Then, the networks are trained independently of each other, allowing for parallel training on several CPUs. In run mode, the input sentence is processed through all networks, giving a chunk parse, which is passed</Paragraph>
    <Paragraph position="3"> on to the converting algorithm shown in Figure 3.</Paragraph>
    <Paragraph position="4"> In the following, the three main modules required to produce a chunk parse are described: The Chunker splits an input sentence into chunks. It consists of three neural networks. The first network finds numbers. They are classified as being ordinal or cardinal numbers, and are presented as words to the following networks. The next network groups words together into phrases.</Paragraph>
    <Paragraph position="5"> The third network groups phrases together into clauses. In total, there are four levels of chunks: word/numbers, phrases, clauses and sentence.</Paragraph>
    <Paragraph position="6"> The Linguistic Feature Labeler attaches features and atomic feature values (if applicable) to these chunks. For each feature, there is a network, which finds one or zero atomic values. Since there are many features, each chunk may get zero, one, or several pairs of features and atomic values. Since a feature normally only occurs at a certain chunk level, the network is tailored to decide on a particular feature at a particular chunk level. This specialization is there to prevent the learning task from becoming too complex. A special atomic feature value is called a lexical feature value. It is indicated by '=' and means that the neural network only detects the occurrence of a value, whereas the value itself is found by a lexicon lookup. The lexical feature values are a true hybrid mechanism, where symbolic knowledge is included when the neural network signals it. Furthermore, features may be marked as up-features (e.g. ../incl-excl in Figures 4 and 5). An up-feature is propagated up to its parent branch when building the feature structure (see Figure 6).</Paragraph>
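    To make the lexical feature value and up-feature mechanisms concrete, here is a minimal Python sketch; the data layout, the look-up table contents and the '../' prefix handling are assumptions for illustration:

        # Sketch only: resolving a '=' label via the lexicon, and propagating
        # an up-feature to the parent chunk.
        LEXICAL_VALUES = {("day-of-week", "monday"): "monday"}   # lexicon look-up table

        def resolve_label(feature, label, chunk_words):
            """If the network emitted the lexical marker '=', take the value
            from the lexicon instead of from the network output."""
            if label != "=":
                return (feature, label)
            for word in chunk_words:
                value = LEXICAL_VALUES.get((feature, word))
                if value is not None:
                    return (feature, value)
            return None

        def propagate_up_features(chunk, parent):
            """Move features marked as up-features ('../' prefix) from a chunk
            to its parent when the feature structure is built."""
            for feature, value in list(chunk["features"].items()):
                if feature.startswith("../"):
                    del chunk["features"][feature]
                    parent["features"][feature[3:]] = value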
    <Paragraph position="7"> The Chunk Relation Finder determines how a chunk relates to its parent chunk. It has one network per chunk level and chunk relation element.</Paragraph>
    <Paragraph position="8"> The following example illustrates in detail how the three parts work. For clarity, this example assumes that all networks perform perfectly. The parser gets the English sentence: &amp;quot;i have a meeting till twelve&amp;quot; The Chunker segments the sentence before passing it to the Linguistic Feature Labeler, which adds semantic labels (see Figure 4). The Chunk Relation Finder then adds relations, where appropriate, and we get the chunk parse as shown in Figure 5. Finally, processing it with the algorithm in Figure 3 gives the final parse, the feature structure, as shown in Figure 6.</Paragraph>
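    A high-level sketch of the run-mode data flow just described; the function names are hypothetical stand-ins for the neural network modules and the conversion algorithm:

        # Run-mode pipeline sketch for a sentence such as "i have a meeting till twelve".
        def parse(sentence, chunker, labeler, relation_finder, chunk_to_fs):
            chunks = chunker(sentence)              # hierarchical chunks: words/numbers, phrases, clauses, sentence
            labeled = labeler(chunks)               # attach feature pairs ('=' values resolved via the lexicon)
            chunk_parse = relation_finder(labeled)  # attach relations to parent chunks
            return chunk_to_fs(chunk_parse)         # convert to a feature structure (cf. Figure 3)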
    <Section position="1" start_page="190" end_page="191" type="sub_section">
      <SectionTitle>
4.1 Lexicon
</SectionTitle>
      <Paragraph position="0"> FeasPar uses a full word form lexicon. The lexicon consists of three parts: first, a syntactic and semantic microfeature vector per word; second, lexical feature values; and third, statistical microfeatures. Syntactic and semantic microfeatures are represented for each word as a vector of binary values. These vectors are used as input to the neural networks. As the neural networks learn their tasks based on the microfeatures, and not based on distinct words, adding new words using the same microfeatures is easy and does not degrade generalization performance. The number and selection of microfeatures are domain dependent and must be made manually. For ESST, the lexicon contains domain independent syntactic and domain dependent semantic microfeatures. Manually modeling a 600 word ESST vocabulary requires 3 full days.</Paragraph>
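      A hypothetical illustration of the first lexicon part, the per-word binary microfeature vector (the microfeature names below are invented; the real inventory is domain dependent and designed by hand):

          # Sketch of lexicon entries: binary syntactic/semantic microfeature vectors.
          MICROFEATURES = ["is-noun", "is-verb", "is-number", "is-weekday", "is-time-expr"]

          LEXICON = {
              "monday":  [1, 0, 0, 1, 0],
              "twelve":  [0, 0, 1, 0, 1],
              "meeting": [1, 0, 0, 0, 0],
          }
          # The vectors, not the word identities, are the network inputs, so a new
          # word reusing existing microfeatures can be added without retraining.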
      <Paragraph position="1"> Lexical feature values are stored in look-up tables, which are accessed when the Linguistic Feature Labeler indicates a lexical feature value.</Paragraph>
      <Paragraph position="2"> These tables are generated automatically from the training data, and can easily be extended by hand for more generality and new words. An automatic ambiguity checker warns if similar words or phrases map to ambiguous lexical feature values.</Paragraph>
      <Paragraph position="3"> Statistical microfeatures are represented for each word as a vector of continuous values v_stat.</Paragraph>
      <Paragraph position="4"> These microfeatures, each of them representing a feature pair, are extracted automatically. A statistical microfeature is created for a feature value at a certain chunk level if there exists a word such that, given this word in the training data, the feature value occurs in more than 50 % of the cases. The continuous microfeature value v_stat for a word w is set automatically to the percentage of feature value occurrences given that word w.</Paragraph>
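      A minimal sketch of how such statistical microfeature values could be extracted from training data, following the 50 % criterion and the percentage assignment described above (the data layout is an assumption):

          from collections import defaultdict

          def statistical_microfeatures(training_data, threshold=0.5):
              """training_data: iterable of (word, feature_pairs) observations at one
              chunk level. Returns {feature_pair: {word: v_stat}}, keeping a feature
              pair only if some word predicts it in more than 50 % of the cases."""
              word_count = defaultdict(int)
              cooc = defaultdict(lambda: defaultdict(int))
              for word, feature_pairs in training_data:
                  word_count[word] += 1
                  for fp in feature_pairs:
                      cooc[fp][word] += 1
              result = {}
              for fp, per_word in cooc.items():
                  v_stat = {w: c / word_count[w] for w, c in per_word.items()}
                  if max(v_stat.values()) > threshold:
                      result[fp] = v_stat
              return result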
    </Section>
    <Section position="2" start_page="191" end_page="191" type="sub_section">
      <SectionTitle>
4.2 Neural Architecture and Training
</SectionTitle>
      <Paragraph position="0"> All neural networks have one hidden layer, and are conventional feed-forward networks. The learning is done with standard back-propagation, combined with the constructive learning algorithm PCL (Jain, 1991), where learning starts using a small context, which is increased later in the learning process. This causes local dependencies to be learned first.</Paragraph>
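      A rough sketch of the growing-context training schedule mentioned above (the schedule, the data layout and the network interface are assumptions, not the actual PCL algorithm):

          def train_with_growing_context(network, training_items, context_sizes=(1, 2, 3)):
              """Train first on small word contexts, then on progressively larger ones,
              so that local dependencies are learned before long-distance ones."""
              for size in context_sizes:
                  for words, target in training_items:
                      window = words[:size]              # restrict the visible context
                      network.backprop(window, target)   # hypothetical back-propagation step
              return network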
      <Paragraph position="1"> Generalization performance is increased by sparse connectivity. This connection principle is based on the microfeatures in the lexicon that are relevant to a particular network. The Chunker networks are only connected to the syntactic microfeatures, because chunking is a syntactic task. With ESST, the Linguistic Feature Labeler and Chunk Relation Finder networks are connected only to the semantic microfeatures, and to relevant statistical microfeatures. All connectivity setup is automatic. Further techniques for improving performance are described in (Buo, 1996).</Paragraph>
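      A hedged sketch of the sparse connectivity idea: each network is connected only to the lexicon microfeatures relevant to its task, which can be pictured as a binary input mask (NumPy, illustrative names):

          import numpy as np

          def make_input_mask(all_microfeatures, relevant_microfeatures):
              """1.0 for microfeatures this network is connected to, 0.0 otherwise."""
              return np.array([1.0 if f in relevant_microfeatures else 0.0
                               for f in all_microfeatures])

          # e.g. a Chunker network sees only the syntactic microfeatures:
          # masked_input = make_input_mask(MICROFEATURES, SYNTACTIC_SUBSET) * word_vector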
      <Paragraph position="2"> For the neural networks, the average test set performance is 95.4 %.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="191" end_page="191" type="metho">
    <SectionTitle>
5 Search
</SectionTitle>
    <Paragraph position="0"> The complete parse depends on many neural networks. Most networks have a certain error rate; only a few networks are perfect. When building complete feature structures, these network errors multiply up, with the result that many feature structures are not only erroneous, but also inconsistent and nonsensical.</Paragraph>
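    To make the error accumulation concrete with an illustrative back-of-the-envelope calculation (not a figure from the paper): if a complete parse depended on n independent networks, each at the reported average accuracy of 95.4 %, the chance that all of them are simultaneously correct would drop roughly as 0.954 to the power of n:

        def p_all_correct(n, p=0.954):
            """Probability that n independent networks, each with accuracy p,
            are all correct for one sentence (illustrative only)."""
            return p ** n
        # p_all_correct(10) is about 0.62; p_all_correct(20) is about 0.39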
    <Paragraph position="1"> To compensate for this, we wrote a search algorithm. It is based on two information sources: first, scores that originate from the network output activations; second, a formal feature structure specification, stating which combinations of feature pairs are consistent. This specification was already available as an interlingua specification document. Using these two information sources, the search finds the feature structure with the highest score, under the constraint of being consistent. The search is described in more detail in (Buo and Waibel, 1996; Buo, 1996).</Paragraph>
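    Since the search itself is only referenced here, the following is a heavily simplified sketch of the underlying idea: among candidate feature structures (for instance generated by varying low-confidence network decisions), keep the highest-scoring one that the consistency specification accepts. The candidate generation, scoring and consistency check are assumptions for illustration:

        def best_consistent_parse(candidates, score, is_consistent):
            """candidates: iterable of candidate feature structures;
            score: e.g. a sum of the network output activations supporting a candidate;
            is_consistent: check against the formal feature structure specification."""
            best, best_score = None, float("-inf")
            for fs in candidates:
                if is_consistent(fs):
                    s = score(fs)
                    if s > best_score:
                        best, best_score = fs, s
            return best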
  </Section>
</Paper>