<?xml version="1.0" standalone="yes"?> <Paper uid="W04-3002"> <Title>Hybrid Statistical and Structural Semantic Modeling for Thai Multi-Stage Spoken Language Understanding</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Related Work </SectionTitle> <Paragraph position="0"> In trainable, or data-driven, SLU, two practices serving different applications have been widely investigated. The first practice aims to tag the words (or groups of words) in an utterance with semantic labels, which are later converted to a certain format of semantic representation. To generate such a semantic frame, the words in the utterance are usually aligned to a semantic tree by a parsing algorithm such as a probabilistic context-free grammar or a recursive network whose nodes represent the semantic symbols of the words and whose arcs carry transition probabilities.</Paragraph> <Paragraph position="1"> During parsing, these probabilities are summed and used to determine the most likely parse tree.</Paragraph> <Paragraph position="2"> Many understanding engines have been successfully implemented based on this paradigm (Seneff, 1992; Potamianos et al., 2000; Miller et al., 1994). A drawback of this method, however, is its requirement for a large, fully annotated corpus, i.e., a corpus with a semantic tag on every word, to ensure reliable training. The second practice has been utilized in applications such as call classification (Gorin et al., 1997), where the understanding module aims to classify an input utterance into one of a set of predefined user goals (assuming each utterance expresses a single goal) directly from the words contained in the utterance.</Paragraph> <Paragraph position="3"> This problem can be considered a simple pattern classification task. An advantage of this method is that each training utterance needs to be tagged only with its goal. 
However, another process is required if more detailed information must be obtained. Our motivation for combining the two practices described above is that the combination allows the use of a corpus that is only partially annotated, while still allowing the system to capture sufficient information. The idea of combining the two has also been investigated in other work, such as Wang et al. (2002).</Paragraph> <Paragraph position="4"> Another issue related to this article is the combination of statistical and rule-based approaches for SLU, which is expected to improve overall performance over either individual approach. The approach closest to our work was proposed by Bechet et al. (2002), which aims to extract named entities (NEs) from an input utterance. NE extraction is performed in two steps: detecting the NEs with a statistical tagger, then extracting the NE values using local models.</Paragraph> <Paragraph position="5"> Esteve et al. (2003) proposed a tighter coupling method that embeds conceptual structures into the ASR decoding network.</Paragraph> <Paragraph position="6"> Wang et al. (2000) and Hacioglu and Ward (2001) proposed similar unified models that incorporate domain-specific context-free grammars (CFGs) into domain-independent n-gram models; the resulting hybrid models combine the generalization ability of the CFG with the specificity of the n-gram. Building on the existing regular-grammar model in a weighted finite-state transducer (WFST) framework, we propose another strategy that incorporates a statistical n-gram model into the concept extraction and concept-value recognition components of our multi-stage SLU.</Paragraph> </Section> </Paper>