<?xml version="1.0" standalone="yes"?>
<Paper uid="A00-3004">
  <Title>A Weighted Robust Parsing Approach to Semantic Annotation</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
LHIP (Left-corner Head-driven Island Parser)
</SectionTitle>
    <Paragraph position="0"> LHIP (Ballim and Russell, 1994; Lieske and Ballim, 1998) is a system which performs robust analysis of its input, using a grammar defined in an extended form of PROLOG Definite Clause Grammars (DCGs).</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="21" type="metho">
    <SectionTitle>
PROLOG Definite Clause Grammars (DCGs)
</SectionTitle>
    <Paragraph position="0"> The chief modifications to the standard PROLOG 'grammar rule' format are of two types: one or more right-hand-side (RHS) items may be marked as 'heads' (e.g. using a leading '*'), and one or more RHS items may be marked as 'ignorable' (e.g. using a leading '-'). LHIP employs a different control strategy from that used by PROLOG DCGs, in order to allow it to cope with ungrammatical or unforeseen input. The behavior of LHIP is best understood in terms of the complementary notions of span and cover. A grammar rule is said to produce an island which spans input terminals ti to ti+n if the island starts at the ith terminal, and the (i+n)th terminal is the terminal immediately to the right of the last terminal of the island. A rule is said to cover m items if m terminals are consumed in the span of the rule.</Paragraph>
    <Paragraph position="1">  Thus m ≤ n. If m = n, then the rule has completely covered the span. As implied here, rules need not cover all of the input in order to succeed.</Paragraph>
    <Paragraph position="2">  The main goal of introducing weights into LHIP rules is to induce a partial order over the generated hypotheses. The following schema illustrates how to build a simple weighted rule in a compositional fashion, where the resulting weight is computed from the sub-constituents using the minimum operator.</Paragraph>
    <Paragraph position="3"> Weights are real numbers in the interval [0, 1].</Paragraph>
    <Paragraph position="5"> min_list([W1, ..., Wn], Weight)}.</Paragraph>
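The compositional weighting can be sketched as follows (function names are ours): the weight of a rule is the minimum of the weights returned by its sub-constituents, mirroring the min_list([W1, ..., Wn], Weight) call in the schema.

```python
# Sketch of the min-based weight combination described in the text:
# all weights lie in the real interval [0, 1].
def rule_weight(sub_weights):
    """Combine sub-constituent weights with the minimum operator."""
    assert all(0.0 <= w <= 1.0 for w in sub_weights)
    return min(sub_weights)

# Constituents parsed with weights 0.9, 0.7 and 1.0 yield a hypothesis
# of weight 0.7: a parse is only as plausible as its weakest part.
```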
    <Paragraph position="6"> This strategy is not the only one possible, since the LHIP formalism allows greater flexibility. Without entering into formal details, we can observe that if we strictly follow the above schema and impose a cover threshold of 1, we are dealing with fuzzy DCG grammars (Lee and Zadeh, 1969; Asveld, 1996). We actually extend this class of grammars with a notion of fuzzy-robustness, where weights are used to compute confidence factors for the membership of islands to categories. The order of constituents may play an important role in assigning weights for different rules having the same number and type of constituents. Each LHIP rule returns a weight together with a term which will contribute to building the resulting structure. The confidence factor for a pre-terminal rule is assigned statically on the basis of the rule designer's domain knowledge.</Paragraph>
    <Section position="1" start_page="20" end_page="21" type="sub_section">
      <SectionTitle>
2.2 The methodology at work
</SectionTitle>
      <Paragraph position="0"> In our case study we try to integrate the above principles in order to effectively compute annotation hypotheses for the query generation task. This can be done by building a lattice of annotation hypotheses and possibly selecting the best one. This lattice is generated by means of an LHIP weighted grammar which is used to extract and assemble what we call semantic constituents. At the end of this process we presumably obtain suitable annotations from which we will be able to extract the content of the query (e.g. name, address, city, etc.). The rules are designed taking into consideration the following kinds of knowledge: domain knowledge is exploited to provide quantitative support (or a confidence factor) for our rules.</Paragraph>
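Selecting the best hypothesis from the lattice can be sketched minimally as follows; the data shapes (annotation dictionaries paired with weights) are our assumption, not the paper's actual representation.

```python
# Hypothetical sketch: each hypothesis pairs extracted query content
# with the weight computed by the grammar; we keep the best-weighted one.
def best_hypothesis(lattice):
    """Return the highest-weighted annotation, or None if all rules failed."""
    if not lattice:
        return None
    return max(lattice, key=lambda pair: pair[1])[0]

lattice = [
    ({"name": "Dupont", "city": "Geneva"}, 0.8),   # fuller, more confident
    ({"name": "Dupont"}, 0.6),                     # partial hypothesis
]
```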
      <Paragraph position="1">  Lexical knowledge: as pointed out in (Basili and M.T., 1997), lexical knowledge plays an important role in Information Extraction, since it can contribute to guiding the analysis process at various linguistic levels. In our case we are concerned with lexical knowledge when we need to specify the lexical LHIP rules which represent the building blocks of our parsing system. Semantic markers are domain-dependent word patterns and must be defined for a given corpus. They identify cue-words serving both as separators among logical subparts of the same sentence and as introducers of semantic constituents. In our specific case they allow us to search for the content of the query only in interesting parts of the sentence. One of the most important separators is the announcement-query separator. The LHIP clauses defining this separator can be one-or-more-word covering rules, like for instance: ann_query_separator([X], 0.7) #1.0 ~~></Paragraph>
      <Paragraph position="3"> As an example of semantic-constituent introducers we propose here the following rule:</Paragraph>
      <Paragraph position="5"> preposition(Prep).</Paragraph>
      <Paragraph position="6"> which makes use of some world knowledge about street types coming from an external thesaurus, like: street_type(X) ~~> @terminal(X), {thesaurus(street, W), member(X, W)}.</Paragraph>
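The thesaurus-backed lexical check can be illustrated with a short sketch; the thesaurus entries and function names here are invented for illustration, not taken from the paper's resources.

```python
# Hypothetical stand-in for the external thesaurus lookup: a token is
# accepted as a street type iff it belongs to the "street" entry,
# mirroring thesaurus(street, W), member(X, W) in the rule above.
THESAURUS = {"street": ["street", "avenue", "boulevard", "rue", "chemin"]}

def street_type(token):
    """True iff token is a known street-type word."""
    return token.lower() in THESAURUS["street"]
```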
      <Paragraph position="7"> It should be noted that we mix weighted and non-weighted rules, simply because non-weighted rules are rules with the highest weight, 1.</Paragraph>
      <Paragraph position="8">  The generation of annotation hypotheses is performed by composing weighted rules, assembling constituents, and filtering possible hypotheses. In this case the grammar should provide a means of producing an empty constituent when all possible hypothesis rules have failed. The highest-level constituent is represented by the whole sentence structure, which simply specifies the possible orders of constituents relative to annotation hypotheses. In the ann rule we have made use of the Kleene closure operator closK, which allows regular expressions to be formulated simply in LHIP. In the query rule we have specified a possible order of constituents interleaved with semantic markers (e.g. separators and introducers). In this case we did not provide any linguistic constraint (e.g. preferring names belonging to the minimal common syntactic sub-tree, or those having the longest sequence of proper names belonging to the same sub-tree).</Paragraph>
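The sentence-level behaviour can be roughly emulated with regular expressions: the announcement part is a Kleene-closure-like match over arbitrary words (cf. LHIP's closK), and the query part is a constituent introduced by a semantic marker. The cue-word "address" and the street pattern below are invented for illustration.

```python
import re

def clos_k(word_pattern):
    """Zero or more whitespace-separated occurrences, emulating closK."""
    return rf"(?:{word_pattern}\s+)*?"

# Assumed sentence shape: <announcement words> <cue-word> <street constituent>
query_re = re.compile(
    clos_k(r"\w+")                 # announcement: any leading words
    + r"address\s+"                # separator/introducer cue-word (assumed)
    + r"(?P<street>rue\s+\w+)"     # street-name constituent (assumed pattern)
)

m = query_re.search("could you give me the address rue Lausanne")
# m.group("street") extracts the street constituent of the query.
```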
    </Section>
  </Section>
</Paper>