<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1618">
  <Title>Identification of Event Mentions and their Semantic Class</Title>
  <Section position="5" start_page="146" end_page="146" type="metho">
    <SectionTitle>
3 Events in the TimeBank
</SectionTitle>
    <Paragraph position="0"> TimeBank (Pustejovsky, et. al. 2003b) consists of just under 200 documents containing 70,000 words; it is drawn from news texts from a variety of different domains, including newswire and transcribed broadcast news. These documents are annotated using the TimeML annotation scheme (Pustejovsky, et. al. 2003a), which aims to identify not just times and dates, but events and the temporal relations between these events.</Paragraph>
    <Paragraph position="1"> Of interest here are the EVENT annotations, of which TimeBank 1.1 has annotated 8312.</Paragraph>
    <Paragraph position="2"> TimeBank annotates a word or phrase as an EVENT if it describes a situation that can &amp;quot;happen&amp;quot; or &amp;quot;occur&amp;quot;, or if it describes a &amp;quot;state&amp;quot; or &amp;quot;circumstance&amp;quot; that &amp;quot;participate[s] in an opposition structure in a given text&amp;quot; (Pustejovsky, et. al. 2003b). Note that the TimeBank events are not restricted to verbs; nouns and adjectives denote events as well.</Paragraph>
    <Paragraph position="3"> The TimeBank definition of event differs in a few ways from the traditional linguistic definition of event. TimeBank EVENTs include not only the normal linguistic events, but also some linguistic states, depending on the contexts in which they occur. For example1, in the sentence None of the people on board the airbus survived the crash the phrase on board would be considered to describe an EVENT because that state changes in the time span covered by the text. Not all linguistic states become TimeBank EVENTs in this manner, however. For example, the state described by New York is on the east coast holds true for a time span much longer than the typical newswire document and would therefore not be labeled as an EVENT.</Paragraph>
    <Paragraph position="4"> In addition to identifying which words in the TimeBank are EVENTs, the TimeBank also provides a semantic class label for each EVENT.</Paragraph>
    <Paragraph position="5"> The possible labels include OCCURRENCE, PERCEPTION, REPORTING, ASPECTUAL, STATE, I_STATE, I_ACTION, and MODAL, and are described in more detail in (Pustejovsky, et. al. 2003a).</Paragraph>
    <Paragraph position="6"> We consider two tasks on this data:  (1) Identifying which words and phrases are EVENTs, and (2) Identifying their semantic classes.</Paragraph>
    <Paragraph position="7">  The next section describes how we turn these tasks into machine learning problems.</Paragraph>
  </Section>
  <Section position="6" start_page="146" end_page="147" type="metho">
    <SectionTitle>
4 Event Identification as Classification
</SectionTitle>
    <Paragraph position="0"> We view event identification as a classification task using a word-chunking paradigm similar to that used by Carreras et. al. (2002). For each word in a document, we assign a label indicating whether the word is inside or outside of an event.</Paragraph>
    <Paragraph position="1"> We use the standard B-I-O formulation of the word-chunking task that augments each class label with an indicator of whether the given word  is (B)eginning, (I)nside or (O)utside of a chunk (Ramshaw &amp; Marcus, 1995). So, for example, under this scheme, sentence (1) would have its words labeled as in Table 1.</Paragraph>
    <Paragraph position="2">  (1) The company's sales force [EVENT(I_ACTION) applauded] the [EVENT(OCCURRENCE) shake up]  The two columns of labels in Table 1 show how the class labels differ depending on our task. If we're interested only in the simple event identification task, it's sufficient to know that applauded and shake both begin events (and so have the label B), up is inside an event (and so has the label I), and all other words are outside events (and so have the label O). These labels are shown in the column labeled Event Label. If in addition to identifying events, we also want to identify their semantic classes, then we need to know that applauded begins an intentional action event (B_I_ACTION), shake begins an occurrence event (B_OCCURRENCE), up is inside an occurrence event (I_OCCURRENCE), and all other words are outside of events (O). These labels are shown in the column labeled Event Semantic Class Label. Note that while the eight semantic class labels in the TimeBank could potentially introduce as many as 8 * 2 + 1 = 17 chunk labels, not all types of events appear as multi-word phrases, so we see only 13 of these labels in our data.</Paragraph>
  </Section>
  <Section position="7" start_page="147" end_page="149" type="metho">
    <SectionTitle>
5 Classifier Features
</SectionTitle>
    <Paragraph position="0"> Having cast the problem as a chunking task, our next step is to select and represent a useful set of features. In our case, since each classification instance is a word, our features need to provide the information that we deem important for recognizing whether a word is part of an event or not. We consider a number of such features, grouped into feature classes for the purposes of discussion.</Paragraph>
    <Section position="1" start_page="147" end_page="147" type="sub_section">
      <SectionTitle>
5.1 Text feature
</SectionTitle>
      <Paragraph position="0"> This feature is just the textual string for the word.</Paragraph>
    </Section>
    <Section position="2" start_page="147" end_page="147" type="sub_section">
      <SectionTitle>
5.2 Affix features
</SectionTitle>
      <Paragraph position="0"> These features attempt to isolate the potentially important subsequences of characters in the word. These are intended to identify affixes that have a preference for different types of events.</Paragraph>
      <Paragraph position="1"> Affixes: These features identify the first three and four characters of the word, and the last three and four characters of the word.</Paragraph>
      <Paragraph position="2"> Nominalization suffix: This feature indicates which of the suffixes typically associated with nominalizations - ing(s), ion(s), ment(s), and nce(s) - the word ends with. This overlaps with the Suffixes feature, but allows the classifier to more easily treat nominalizations specially.</Paragraph>
    </Section>
    <Section position="3" start_page="147" end_page="147" type="sub_section">
      <SectionTitle>
5.3 Morphological features
</SectionTitle>
      <Paragraph position="0"> These features identify the various morphological variants of a word, so that, for example, the words resist, resisted and resistance can all be identified as the same basic event type.</Paragraph>
      <Paragraph position="1"> Morphological stem: This feature gives the base form of the word, so for example, the stem of assisted is assist and the stem of investigations is investigation. Stems are identified with a lookup table from the University of Pennsylvania of around 300,000 words.</Paragraph>
      <Paragraph position="2"> Root verb: This feature gives the verb from which the word is derived. For example, assistance is derived from assist and investigation is derived from investigate. Root verbs are identified with an in-house lookup table of around 5000 nominalizations.</Paragraph>
    </Section>
    <Section position="4" start_page="147" end_page="148" type="sub_section">
      <SectionTitle>
5.4 Word class features
</SectionTitle>
      <Paragraph position="0"> These features attempt to group the words into different types of classes. The intention here is to identify correlations between classes of words and classes of events, e.g. that events are more likely to be expressed as verbs or in verb phrases than they are as nouns.</Paragraph>
      <Paragraph position="1"> Part-of-speech: This feature contains the word's part-of-speech based on the Penn Treebank tag set. Part-of-speech tags are assigned by the MX- null Syntactic-chunk label: The value of this feature is a B-I-O style label indicating what kind of syntactic chunk the word is contained in, e.g.</Paragraph>
      <Paragraph position="2"> noun phrase, verb phrase, or prepositional phrase. These are assigned using a word-chunking SVM-based system trained on the CoNLL-2000 data2 (which uses the lowest nodes of the Penn TreeBank syntactic trees to break sentences into base phrases).</Paragraph>
      <Paragraph position="3"> Word cluster: This feature indicates which verb or noun cluster the word is a member of. The clusters were derived from the co-occurrence statistics of verbs and their direct objects, in the same manner as Pradhan et. al. (2004). This produced 128 clusters (half verbs, half nouns) covering around 100,000 words.</Paragraph>
    </Section>
    <Section position="5" start_page="148" end_page="148" type="sub_section">
      <SectionTitle>
5.5 Governing features
</SectionTitle>
      <Paragraph position="0"> These features attempt to include some simple dependency information from the surrounding words, using the dependency parses produced by Minipar3. These features aim to identify events that are expressed as phrases or that require knowledge of the surrounding phrase to be identified. null Governing light verb: This feature indicates which, if any, of the light verbs be, have, get, give, make, put, and take governs the word. This is intended to capture adjectival predicates such as may be ready, and nominal predicates such as make an offer, where ready and offer should be identified as events.</Paragraph>
      <Paragraph position="1"> Determiner type: This feature indicates the type of determiner a noun phrase has. If the noun phrase has an explicit determiner, e.g. a, the or some, the value of this feature is the determiner itself. We use the determiners themselves as feature values here because they form a small, closed class of words. For open-class determiner-like modifiers, we instead group them into classes. For noun phrases that are explicitly quantified, like a million dollars, the value is CARDINAL, while for noun phrases modified by other possessive noun phrases, like Bush's real objectives, the value is GENITIVE. For noun phrases without a determiner-like modifier, the value is PROPER_NOUN, BARE_PLURAL or BARE_SINGULAR, depending on the noun type.</Paragraph>
      <Paragraph position="2">  Subject determiner type: This feature indicates for a verb the determiner type (as above) of its subject. This is intended to distinguish generic sentences like Cats have fur from non-generics like The cat has fur.</Paragraph>
    </Section>
    <Section position="6" start_page="148" end_page="148" type="sub_section">
      <SectionTitle>
5.6 Temporal features
</SectionTitle>
      <Paragraph position="0"> These features try to identify temporal relations between words. Since the duration of a situation is at the core of the TimeBank definition of events, features that can get at such information are particularly relevant.</Paragraph>
      <Paragraph position="1"> Time chunk label: The value of this feature is a B-I-O label indicating whether or not this word is contained in a temporal annotation. The temporal annotations are produced by a word-chunking SVM-based system trained on the temporal expressions (TIMEX2 annotations) in the TERN 2004 data4. In addition to identifying expressions like Monday and this year, the TERN data identifies event-containing expressions like the time she arrived at her doctor's office.</Paragraph>
      <Paragraph position="2"> Governing temporal: This feature indicates which kind of temporal preposition governs the word. Since the TimeBank is particularly interested in which events start or end within the time span of the document, we consider prepositions likely to indicate such a change of state, including after, before, during, following, since, till, until and while.</Paragraph>
      <Paragraph position="3"> Modifying temporal: This feature indicates which kind of temporal expression modifies the word. Temporal expressions are recognized as above, and the type of modification is either the preposition that joins the temporal annotation to the word, or ADVERBIAL for any nonpreposition modification. This is intended to capture that modifying temporal expressions often indicate event times, e.g. He ran the race in an hour.</Paragraph>
    </Section>
    <Section position="7" start_page="148" end_page="148" type="sub_section">
      <SectionTitle>
5.7 Negation feature
</SectionTitle>
      <Paragraph position="0"> This feature indicates which negative particle, e.g. not, never, etc., modifies the word. The idea is based Siegel and McKeown's (2000) findings which suggested that in some corpora states occur more freely with negation than events do.</Paragraph>
    </Section>
    <Section position="8" start_page="148" end_page="149" type="sub_section">
      <SectionTitle>
5.8 WordNet hypernym features
</SectionTitle>
      <Paragraph position="0"> These features indicate to which of the WordNet noun and verb sub-hierarchies the word belongs.</Paragraph>
      <Paragraph position="1">  Rather than include all of the thousands of different sub-hierarchies in WordNet, we first selected the most useful candidates by looking at the overlap with WordNet and our training data. For each hierarchy in WordNet, we considered a classifier that labeled all words in that hierarchy as events, and all words outside of that hierarchy as non-events5. We then evaluated these classifiers on our training data, and selected the ten with the highest F-measures. This resulted in selecting the following synsets:  * noun: state * noun: psychological feature * noun: event * verb: think, cogitate, cerebrate * verb: move, displace * noun: group, grouping * verb: act, move * noun: act, human action, human activity * noun: abstraction * noun: entity  The values of the features were then whether or not the word fell into the hierarchy defined by each one of these roots. Note that since there are no WordNet senses labeled in our data, we accept a word as falling into one of the above hierarchies if any of its senses fall into that hierarchy. null</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="149" end_page="149" type="metho">
    <SectionTitle>
6 Classifier Parameters
</SectionTitle>
    <Paragraph position="0"> The features described in the previous section give us a way to provide the learning algorithm with the necessary information to make a classification decision. The next step is to convert our training data into sets of features, and feed these classification instances to the learning algorithm.</Paragraph>
    <Paragraph position="1"> For the learning task, we use the TinySVM6 support vector machine (SVM) implementation in conjunction with YamCha7 (Kudo &amp; Matsumoto, 2001), a suite for general-purpose chunking.</Paragraph>
    <Paragraph position="2"> YamCha has a number of parameters that define how it learns. The first of these is the window width of the &amp;quot;sliding window&amp;quot; that it uses.  A sliding window is a way of including some of the context when the classification decision is made for a word. This is done by including the features of preceding and following words in addition to the features of the word to be classified. To illustrate this, we consider our earlier example, now augmented with some additional features in Table 2.</Paragraph>
    <Paragraph position="3"> To classify up in this scenario, we now look not only at its features, but at the features of some of the neighboring words. For example, if our window width was 1, the feature values we would use for classification would be those in the outlined box, that is, the features of shake, up and the sentence final period. Note that we do not include the classification labels for either up or the period since neither of these classifications is available at the time we try to classify up. Using such a sliding window allows YamCha to include important information, like that up is preceded by shake and that shake was identified as beginning an event.</Paragraph>
    <Paragraph position="4"> In addition to the window width parameter, YamCha also requires values for the following three parameters: the penalty for misclassification (C), the kernel's polynomial degree, and the method for applying binary classifiers to our multi-class problem, either pair-wise or one-vsrest. In our experiments, we chose a one-vs-rest multi-class scheme to keep training time down, and then tried different variations of all the other parameters to explore a variety of models.</Paragraph>
  </Section>
  <Section position="9" start_page="149" end_page="150" type="metho">
    <SectionTitle>
7 Baseline Models
</SectionTitle>
    <Paragraph position="0"> To be able to meaningfully evaluate the models we train, we needed to establish a reasonable baseline. Because the majority class baseline would simply label every word as a non-event, we introduce two baseline models that should be more reasonable: Memorize and Sim-Evita.</Paragraph>
    <Paragraph position="1">  Word POS Stem Label The DT the O company NN company O 's POS 's O sales NNS sale O force NN force O applauded VBD applaud B The DT the O shake NN shake B up RP up . . .</Paragraph>
    <Paragraph position="2">  The Memorize baseline is essentially a lookup table - it memorizes the training data. This system assigns to each word the label with which it occurred most frequently in the training data, or the label O (not an event) if the word never occurred in the training data. The Sim-Evita model is our attempt to simulate the Evita system (Sauri et. al. 2005). As part of its algorithm, Evita includes a check that determines whether or not a word occurs as an event in TimeBank. It performs this check even when evaluated on TimeBank, and thus though Evita reports 74% precision and 87% recall, these numbers are artificially inflated because the system was trained and tested on the same corpus. Thus we cannot directly compare our results to theirs. Instead, we simulate Evita by taking the information that it encodes as rules, and encoding this instead as features which we provide to a YamCha-based system.</Paragraph>
    <Paragraph position="3"> Sauri et. al. (2005) provides a description of Evita's rules, which, according to the text, are based on information from lexical stems, part of speech tags, syntactic chunks, weak stative predicates, copular verbs, complements of copular predicates, verbs with bare plural subjects and WordNet ancestors. We decided that the following features most fully covered the same information: null  We also decided that since Evita does not consider a word-window around the word to be classified, we should set our window size parameter to zero.</Paragraph>
    <Paragraph position="4"> Because our approximation of Evita uses a feature-based statistical machine learning algorithm instead of the rule-based Evita algorithm, it cannot predict how well Evita would perform if it had not used the same data for training and testing. However, it can give us an approximation of how well a model can perform using information similar to that of Evita.</Paragraph>
  </Section>
class="xml-element"></Paper>