<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-0708">
  <Title>Memory-Based Learning for Article Generation</Title>
  <Section position="6" start_page="43" end_page="44" type="metho">
    <SectionTitle>
3 Features Determining Automated Article Generation
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="43" end_page="44" type="sub_section">
      <Paragraph position="0"> We have extracted 300K base noun phrases (NPs) from the Penn Treebank Wall Street Journal data (Bies et al., 1995) using the tgrep tool. The distribution of these NP instances with respect to articles is as follows: the 20.6%, a/an 9.4% and 70.0% with no article.</Paragraph>
      <Paragraph position="1"> We experimented with a range of features:  1. Head of the NP: We consider as the head of the NP the rightmost noun in the NP. If an NP does not contain a noun, we take the last word in the NP as its head.</Paragraph>
      <Paragraph position="2"> 2. Part-of-speech (PoS) tag of the head of the NP: PoS labels were taken from the Penn Treebank. We list the tags that occurred with</Paragraph>
      <Paragraph position="4"> the heads of theNPs in Table 1.</Paragraph>
      <Paragraph position="5"> PoS Tag the alan no  Street Journal data (300,744 NPs in all) 3. Functional tag of the head of the NP: In the Penn Treebank each syntactic category can be associated with up to four functional tags as listed in Table 2. We consider the sequence of functional tags associated with the category of the NP as a feature; if a constituent has no functional tag, we give the feature the value NONE.  4. Category of the constituent embedding the NP: We looked at the category of the embedding constituent. See Figure 1: The category of the constituent embedding the NP the problem is PP.</Paragraph>
      <Paragraph position="6"> 5. Functional tag of the constituent embedding the NP: If the category of the constituent embedding the NP is associated with one or more functional tags, they are used as features. The functional tag of the constituent embedding the problem in Figure 1 is DIR.</Paragraph>
      <Paragraph position="7"> 6. Other determiners of the NP: We looked  at the presence of a determiner in the NP. By definition, an NP in the Penn Treebank can only  have one determiner (Bies et al., 1995), so we expect it to be a good predictor of situations where we should not generate an article. 7. Head countability preferences of the head of the NP: In case the head of an NP is a noun we also use its countability as a feature. We anticipate that this is a useful feature because singular indefinite countable nouns normally take the article a/n, whereas singular indefinite uncountable nouns normally take no article: a dog vs water. We looked up the countability from the transfer lexicon used in the Japanese-to-English machine translation system ALT-J/E (Ikehara et al., 1991). We used six values for the countability feature: FC (fully countable) for nouns that have both singular and plural forms and can be directly modified by numerals and modifiers such as many; UC (uncountable) for nouns that have no plural form and can be modified by much; SC (strongly countable) for nouns that are more often countable than uncountable; WC (weakly countable) for nouns that are more often uncountable than countable; and PT (pluralia tantum) for nouns that only have plural forms, such as for example, scissors (Bond et al., 1994). Finally, we used the value UNKNOWN if the lexicon did not provide countability information for a noun or if the head of the NP was not a noun. 41.4% of the NP instances received the value UNKNOWN for this feature.</Paragraph>
      <Paragraph position="8"> 8. Semantic classes of the head of the NP: If the head of the NP is a noun we also take into account its semantic classification in a large semantic hierarchy. The underlying idea is that the semantic class of the noun can be used as a way to back off in case of unknown head nouns. The 2,710 node semantic hierarchy we used was also developed in the context of the ALT-J/E system (Ikehara et al., 1991). Edges in this hierarchy represent IS-A or HAS-A relationships. In case the semantic classes associated with two nodes stand in the IS-A relation, the semantic class associated with the node highest in the hierarchy subsumes the semantic class associated with the other node.</Paragraph>
      <Paragraph position="9"> Each of the nodes in this part of the hierarchy is represented by a boolean feature which is set to 1 if that node lies on the path from the root of the hierarchy to a particular semantic class. Thus, for example, the semantic features of a noun in the semantic class organization consists of a vector of 30 features where the features corresponding to the nodes noun, concrete, agent and organization are set to I and all other features are set to 0. 2</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="44" end_page="45" type="metho">
    <SectionTitle>
4 Memory-based learning
</SectionTitle>
    <Paragraph position="0"> We used the Tilburg memory based learner TiMBL 3.0.1 (Daelemans et al., 2000) to learn from examples for generating articles using the features discussed above. Memory-based learning reads all training instances into memory and classifies test instances by extrapolating a class from the most similar instance(s) in memory.</Paragraph>
    <Paragraph position="1"> Daelemans et al. (1999) have shown that for typical natural language tasks, this approach has the advantage that it also extrapolates from exceptional and low-frequency instances. In addition, as a result of automatically weighing features in the similarity function used to determine the class of a test instance, it allows the user to incorporate large 2If a noun has multiple senses, we collapse them by taking the semantic classes of a noun to be the union of the semantic classes of all its senses.</Paragraph>
    <Paragraph position="2">  numbers of features from heterogeneous sources: When data is sparse, feature weighing embodies a smoothing-by-similarity effect (Zavrel and Daelemans, 1997).</Paragraph>
  </Section>
class="xml-element"></Paper>