<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-2018">
  <Title>Detecting the Countability of English Compound Nouns Using Web-based Models</Title>
  <Section position="3" start_page="103" end_page="104" type="metho">
    <SectionTitle>
2 Our approach
</SectionTitle>
    <Paragraph position="0"> We classified compound nouns into three classes: countable, uncountable, and plural only. Baldwin and Bond (2003) classified individual nouns into four classes; besides the three mentioned above, they also considered bipartite nouns. These words can only be plural when they head a noun phrase (trousers), but are singular when used as a modifier (trouser leg). We did not take this class into account in this paper, because very few compound nouns are bipartite.</Paragraph>
    <Paragraph position="1">  For a plural only compound noun, we assume that the frequency of the plural form is much larger than that of the singular form, while for an uncountable noun, the frequency of the singular form is much larger than that of the plural form. The main processing flow is shown in Figure 1. In the figure, &quot;Cnoun&quot; and &quot;Ns&quot; denote the compound noun and the plural form of the word, respectively.</Paragraph>
    <Paragraph position="2"> &quot;F(Ns)&gt;&gt;F(N)&quot; means that the frequency of the plural form of the noun is much larger than that of the singular form.</Paragraph>
    <Paragraph position="4"> Our approach to detecting countability is based on three simple unsupervised models.</Paragraph>
    <Paragraph position="6"> Model (1) compares the frequency of a word in the plural form with that in the singular form. th is the detection threshold above which the word is considered plural only.</Paragraph>
    <Paragraph position="7">  Model (2) compares the frequency of a word in the singular form co-occurring with the determiner &quot;much&quot; with the frequency of the word in the plural form co-occurring with &quot;many&quot;. If the ratio is above th, the word is considered uncountable. Model (2) is used to distinguish between countable and uncountable compound nouns.</Paragraph>
    <Paragraph position="9"> Model (3), which compares the frequencies of noun-be pairs (e.g., f(&quot;account books are&quot;) and f(&quot;account book is&quot;)), is used to distinguish between plural only and countable compound nouns.</Paragraph>
    <Paragraph position="10"> The frequencies (web hits) in the models are obtained from the Google search engine using quoted n-gram queries (e.g., &quot;soft surroundings&quot;). Although Keller and Lapata (2002) showed experimentally that a web-based approach can overcome data sparseness for bigrams, the problem still exists in our experiments. When the number of pages found is zero, we smooth the zero hit count by setting it to 0.01.</Paragraph>
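The three ratio tests and the zero-hit smoothing described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the hits dictionary standing in for search-engine counts, and the example counts are hypothetical. Only the 0.01 smoothing value and the query patterns come from the text, and the direction of the ratio in model (3) (plural over singular) is an assumption. Each score would then be compared with its own threshold th, whose values are not given here.

```python
SMOOTHED_ZERO = 0.01  # value the paper substitutes for zero web hits


def smoothed(count):
    """Return the hit count, with zero replaced by the smoothing value."""
    return count if count > 0 else SMOOTHED_ZERO


def ratio(hits, numerator_query, denominator_query):
    """Ratio of smoothed hit counts for two quoted n-gram queries."""
    num = smoothed(hits.get(numerator_query, 0))
    den = smoothed(hits.get(denominator_query, 0))
    return num / den


def model_scores(hits, singular, plural):
    """Scores for models (1)-(3); each is later compared with a threshold."""
    return {
        "model1": ratio(hits, plural, singular),
        "model2": ratio(hits, f"much {singular}", f"many {plural}"),
        # Assumption: plural "are" count over singular "is" count.
        "model3": ratio(hits, f"{plural} are", f"{singular} is"),
    }


# Hypothetical counts, for illustration only (not real web hits).
example_hits = {
    "account book": 2000,
    "account books": 1000,
    "much account book": 0,
    "many account books": 50,
    "account book is": 400,
    "account books are": 100,
}
scores = model_scores(example_hits, "account book", "account books")
```

Keeping the smoothing inside a single helper means every model treats a zero count the same way, so no ratio can divide by zero.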
    <Paragraph position="11"> Countable compound nouns create some problems when we need to pluralize them. Because no fixed rules cover how to pluralize every compound, we summarized some trends from the &quot;Guide to English Grammar and Writing&quot;. We processed our experimental data following the rules below.</Paragraph>
    <Paragraph position="12"> 1. Pluralize the last word of the compound noun, e.g., bedrooms, film stars.</Paragraph>
    <Paragraph position="13"> 2. When &quot;woman&quot; or &quot;man&quot; is the modifier in the compound noun, pluralize both words, e.g., women-drivers.</Paragraph>
    <Paragraph position="14"> 3. When the compound noun has the form &quot;noun + preposition (or prepositional phrase)&quot;, pluralize the noun, e.g., fathers-in-law.</Paragraph>
    <Paragraph position="15"> 4. When the compound noun has the form &quot;verb (or past participle) + adverb&quot;, pluralize the last word, e.g., grown-ups, stand-bys.</Paragraph>
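The four rules above can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the procedure used in the paper: the function names, the short preposition list, and the naive single-word pluralizer are hypothetical. Rule 3 is applied only when a listed preposition is followed by another word, because separating a true "noun + preposition" compound from a "verb + adverb" compound such as stand-by would require part-of-speech information that this sketch does not have.

```python
import re

# Small illustrative preposition list; a real system would need POS information.
PREPOSITIONS = {"in", "of", "at", "on", "to"}
IRREGULAR = {"man": "men", "woman": "women"}


def plural_word(word):
    """Naive plural of a single word: irregular lookup, else add -s or -es."""
    lower = word.lower()
    if lower in IRREGULAR:
        return IRREGULAR[lower]
    if lower.endswith(("s", "x", "ch", "sh")):
        return word + "es"
    return word + "s"


def pluralize_compound(compound):
    """Apply rules 1-4 to a compound written with spaces or hyphens."""
    parts = re.split(r"([ -])", compound)  # separators are kept at odd indices
    words = parts[0::2]
    if len(words) > 2 and words[1].lower() in PREPOSITIONS:
        # Rule 3: noun + prepositional phrase, pluralize the leading noun.
        words[0] = plural_word(words[0])
    else:
        if len(words) > 1 and words[0].lower() in IRREGULAR:
            # Rule 2: a "man"/"woman" modifier is pluralized as well.
            words[0] = plural_word(words[0])
        # Rules 1 and 4: pluralize the last word.
        words[-1] = plural_word(words[-1])
    parts[0::2] = words
    return "".join(parts)
```

For example, pluralize_compound("father-in-law") follows rule 3 and pluralize_compound("woman-driver") follows rule 2, while all other inputs fall through to pluralizing the last word.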
    <Paragraph position="16"> Although the rules do not cover every compound noun, all the countable compound nouns in our experimental data follow them. We are confident that the rules are viable for most countable compound nouns.</Paragraph>
    <Paragraph position="17"> Although we used Google as our search engine, we did not use the Google Web API service in our implementation, because Google limits it to 1000 automated queries per day. As we only need the number of web hits returned for each search query, we extracted the hit counts directly from the result pages.</Paragraph>
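Extracting a hit count from the text of a result page can be sketched as below. The paper does not describe the page format or the extraction code, so the "of about N" wording matched here, the pattern name, and the function name are all assumptions made for illustration only.

```python
import re

# Assumed result-count wording; the actual page format is not given in the paper.
HIT_PATTERN = re.compile(r"of about (\d[\d,]*)")


def parse_hit_count(page_text):
    """Return the hit count found in result-page text, or 0 if none is found."""
    match = HIT_PATTERN.search(page_text)
    if match is None:
        return 0
    return int(match.group(1).replace(",", ""))
```

Returning 0 when no count is found lets the caller apply the same zero-hit smoothing as for a query that genuinely has no results.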
  </Section>
  <Section position="4" start_page="104" end_page="105" type="metho">
    <SectionTitle>
3 Experimental Data
</SectionTitle>
    <Paragraph position="0"> The main experimental data is from Webster's New International Dictionary (Second Edition).</Paragraph>
    <Paragraph position="1"> The list of compound words in the dictionary is available on the Internet. We selected compound words randomly from the list and kept only the nouns, because the word list mixes compound verbs and adjectives with nouns. We repeated the process several times until we had obtained our experimental data. We collected 3000 words for the training set, which is used to optimize the detection thresholds, and 500 words for the test set, which is used to evaluate our approach. We also added 180 newly coined compound nouns to the sets (150 for training; 30 for test). These relatively new words, created over the past seven years, have not yet reached any dictionaries. We manually annotated the countability of these compound nouns as plural only, countable, or uncountable. An English teacher who is a native speaker checked and corrected the annotations. The make-up of the experimental data is listed in Table 1.</Paragraph>
    <Paragraph position="2">  The compound word list is available from http://www.puzzlers.org/wordlists/dictinfo.php.</Paragraph>
  </Section>
</Paper>