File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/05/i05-2018_intro.xml

Size: 6,180 bytes

Last Modified: 2025-10-06 14:02:55

<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-2018">
  <Title>Detecting the Countability of English Compound Nouns Using Web-based Models</Title>
  <Section position="2" start_page="0" end_page="103" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In English, a noun can be countable or uncountable. Countable nouns can be &quot;counted&quot;: they have a singular and a plural form, for example, an apple, two apples, three apples. Uncountable nouns cannot be counted, which means they have only a singular form, such as water, rice, and wine.</Paragraph>
    <Paragraph position="1"> Countability is the semantic property that determines whether a noun can occur in singular and plural forms. Information about the countability of individual nouns is easy to obtain from grammar books or dictionaries. Several researchers have explored automatically learning the countability of English nouns (Bond and Vatikiotis-Bateson, 2002; Schwartz, 2002; Baldwin and Bond, 2003). However, all of the proposed approaches focus on learning the countability of individual nouns.</Paragraph>
    <Paragraph position="2"> A compound noun is a noun that is made up of two or more words. Most compound nouns in English are formed by nouns modified by other nouns or adjectives. In this paper, we concentrate solely on compound nouns made up of only two words, as they account for the vast majority of compound nouns. There are three forms of compound words: the closed form, in which the words are melded together, such as &quot;songwriter&quot;, &quot;softball&quot;, and &quot;scoreboard&quot;; the hyphenated form, such as &quot;daughter-in-law&quot; and &quot;master-at-arms&quot;; and the open form, such as &quot;post office&quot;, &quot;real estate&quot;, and &quot;middle class&quot;.</Paragraph>
    <Paragraph position="3"> Compound words create special problems when we need to know their countability. According to the &quot;Guide to English Grammar and Writing&quot;, the base element within a compound noun generally functions as a regular noun with respect to countability, as in &quot;bedrooms&quot;. However, this rule has many exceptions. Some uncountable nouns occur in their plural forms within compound nouns, such as &quot;mineral waters&quot; (&quot;water&quot; is usually considered an uncountable noun).</Paragraph>
    <Paragraph position="4"> The countability of some words changes when they occur in different compound nouns. &quot;Rag&quot; is a countable noun, while &quot;Kentish rag&quot; is uncountable and &quot;glad rags&quot; is plural only. &quot;Wages&quot; is plural only, but &quot;absolute wage&quot; and &quot;standard wage&quot; are countable. Determining the countability of a compound noun should therefore take all of its elements into account, rather than rely solely on the base word.</Paragraph>
    <Paragraph position="5"> The number of compound nouns is so large that it is impossible to collect all of them in one dictionary, which would also need to be updated frequently, because newly coined words are created continuously and most of them are compound nouns, such as &quot;leisure sickness&quot; and &quot;Green famine&quot;. Knowledge of the countability of compound nouns is very important in English text generation. This research is motivated by our project on post-editing translation candidates in machine translation. Baldwin and Bond (2003) also mentioned that many languages, such as Chinese and Japanese, do not mark countability, so determining the appropriate form of a translation candidate depends on knowledge of countability. For example, the correct translation of &quot;Fa Yu Xing Tong&quot; is &quot;growing pains&quot;, not &quot;growing pain&quot;.</Paragraph>
    <Paragraph position="6"> In this paper, we learn the countability of English compound nouns using the WWW as a large corpus. Many compound nouns, especially relatively new ones such as &quot;genetic pollution&quot;, have not yet reached any dictionary.</Paragraph>
    <Paragraph position="7"> We believe that web-scale data can be a viable alternative that avoids the sparseness problem of smaller corpora. We classify compound nouns into three classes: countable (e.g., bedroom), uncountable (e.g., cash money), and plural only (e.g., crocodile tears). To detect which class a compound noun belongs to, we propose some simple, viable n-gram models, such as freq(N) (the frequency of the singular form of the noun), whose parameter values (web hits of literal queries) can be obtained with the help of the WWW search engine Google. The detection thresholds (a noun whose parameter value is above the threshold is considered plural only) are estimated on a small countability-tagged training set. Finally, we evaluate our detection approach on a test set and show that our algorithm based on these simple models achieves promising results.</Paragraph>
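As a rough illustration of the threshold-based detection described above, the following Python sketch assigns a compound noun to one of the three classes from the hit counts of its singular and plural forms. The plural-ratio parameter, both threshold values, the names HitCounts, plural_ratio and classify, and all hit counts are assumptions made for illustration only; they are not the models, thresholds, or figures used in this paper, where thresholds are estimated on a tagged training set.

```python
from dataclasses import dataclass


@dataclass
class HitCounts:
    """Web hit counts for one compound noun (invented numbers in this sketch)."""
    singular: int  # freq(N): hits for the singular form
    plural: int    # hits for the plural form


def plural_ratio(hits: HitCounts) -> float:
    """Return the share of hits that use the plural form (assumed parameter)."""
    total = hits.singular + hits.plural
    return hits.plural / total if total > 0 else 0.0


def classify(hits: HitCounts, plural_only_min: float = 0.95,
             uncountable_max: float = 0.05) -> str:
    """Map a plural ratio to one of the three classes with two assumed thresholds."""
    ratio = plural_ratio(hits)
    if ratio >= plural_only_min:
        return "plural only"
    if uncountable_max >= ratio:
        return "uncountable"
    return "countable"


if __name__ == "__main__":
    # Hypothetical counts, not real search-engine results.
    examples = {
        "bedroom": HitCounts(singular=6000, plural=4000),
        "cash money": HitCounts(singular=9900, plural=100),
        "crocodile tears": HitCounts(singular=50, plural=9950),
    }
    for noun, hits in examples.items():
        print(f"{noun}: {classify(hits)}")
```

In a real setting the counts would come from literal search-engine queries, and the two thresholds would be tuned on labelled examples rather than fixed by hand.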
    <Paragraph position="8"> Querying the WWW adds noise to the data, so we certainly lose some precision compared to supervised statistical models, but we assume that the size of the WWW will compensate for the rough queries. Keller and Lapata (2003) also presented evidence of the reliability of web counts for natural language processing. Lapata and Keller (2005) also investigated the countability learning task for nouns. However, they only distinguish between countable and uncountable for individual nouns. The best model is the determiner-noun model, which achieves 88.62% on countable and 91.53% on uncountable nouns.</Paragraph>
    <Paragraph position="9"> Note: &quot;Fa Yu Xing Tong&quot; (fa yu xing tong), a Chinese compound noun, means &quot;growing pains&quot;.</Paragraph>
    <Paragraph position="10"> In section 2, we describe the main approach used in the paper. The preparation of the training and test data is introduced in section 3. The details of the experiments and results are presented in section 4. Finally, in section 5 we list our conclusions.</Paragraph>
  </Section>
</Paper>