<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1202">
  <Title>Measuring MWE Compositionality Using Semantic Annotation</Title>
  <Section position="5" start_page="3" end_page="3" type="metho">
    <SectionTitle>
3 Measuring MWE compositionality with semantic field information
</SectionTitle>
    <Paragraph position="0"> In this section, we propose an algorithm for automatically measuring MWE compositionality based on the Lancaster semantic lexicon. In this lexicon, the semantic field of each word and MWE is encoded in the form of semantic tags.</Paragraph>
    <Paragraph position="1"> We contend that the compositionality of a MWE can be estimated by measuring the distance between the semantic fields of the MWE and those of its constituent words, based on the semantic field information available from the lexicon.</Paragraph>
    <Paragraph position="2"> The lexicon employs a taxonomy containing 21 major semantic fields which are further divided into 232 sub-categories.</Paragraph>
    <Paragraph position="3">  Tags are designed to denote the semantic fields using letters and digits. For instance, tag N3.2 denotes the category of {SIZE} and Q4.1 denotes {media: Newspapers}. Each entry in the lexicon maps a word or MWE to its potential semantic field category or categories. More often than not, a lexical item is mapped to multiple semantic categories, reflecting its potential multiple senses. In such cases, the tags are arranged in order of likelihood of meanings, with the most prominent one at the head of the list. For example, the word &amp;quot;mass&amp;quot; is mapped to tags N5, N3.5, S9, S5 and B2, which denote its potential semantic fields of {QUANTITIES}, {MEASUREMENT: WEIGHT}, {RELIGION AND SUPERNATURAL}, {GROUPS AND AFFILIATION} and {HEALTH AND DISEASE}.</Paragraph>
  </Section>
  <Section position="6" start_page="3" end_page="4" type="metho">
    <SectionTitle>
</SectionTitle>
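<Paragraph> The entry structure described above can be modeled as a simple ordered mapping. The sketch below is illustrative only: the dictionary format is an assumption, not the lexicon's real file format, and only the tags for &amp;quot;mass&amp;quot; come from the text.

```python
# Minimal, illustrative model of Lancaster semantic lexicon entries.
# The dict structure is an assumption; only the "mass" tags are from the text.
LEXICON = {
    # word or MWE mapped to candidate semantic tags, most prominent sense first
    "mass": ["N5", "N3.5", "S9", "S5", "B2"],
    "petrol station": ["M3/H1"],  # a slash marks an inseparable combined category
}

def candidate_tags(item):
    """Return the ordered candidate tags for a word or MWE (empty if unknown)."""
    return LEXICON.get(item, [])

print(candidate_tags("mass")[0])  # the most prominent sense of "mass" is N5
```
</Paragraph>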
    <Paragraph position="0"> The lexicon provides direct access to the semantic field information for a large number of MWEs and their constituent words. Furthermore, the lexicon was analysed and classified manually by a team of linguists based on the analysis of corpus data and consultation of printed and electronic corpus-based dictionaries, ensuring a high level of consistency and accuracy of the semantic analysis.</Paragraph>
    <Paragraph position="1"> In our context, we interpret the task of measuring the compositionality of MWEs as examining the distance between the semantic tag of a MWE and the semantic tags of its constituent words.</Paragraph>
    <Paragraph position="2">  For the complete semantic tagset, see website:</Paragraph>
    <Paragraph position="4"> The compositionality score D of a MWE M is measured by multiplying the semantic distance SD between M and each of its constituent words w1, ..., wn: D(M) = SD(M, w1) x SD(M, w2) x ... x SD(M, wn). (1)</Paragraph>
    <Paragraph position="6"> In practice, the square root of the product is used as the score in order to reduce the range of actual D-scores: D-score(M) = sqrt(SD(M, w1) x SD(M, w2) x ... x SD(M, wn)),</Paragraph>
    <Paragraph position="8"> where D-score ranges between [0, 1], with 1 indicating the strongest compositionality and 0 the weakest compositionality.</Paragraph>
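<Paragraph> The D-score defined above can be computed directly; the sketch below assumes the semantic distances SD for the content constituent words have already been obtained.

```python
import math

def d_score(sd_values):
    """D-score of a MWE: the square root of the product of the semantic
    distances SD between the MWE and each content constituent word.
    Returns a value in [0, 1]; 1 indicates the strongest compositionality."""
    product = 1.0
    for sd in sd_values:
        product *= sd
    return math.sqrt(product)

# e.g. two constituents whose tags both match the MWE tag exactly (SD = 1):
print(d_score([1.0, 1.0]))  # 1.0
```
</Paragraph>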
    <Paragraph position="9"> In the semantic lexicon, because the semantic information of function words is limited, they are classified into a single grammatical bin (denoted by tag Z5). In our algorithm, they are excluded from the measuring process by using a stop word list, so only the content constituent words are involved in measuring compositionality. Although function words may form an important part of many MWEs, such as phrasal verbs, we assume they can be ignored because our algorithm relies solely on semantic field information.</Paragraph>
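<Paragraph> A minimal sketch of the stop-word filtering step; the stop list contents here are illustrative, as the real list is not reproduced in the text.

```python
# Illustrative stop list: the actual list used with the lexicon is not given.
STOP_WORDS = {"down", "up", "in", "on", "of", "the", "a", "to"}

def content_words(mwe):
    """Keep only the content constituents of a MWE; function words
    (tagged Z5 in the lexicon) are skipped via the stop list."""
    return [w for w in mwe.split() if w not in STOP_WORDS]

print(content_words("brush down"))  # ['brush']
```
</Paragraph>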
    <Paragraph position="10"> The semantic distance between a MWE and any of its constituent words is calculated by quantifying the similarity between their semantic field categories. Specifically, if the MWE and a constituent word do not share any of the 21 major semantic domains, the SD is assigned a small value l.</Paragraph>
    <Paragraph position="11">  If they do, three possible cases are considered: Case a. If they share the same tag, and the constituent word has only one tag, then SD is one.</Paragraph>
    <Paragraph position="12"> Case b. If they share a tag or tags, but the constituent words have multiple candidate tags, then SD is weighted using a variable a based on the position of the matched tag in the candidate list as well as the number of candidate tags.</Paragraph>
    <Paragraph position="13"> Case c. If they share a major category, but their tags fall into different sub-categories (denoted by the trailing digits following a letter), SD is further weighted using a variable b which reflects the difference between the sub-categories. (The small value l mentioned above is non-zero in order to avoid producing a semantic distance of zero indiscriminately whenever any one of the constituent words yields zero, regardless of the other constituent words.)</Paragraph>
    <Paragraph position="15"> With respect to weight a, suppose L is the number of candidate tags of the constituent word under consideration and N is the position of the specific tag in the candidate list (the position starts from the top with N=1), where N=1, 2, ..., n and N&lt;=L. Ranging between [1, 0), a takes into account both the location of the matched tag in the candidate tag list and the number of candidate tags. This weight penalises words having more candidate semantic tags by giving a lower value for their higher degree of ambiguity. As either L or N increases, the value of a decreases.</Paragraph>
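<Paragraph> The source formula for a is not reproduced here, so the sketch below uses one candidate function, a = (L - N + 1) / L^2, chosen only because it satisfies the stated properties: a equals 1 when L = N = 1, decreases as either L or N grows, and stays within (0, 1]. The paper's actual formula may differ.

```python
def weight_a(L, N):
    """Illustrative candidate for the weight a (an assumption, not the
    paper's formula): a = (L - N + 1) / L**2, where L is the number of
    candidate tags and N is the 1-based position of the matched tag.
    It equals 1 when L = N = 1 and decreases as L or N increases."""
    return (L - N + 1) / (L * L)

print(weight_a(1, 1))  # 1.0
print(weight_a(5, 1))  # matched tag first of five candidates: 0.2
```
</Paragraph>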
    <Paragraph position="16"> Regarding case c), where the tags share the same head letter but different digit codes, i.e. they are from the same major category but in different sub-categories, the weight b is calculated based on the number of sub-categories they share. As mentioned earlier, a semantic tag consists of an initial letter and some trailing digits divided by points, e.g. S1.1.2 {RECIPROCITY}, S1.1.3 {PARTICIPATION}, S1.1.4 {DESERVE} etc. If we let T1 and T2 be a pair of semantic tags with the same initial letter, which have k1 and k2 trailing digit codes (denoting the number of sub-division layers) respectively and share n digit codes from the left, or from the top layer, then b is calculated as b = n / max(k1, k2),</Paragraph>
    <Paragraph position="20"> where b ranges between (0, 1). In fact, the current USAS taxonomy allows a maximum of three layers of sub-division; therefore b takes one of three possible scores: 0.500 (1/2), 0.333 (1/3) and 0.667 (2/3). In order to avoid producing zero scores, if the pair of tags do not share any digit codes apart from the head letter, then n is given a small value of 0.5.</Paragraph>
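<Paragraph> The weight b can be sketched as follows; the formula b = n / max(k1, k2) is a reconstruction that reproduces the three possible scores and the 0.5 fallback described in the text, not necessarily the paper's exact definition.

```python
def weight_b(tag1, tag2):
    """Weight b for tags sharing a head letter but differing sub-categories.
    Reconstruction (an assumption): b = n / max(k1, k2), where k1 and k2 are
    the numbers of trailing digit codes and n is the number of digit codes
    shared from the top layer; n = 0.5 when only the head letter is shared."""
    d1 = tag1[1:].split(".")
    d2 = tag2[1:].split(".")
    n = 0
    for x, y in zip(d1, d2):
        if x != y:
            break
        n += 1
    if n == 0:
        n = 0.5  # avoid zero scores when no digit codes are shared
    return n / max(len(d1), len(d2))

print(weight_b("S1.1.2", "S1.1.3"))  # shares 2 of 3 digit layers: 2/3
```
</Paragraph>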
    <Paragraph position="21"> Combining all of the weighting scores, the semantic distance SD in formula (1) is calculated as follows: SD = l if no tags match; SD = 1 in case a); SD = a in case b); and SD = a x b in case c).</Paragraph>
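<Paragraph> Putting the cases together, the piecewise SD can be sketched as below. The weight formulas and the small value l (0.001 here) are illustrative assumptions, not the paper's exact values.

```python
def _shared_subcategory_weight(tag1, tag2):
    # Reconstructed weight b: shared top-layer digit codes over the deeper tag.
    d1, d2 = tag1[1:].split("."), tag2[1:].split(".")
    n = 0
    for x, y in zip(d1, d2):
        if x != y:
            break
        n += 1
    if n == 0:
        n = 0.5  # only the head letter is shared
    return n / max(len(d1), len(d2))

def semantic_distance(mwe_tag, word_tags, small_l=0.001):
    """Piecewise SD between a MWE tag and a word's candidate tags,
    following cases a) to c) in the text; weights are assumptions."""
    L = len(word_tags)
    for N, tag in enumerate(word_tags, start=1):
        if tag == mwe_tag:                        # tags match exactly
            if L == 1:
                return 1.0                        # case a): single candidate
            return (L - N + 1) / (L * L)          # case b): assumed weight a
    for N, tag in enumerate(word_tags, start=1):
        if tag[0] == mwe_tag[0]:                  # same major category only
            a = 1.0 if L == 1 else (L - N + 1) / (L * L)
            return a * _shared_subcategory_weight(mwe_tag, tag)  # case c)
    return small_l                                # no major field shared

print(semantic_distance("N5", ["N5"]))  # case a): 1.0
```
</Paragraph>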
    <Paragraph position="23"> Some MWEs and single words in the lexicon are assigned combined semantic categories which are considered to be inseparable, as shown below: petrol_NN1 station_NN1 M3/H1 where the slash means that this MWE falls under the categories of M3 {VEHICLES AND TRANSPORTS ON LAND} and H1 {ARCHITECTURE AND KINDS OF HOUSES AND BUILDINGS} at the same time.</Paragraph>
  </Section>
  <Section position="7" start_page="4" end_page="7" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> For such cases, criss-cross comparisons between all possible tag pairs are carried out in order to find the optimal match between the tags of the MWE and its constituent words.</Paragraph>
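<Paragraph> The criss-cross comparison can be sketched as a maximum over all tag pairs; toy_sd and its 0.1 fallback are illustrative stand-ins for the real pairwise SD function.

```python
from itertools import product

def toy_sd(mwe_tag, word_tag):
    """Toy pairwise SD for illustration only: 1 for an exact tag match,
    0.1 otherwise. Any real SD function can be substituted."""
    return 1.0 if mwe_tag == word_tag else 0.1

def best_sd(mwe_category, word_tags, sd):
    """Criss-cross comparison for combined categories such as 'M3/H1':
    every slash-joined MWE tag is compared with every candidate tag of
    the constituent word, and the optimal (maximum) match is kept."""
    return max(sd(m, w) for m, w in product(mwe_category.split("/"), word_tags))

print(best_sd("M3/H1", ["H1", "B4"], toy_sd))  # 1.0
```
</Paragraph>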
    <Paragraph position="1"> By way of further explanation, the word &amp;quot;brush&amp;quot; as a verb has candidate semantic tags of B4 {CLEANING AND PERSONAL CARE}, A1.1.1 {GENERAL ACTION, MAKING} etc. On the other hand, the phrasal verb &amp;quot;brush down&amp;quot; may fall under either the B4 category with the sense of cleaning or the G2.2 category {ETHICS} with the sense of reprimand. When we apply our algorithm to it, we obtain a D-score of 1.0000 for the sense of cleaning, indicating a high degree of compositionality, and a low D-score of 0.0032 for the sense of reprimand, indicating a low degree of compositionality. Note that the word &amp;quot;down&amp;quot; in this MWE is filtered out as it is a function word.</Paragraph>
    <Paragraph position="2"> The above example has only a single constituent content word. In practice, many MWEs have more complex structures than this example. In order to test the performance of our algorithm, we compared its output against human judgments of compositionality, as reported in the following section.</Paragraph>
    <Paragraph position="3">  For the evaluation, we selected a list of 89 MWEs from the Lancaster MWE lexicon and asked human raters to rank them via a website. The list includes six MWEs with multiple senses, and these were treated as separate MWEs. The Lancaster MWE lexicon was compiled manually by expert linguists; we therefore assume that every item in this lexicon is a true MWE, although we acknowledge that some errors may exist.</Paragraph>
    <Paragraph position="4"> Following McCarthy et al.'s approach, we asked the human raters to assign each MWE a number ranging between 0 (opaque) and 10 (fully compositional). Both native and non-native speakers were involved, but only the data from native speakers are used in this evaluation. In total, three groups of raters took part in the experiment: Group 1 (6 people) rated MWEs with indexes 1-30, Group 2 (4 people) rated MWEs with indexes 31-59, and Group 3 (5 people) rated MWEs with indexes 60-89.</Paragraph>
    <Paragraph position="5"> In order to test the level of agreement between the raters, we used the procedures provided in the 'irr' package for R (Gamer, 2005). With this tool, the average intraclass correlation coefficient (ICC) was calculated for each group of raters using a two-way agreement model (Shrout &amp; Fleiss, 1979). All ICCs exceeded 0.7 and were significant at the 95% confidence level, indicating an acceptable level of agreement between raters. For Group 1 the ICC was 0.894 (95% CI: 0.807 &lt; ICC &lt; 0.948), for Group 2 it was 0.900 (95% CI: 0.783 &lt; ICC &lt; 0.956), and for Group 3 it was 0.886 (95% CI: 0.762 &lt; ICC &lt; 0.948).</Paragraph>
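<Paragraph> For readers without R, the average-measures ICC under a two-way agreement model (ICC(2,k) in Shrout and Fleiss's notation) can be sketched in plain Python. This mirrors what the 'irr' package computes, but is a sketch, not a substitute for it.

```python
def icc_2k(ratings):
    """Average-measures intraclass correlation, two-way agreement model
    (Shrout and Fleiss's ICC(2,k)). ratings is a list of subjects, each a
    list of k scores, one per rater."""
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)    # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)    # between raters
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (msc - mse) / n)

# perfect agreement across three raters yields an ICC of 1
print(icc_2k([[0, 0, 0], [5, 5, 5], [10, 10, 10]]))  # 1.0
```
</Paragraph>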
    <Paragraph position="6"> Based on this test, we conclude that the manual ranking of the MWEs is reliable and suitable for use in our evaluation. Source data for the human judgements is available from our website in spreadsheet form.</Paragraph>
  </Section>
</Paper>