<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-2021">
  <Title>Automatic Acronym Recognition</Title>
  <Section position="6" start_page="169" end_page="169" type="concl">
    <SectionTitle>
5 Conclusions
</SectionTitle>
    <Paragraph position="0"> The approach presented in this paper relies on already existing acronym pairs that occur in different Swedish texts. The rule-based algorithm uses predefined strong constraints to find and extract acronym-definition pairs with different patterns; it has the advantage of recognizing acronyms and definitions that are not indicated by parentheses. The recognized pairs were used to test and compare several machine learning algorithms. This approach does not require manual tagging of the training data.</Paragraph>
    <Paragraph position="1"> The results given by the rule-based algorithm are as good as those reported from earlier experiments that dealt with the same task for English. The algorithm uses a backward search; to increase recall, it is necessary to combine it with a forward search.</Paragraph>
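    <Paragraph> The backward search mentioned above can be illustrated with a minimal sketch. It is a generic right-to-left letter matcher, not the rule set used in this paper: the function name `backward_match`, the word-start condition on the first letter, and the "EKG" example are assumptions made for illustration only.

```python
def backward_match(acronym, text):
    """Scan `acronym` and `text` from right to left and return the part of
    `text` that covers all acronym letters, or None if they cannot be matched.

    Hypothetical helper for illustration; not the algorithm from the paper.
    """
    short = [c.lower() for c in acronym if c.isalnum()]
    s = len(short) - 1   # position in the acronym letters
    t = len(text) - 1    # position in the candidate text
    while s >= 0:
        ch = short[s]
        # Move left until the current acronym letter is found. The first
        # acronym letter must additionally sit at the start of a word.
        while t >= 0 and (
            text[t].lower() != ch
            or (s == 0 and t != 0 and text[t - 1].isalnum())
        ):
            t -= 1
        if t == -1:
            return None  # ran out of text before all letters were matched
        s -= 1
        t -= 1
    return text[t + 1:]


# Illustrative pair where acronym and definition share their letters.
print(backward_match("EKG", "ett elektrokardiogram"))
# Pair from the text above: English acronym, Swedish definition.
print(backward_match("CT", "en datortomografi"))
```

Under these assumptions the first call returns the definition, while the second returns None because "datortomografi" contains no letter "c". This is the kind of pair for which letter matching alone fails and a lookup resource is needed. </Paragraph>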
    <Paragraph position="2"> The variety of Swedish acronym pairs is large and includes structures that are hard to detect, for example &lt;"VF", "kammarflimmer"&gt; and &lt;"CT", "datortomografi"&gt;, where the acronym is in English while the expansion is written in Swedish. These structures require a dictionary or database lookup, especially because there are also counterexamples in the Swedish text where both the acronym and the definition are in English. Another problematic structure is three-letter acronyms that consist of only lowercase letters, since there are many prepositions, verbs and determiners that correspond to this structure. To solve this problem it may be suitable to combine textual pre-processing, such as part-of-speech annotation and/or parsing, with the existing code.</Paragraph>
    <Paragraph position="3"> The machine learning experiment shows that the best results were given by the IGTREE algorithm. Performance can be further improved by modifying the input settings, e.g. by testing different feature weighting schemes, such as Shared Variance and Gain Ratio, and by combining different values of k for the k-nearest neighbour classifier.</Paragraph>
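    <Paragraph> As a reminder of what Gain Ratio weighting measures, the sketch below computes the standard definition (information gain divided by split information) for one symbolic feature. It is not the learner's internal implementation, and the feature values and class labels are hypothetical toy data, not results from the experiments.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of symbolic values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of a feature divided by its split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Expected class entropy after splitting the data on the feature.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the feature values themselves
    return info_gain / split_info if split_info else 0.0


# Hypothetical toy data: one feature value per candidate, plus its class.
labels = ["acronym", "acronym", "other", "other"]
informative = ["upper", "upper", "lower", "lower"]
uninformative = ["a", "b", "a", "b"]
print(gain_ratio(informative, labels))
print(gain_ratio(uninformative, labels))
```

A feature that separates the classes perfectly receives the highest weight, and a feature that is independent of the class receives a weight of zero. </Paragraph>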
    <Paragraph position="5"> Ongoing work aims to improve the rule-based method and combine it with a supervised machine learning algorithm. The resulting model will later be used for making predictions on new data.</Paragraph>
  </Section>
</Paper>