<?xml version="1.0" standalone="yes"?> <Paper uid="W04-3246"> <Title>Learning Hebrew Roots: Machine Learning with Linguistic Constraints</Title> <Section position="7" start_page="0" end_page="0" type="concl"> <SectionTitle> 6 Conclusions </SectionTitle> <Paragraph position="0"> We have shown that combining machine learning with limited linguistic knowledge can produce state-of-the-art results on a difficult morphological task, the identification of roots of Hebrew words. Our best result, over 80% precision, was obtained using simple classifiers for each of the root's consonants, and then combining the outputs of the classifiers using a linguistically motivated, yet extremely coarse and simplistic, scoring function. This result is comparable to average human performance on this task.</Paragraph> <Paragraph position="1"> This work can be improved in a variety of ways.</Paragraph> <Paragraph position="2"> We intend to spend more effort on feature engineering. As is well known from other learning tasks, fine-tuning of the feature set can produce additional accuracy; we expect this to be the case in this task, too. In particular, introducing features that capture contextual information is likely to improve the results. Similarly, our scoring function is simplistic and we believe that it can be improved. We also intend to improve the edit-distance function such that the cost of replacing characters reflects phonological and orthographic constraints (Kruskal, 1999).</Paragraph> <Paragraph position="3"> In another track, there are various other ways in which different inter-related classifiers can be combined. Here we used only a simple multiplication of the three classifiers' confidence measures, which is then combined with the linguistically motivated functions.
We intend to investigate more sophisticated methods for this combination, including higher-order machine learning techniques.</Paragraph> <Paragraph position="4"> Finally, we plan to extend these results to more complex cases of learning tasks with a large number of targets, in particular such tasks in which the targets are structured. We are currently working on similar experiments for Arabic root extraction. Another example is the case of morphological disambiguation in languages with non-trivial morphology, which can be viewed as a POS tagging problem with a large number of tags on which structure can be imposed using the various morphological and morpho-syntactic features that morphological analyzers produce. We intend to investigate this problem for Hebrew in the future.</Paragraph> </Section> </Paper>