<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-3025">
<Title>Incorporating topic information into sentiment analysis models</Title>
<Section position="7" start_page="0" end_page="0" type="evalu">
<SectionTitle> 5 Results </SectionTitle>
<Paragraph position="0"> Experimental results may be seen in figure 1. It must be noted that this dataset is very small, and although the results are not conclusive, they are promising insofar as they suggest that incorporating PMI values computed with respect to the topic yields some improvement in modeling. They also suggest that the best way to incorporate such features is in the form of a separate SVM, which may then be combined with the lemma-based model to create a hybrid.</Paragraph>
<Section position="1" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 5.1 Discussion </SectionTitle>
<Paragraph position="0"> At the level of phrasal SO assignment, it would seem that some improvement could be gained by adding domain context to the AltaVista search. Many, perhaps most, terms' favorability content depends to some extent on their context. As Turney notes, "unpredictable" is generally positive when describing a movie plot, and negative when describing an automobile or a politician. Likewise, a term such as "devastating" might be generally negative, but in the context of music or art it may imply an emotional engagement that is usually seen as positive. Similarly, using "excellent" and "poor" as the poles in assessing this value seems somewhat arbitrary, especially given the potentially misleading economic meaning of "poor." Nevertheless, cursory experiments in adjusting the search have not yielded improvements. One problem with limiting the domain (for example, by adding "AND music" or some disjunction of such constraints to the query) is that the resulting hit count is greatly diminished; the data sparseness introduced by the added restrictions appears to cancel out any potential gain. It is to be hoped that in the future, as search engines continue to improve and the Internet continues to grow, more possibilities will open up in this regard. As it stands, Google returns more hits than AltaVista, but its query syntax lacks a "NEAR" operator, making it unsuitable for this task. As to why using "excellent" and "poor" works better than, for example, "good" and "bad," it is not entirely clear. Again, cursory investigations have thus far supported Turney's conclusion that the former are the appropriate terms to use for this task.</Paragraph>
<Paragraph position="1"> It also seems likely that the topic-relations aspect of the present research only scratches the surface of what should be possible. Although performance in the mid-80s is not bad, there is still considerable room for improvement. The present models may also be further expanded with features representing other information sources, which may include other types of semantic annotation (Wiebe, 2002; Wiebe et al., 2002), features based on more sophisticated grammatical or dependency relations, or perhaps features based on zoning (e.g., do opinions become more clearly stated towards the end of a text?). In any case, it is hoped that the present work may help to indicate how various information sources pertinent to the task may be brought together.</Paragraph>
</Section>
</Section>
</Paper>
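
The hybrid mentioned in Section 5 (a lemma-based SVM combined with a separate SVM over topic-oriented PMI features) is not spelled out in the text. The following is a minimal sketch, assuming scikit-learn's LinearSVC and assuming the two component models are joined by averaging their decision scores; the combination rule and the function names are illustrative, not the authors' implementation.

```python
# Hypothetical sketch of a hybrid sentiment classifier: one SVM over lemma
# features, a second SVM over topic-oriented PMI features, combined by
# averaging their signed decision scores (the combination rule is assumed).
from sklearn.svm import LinearSVC

def train_hybrid(X_lemma, X_pmi, y):
    """Train the two component SVMs on their respective feature sets."""
    lemma_svm = LinearSVC().fit(X_lemma, y)
    pmi_svm = LinearSVC().fit(X_pmi, y)
    return lemma_svm, pmi_svm

def predict_hybrid(lemma_svm, pmi_svm, X_lemma, X_pmi):
    """Average the two decision values and threshold at zero."""
    scores = (lemma_svm.decision_function(X_lemma) +
              pmi_svm.decision_function(X_pmi)) / 2.0
    return (scores > 0).astype(int)  # 1 = positive sentiment, 0 = negative
```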
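
The phrasal SO assignment discussed in Section 5.1 follows Turney's PMI-IR scheme with "excellent" and "poor" as poles. As a rough sketch of how the score and the optional domain restriction ("AND music") fit together, assuming a hypothetical hits() callable standing in for an AltaVista-style NEAR query and an assumed smoothing constant:

```python
# Hypothetical sketch of a Turney-style SO-PMI score. `hits` is assumed to be
# a function mapping a query string to a hit count from a search engine that
# supports a NEAR operator; the optional `domain` argument illustrates the
# "AND music"-style restriction whose sparse counts are noted in the text.
import math

def so_pmi(phrase, hits, domain=None, smoothing=0.01):
    """Semantic orientation of `phrase` from NEAR co-occurrence counts."""
    def query(q):
        return hits(q if domain is None else f"{q} AND {domain}")

    # Smoothing keeps the ratio defined when a restricted query returns 0 hits.
    pos = query(f'"{phrase}" NEAR "excellent"') + smoothing
    neg = query(f'"{phrase}" NEAR "poor"') + smoothing
    return math.log2((pos * query('"poor"')) / (neg * query('"excellent"')))
```

Restricting the domain shrinks every co-occurrence count in the numerator and denominator, which is one way to see why the sparseness described above can wipe out the benefit of the added context.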