<?xml version="1.0" standalone="yes"?> <Paper uid="I05-2043"> <Title>Trend Survey on Japanese Natural Language Processing Studies over the Last Decade</Title> <Section position="3" start_page="250" end_page="253" type="metho"> <SectionTitle> 3 Trend survey on research organizations </SectionTitle> <Paragraph position="0"> Next, we investigated how the number of papers published by each research organization changed over time.</Paragraph> <Paragraph position="1"> The results are shown as contour plots in Figures 2 and 3. The contour height (the darkness of the shading) indicates the number of papers. For each research organization, we computed the mean, the mode, and the median of the publication years of its papers, and then averaged these three statistics; we call the result the average value. In the figures, the research organizations are listed in ascending order of the average value. Each research organization is annotated with its total number of papers and its average value.</Paragraph> <Paragraph position="2"> Therefore, research organizations that published many papers in the earlier years appear higher on the list, while research organizations that published many papers in the later years appear lower. We display only research organizations with a large total number of papers. If a research organization's name changed during the ten-year period, we display the name that appeared most often on its published papers.</Paragraph> <Paragraph position="3"> From these figures, we can see that ATR and CRL (NICT) published many journal papers, and that NTT, ATR, Tokyo Institute of Technology, CRL, and the University of Tokyo published many conference papers. We also found that while NTT and ATR had many papers in the earlier years, CRL and the Univ. of Tokyo had many papers in the later years. We can expect that because CRL and the Univ.
of Tokyo show an upward trend, the number of their papers will continue to increase in the future. These figures make it easy to see in which years each research organization published many papers.</Paragraph> <Paragraph position="4"> 4 Trend survey on research areas Next, we investigated how the number of papers in each research area changed over time. The results are in Figures 4, 5, and 6. (Because the volume of data for conference papers was large, it was divided into two figures.) (Footnote: When we counted the papers of a research organization whose name had changed, we counted all of its names, both old and new.)</Paragraph> <Paragraph position="5"> For journal papers, the contour height for each research area indicates the number of papers. For conference papers, the contour height indicates the base-2 logarithm of the number of papers plus one. Using the same method as described above, we computed the mean, mode, and median of the publication years of the papers in each research area and averaged them to obtain the average value. In the figures, the research areas are listed in ascending order of the average value. Each research area is annotated with its total number of papers and its average value. We divided the title of each paper into words using the ChaSen software (Matsumoto et al., 1999) and treated each word as a research area. A paper with a particular word in its title was assigned to the research area indicated by that word. We manually eliminated words that did not indicate a research area, for example, &quot;teki&quot; (of) and &quot;kenkyu&quot; (study). From these figures, it is clear that the research areas &quot;Japanese&quot; and &quot;analysis&quot; were studied in an especially large number of papers.
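As a minimal sketch of the ordering statistic described above, the following Python computes the average value (the mean of the mean, mode, and median publication year) from per-year paper counts, together with the contour height used for conference papers. The function names, the tie-breaking rule for the mode, the reading of the logarithm as log2(n + 1), and the sample counts are all assumptions made for illustration; none of them are taken from the paper.

```python
import math
from statistics import mean, median, multimode


def average_value(papers_by_year):
    """Mean of the mean, mode, and median publication year.

    papers_by_year maps a year index (for example 1..10) to a paper count.
    """
    # Expand the counts into one year entry per paper.
    years = [y for y, n in sorted(papers_by_year.items()) for _ in range(n)]
    if not years:
        raise ValueError("no papers")
    # Assumption: if several years tie for the mode, use their mean.
    mode_year = mean(multimode(years))
    return mean([mean(years), mode_year, median(years)])


def contour_height(n_papers):
    # Assumption: the conference-paper height is read as log2(n + 1).
    return math.log2(n_papers + 1)


# Hypothetical counts for two organizations over a ten-year period.
early_org = {1: 5, 2: 4, 3: 2, 8: 1}
late_org = {3: 1, 8: 3, 9: 4, 10: 5}

# List organizations in ascending order of the average value.
ranking = sorted(
    {"early_org": early_org, "late_org": late_org}.items(),
    key=lambda item: average_value(item[1]),
)
```

Sorting by this value places early_org before late_org, which matches the rule that organizations active in the earlier years appear higher on the list.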
We also found that for journal papers, because the research areas &quot;verb&quot;, &quot;noun&quot;, &quot;disambiguation&quot;, &quot;probability&quot;, &quot;corpus&quot;, and &quot;polysemic&quot; appear higher on the list, these areas were studied intensively in the earlier years. Likewise, we found that the research areas &quot;morphology&quot;, &quot;dependency&quot;, &quot;dialogue&quot;, and &quot;speech&quot; were studied intensively in the sixth year, and the research areas &quot;summarization&quot;, &quot;retrieval&quot;, &quot;translation&quot;, and so on were studied well in the later years. Special journal issues on &quot;summarization&quot; were published in the sixth and ninth years, so the research area &quot;summarization&quot; was represented in many papers in those years. We can expect that because the research area &quot;translation&quot; shows an upward trend, the number of papers on this topic will continue to increase in the future.</Paragraph> <Paragraph position="6"> For conference papers, we found that the research areas &quot;bilingual&quot;, &quot;morphology&quot;, &quot;probability&quot;, &quot;dictionary&quot;, &quot;statistics&quot;, and so on were studied well in the earlier years. Research areas such as &quot;retrieval&quot;, &quot;summarization&quot;, &quot;question&quot;, and &quot;paraphrase&quot; appear in the lower part of the figures. Thus, we can see that these research areas were studied intensively in recent years. These figures make it easy to see in which years each research area was studied.</Paragraph> <Paragraph position="7"> 5 Trend survey using part of data Although we have so far used all the data in the trend survey, we can narrow down the survey by looking at only part of the data. For example, when we want to examine the trend in translation research in more detail, all we have to do is extract the papers on translation and use them for a trend survey. We carried out a trend survey on machine translation in this manner. We first extracted papers whose titles included the word &quot;translation&quot; and then performed the same investigations as in Sections 3 and 4.</Paragraph> <Paragraph position="8"> The results are in Figures 7 and 8. The contour height (the darkness of the shading) indicates the number of papers. From Figure 7, we can see that NTT had many papers in the earlier years, and ATR had many papers in the later years. The figures also show that studies on translation often dealt with specific topics such as &quot;semantics&quot;, &quot;knowledge&quot;, and &quot;dictionary&quot; in the earlier years and &quot;support&quot;, &quot;example&quot;, and &quot;retrieval&quot; in more recent years.</Paragraph> </Section> <Section position="4" start_page="253" end_page="254" type="metho"> <SectionTitle> 6 Relationship between research organizations and research topics </SectionTitle> <Paragraph position="0"> Finally, we investigated which research areas each research organization studied most frequently during the ten-year period. Here, we show only the results for journal papers. We used the same method as in the previous sections to extract research organizations and research areas from the data. We counted the co-occurrence frequency of each research organization and each research area, constructed a cross table from these counts, and applied the dual scaling method (Weller and Romney, 1990; Ueda et al., 2003). The result is depicted in Figure 9.
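The counting and scaling steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: it builds the cross table with a counter and then finds the first nontrivial axis by reciprocal averaging, a classical iterative formulation of dual scaling. The paper does not say how its scaling was computed, and Figure 9 uses two axes, whereas this sketch computes only one. The function names and the sample pairs are hypothetical.

```python
from collections import Counter


def cross_table(pairs):
    """Count how often each (organization, research area) pair co-occurs."""
    counts = Counter(pairs)
    orgs = sorted({org for org, _ in counts})
    areas = sorted({area for _, area in counts})
    table = [[counts[(org, area)] for area in areas] for org in orgs]
    return orgs, areas, table


def first_axis(table, n_iter=200):
    """First nontrivial axis by reciprocal averaging.

    Assumes every row and column total is positive and that the table
    is not exactly independent (otherwise the scores collapse to zero).
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    total = sum(row_tot)
    col_scores = [float(j) for j in range(len(col_tot))]  # arbitrary start
    for _ in range(n_iter):
        # Row score: weighted average of the scores of its columns.
        row_scores = [sum(n * y for n, y in zip(row, col_scores)) / t
                      for row, t in zip(table, row_tot)]
        # Column score: weighted average of the scores of its rows.
        col_scores = [sum(n * x for n, x in zip(col, row_scores)) / t
                      for col, t in zip(zip(*table), col_tot)]
        # Remove the trivial constant solution, then fix the scale.
        centre = sum(t * y for t, y in zip(col_tot, col_scores)) / total
        col_scores = [y - centre for y in col_scores]
        norm = (sum(t * y * y for t, y in zip(col_tot, col_scores)) / total) ** 0.5
        col_scores = [y / norm for y in col_scores]
    return row_scores, col_scores


# Hypothetical (organization, title word) pairs, one per co-occurrence.
pairs = ([("org_a", "translation")] * 4 + [("org_a", "summarization")]
         + [("org_b", "translation")] * 3 + [("org_b", "retrieval")]
         + [("org_c", "retrieval")] * 4 + [("org_c", "summarization")])
orgs, areas, table = cross_table(pairs)
org_scores, area_scores = first_axis(table)
```

On this toy table, org_c receives a score on the same side of the axis as "retrieval" and on the opposite side from org_a and "translation", which is the kind of proximity the figure is read for.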
The dual scaling method displays the relationship between research organizations and research areas.</Paragraph> <Paragraph position="1"> In Figure 9, &quot;translation&quot; appears in the lower left quadrant, &quot;learning&quot; appears in the lower right quadrant, &quot;statistics&quot; and &quot;retrieval&quot; appear in the upper right quadrant, and &quot;noun&quot; and &quot;sentence&quot; appear in the upper left quadrant. The research areas and organizations related to these words appear in their vicinity. For example, in the upper right quadrant, Hitachi and the University of Tokushima appear near &quot;statistics&quot; and &quot;retrieval&quot;, which were frequent study topics for them. Similarly, &quot;summarization&quot; appears in the upper right area near the origin and is surrounded by JAIST, Toyohashi University of Technology, and Tokyo Institute of Technology, indicating that it was a frequent topic of study at those institutions. This figure makes it easy to see which research topics each organization primarily studied.</Paragraph> <Paragraph position="2"> Also in Figure 9, research areas involving numerical methods, such as &quot;probability&quot; and &quot;learning&quot;, appear on the right side. Therefore, we can interpret the figure as depicting quantitative research topics on the right side and qualitative research topics on the left side. Research areas that use complicated processing, such as &quot;learning&quot; and &quot;translation&quot;, appear in the lower area, and research areas dealing with theory, such as &quot;probability&quot;, &quot;grammar&quot;, &quot;sentence&quot;, and &quot;noun&quot;, appear in the upper area. Therefore, we can interpret the figure as depicting theoretical research topics in the upper area and research topics using complicated processing in the lower area.</Paragraph> </Section> </Paper>