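Project/Area Number |
Research Institution | Waseda University |
Principal Investigator |
Kohei Watanabe (渡辺 耕平), Waseda University, Waseda Institute for Advanced Study, Other (Visiting Researcher) (50832466)
Co-Investigator |
Atsushi Tago (多湖 淳), Waseda University, Faculty of Political Science and Economics, Professor (80457035)
Project Period (FY) |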
2019-04-01 – 2022-03-31
Keywords | content analysis / conference / software development |
Outline of Annual Research Achievements |
We have made a significant contribution to the development of quantitative text analysis in Japan in two ways. First, we carried out our research project on the North Korean and Iranian nuclear programs. Second, we organized the first international conference in Asia on the methodology and its social science applications. As part of the research project in FY2019, we established a methodology for multilingual analysis and created lexical resources in Japanese, Hebrew and English. We also improved the handling of Unicode characters in open-source software. These methodological developments have made quantitative text analysis of Asian languages accessible to political scientists without technical expertise. The international conference on quantitative text analysis (POLTEXT), held in Tokyo, attracted one hundred international participants. Its tutorials and presentations of cutting-edge research introduced Japanese political scientists to the methodology. We expect that many of the Japanese participants will employ the methodology in their future research projects and further promote it in Japan and other Asian countries. We have already written a research paper on latent semantic scaling (LSS) and a book chapter on semantic network analysis. The paper will be published in a peer-reviewed journal and the chapter in a book. Two substantive papers, one on the content analysis and one on the survey experiment, will follow in the coming years.
Current Status of Research Progress (Category) |
2: Progressing generally smoothly
We made good progress in our research project in FY2019. The ECPR conference was canceled due to the COVID-19 pandemic, so we rescheduled paper writing, but other tasks are largely on schedule.

We have finished collecting news articles on Iran and North Korea from Japanese, Hebrew and English newspapers published over a 10-year period, more than one hundred thousand full-text articles in total. This is an important milestone in the project, because data collection is usually the most difficult task in quantitative text analysis. We have also created software tools to extract texts and metadata from the collected articles.

As part of the project, we created stop word lists not only in Japanese and Hebrew but also in Chinese and Arabic. These lists are already publicly available, so other researchers can embark on quantitative analysis of Asian-language texts. We have also identified seed words for security threats in both Japanese and Hebrew. These seed words will be published in the research paper on the content analysis of newspapers.

We have improved how the software tools used in the large-scale quantitative analysis handle Unicode characters, which makes them useful for analyzing Asian-language texts. All of these tools are distributed as R packages, and they are becoming increasingly popular among political scientists in different world regions. We have built the analysis pipeline programmatically with these tools. It has already produced interesting results that support our hypotheses.
Strategy for Future Research Activity |
We are currently creating datasets with human coders to validate the results of the quantitative text analysis. Manual coding is complete for Japanese and ongoing for Hebrew, and we expect the Hebrew articles to be finished in June. Once manual coding is complete, we will rerun the analysis pipeline and produce the final results for a research paper on the content analysis of newspapers. We have already started designing a survey experiment that we will conduct this year in both Japan and Israel. An Israeli research collaborator left the project for personal reasons, but we have recruited a replacement for the survey experiment. A research paper that combines the content analysis and the survey experiment will be produced next year.