Relation Analysis among Clusters based on the Miss-Classifiers

Research Project

Project/Area Number	19K12110
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 61030: Intelligent informatics-related
Research Institution	Tokyo Metropolitan College of Industrial Technology
Principal Investigator	Yokoi Takeru, Tokyo Metropolitan College of Industrial Technology, Department of Monozukuri Engineering, Associate Professor (40469573)
Project Period (FY)	2019-04-01 – 2024-03-31
Project Status	Completed (Fiscal Year 2023)
Budget Amount	¥4,420,000 (Direct Cost: ¥3,400,000, Indirect Cost: ¥1,020,000); Fiscal Year 2021: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000); Fiscal Year 2020: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000); Fiscal Year 2019: ¥2,080,000 (Direct Cost: ¥1,600,000, Indirect Cost: ¥480,000)
Keywords	text mining / data mining / natural language processing / relation analysis / relation extraction / relationship extraction / topic analysis / clustering
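Outline of Research at the Start	This research project aims to build a new framework for extracting the sets of objects that express relations among clusters and for analyzing those relations. It takes the view that the objects misclassified in a clustering result express the relations among clusters, and seeks to construct a framework for analyzing inter-cluster relations based on those misclassified objects.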
Outline of Final Research Achievements	This study proposed a novel framework for analyzing the relations among clusters based on misclustered subjects. To this end, two main research questions were addressed. The first is what kind of relation among clusters the misclustered subjects identify. The second is how to measure the degree of that relation. For the first question, a framework for factor analysis of the relations was constructed and the relations were analyzed. For the second question, several measures of the degree of relation among clusters were established, based on the number of misclustered subjects, the probability of misclustering, and a focus on proper nouns. Experiments were carried out on news articles, and the usefulness of the results was verified.
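Academic Significance and Societal Importance of the Research Achievements	With the arrival of the recent era of information explosion, technologies for efficiently selecting useful information are becoming ever more important. One representative technique is clustering, which automatically groups similar objects. The useful information in a clustering result can be broadly divided into two kinds: the set of objects that represents each cluster, and the set of objects that expresses the relations among clusters. This research project is considered significant in that it used the simple method of clustering to extract the latter, the set of objects expressing relations among clusters, and aimed to build a new framework for analyzing those relations.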