2016 Fiscal Year Final Research Report

Tiny data mining: reconstruction of large scale data with probability distributions as bases

Project/Area Number	26330256
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Research Field	Intelligent informatics
Research Institution	Nagasaki University
Principal Investigator	MASADA Tomonari Nagasaki University, Graduate School of Engineering, Associate Professor (60413928)
Project Period (FY)	2014-04-01 – 2017-03-31
Keywords	topic model / machine learning / Bayesian inference / data mining / text mining
Outline of Final Research Achievements	The aim of our research is to produce an efficient and effective summary of a large set of documents, such as news articles, academic papers, or novels. When the number of documents is very large, we can read only a small portion of them and may therefore miss documents that cover topics of interest to us. Our research therefore extracts word lists from the given document set as a summary. For example, if one of the extracted word lists is "game, hit, pitcher, trade," we can infer that the set contains documents discussing baseball. In this manner, by looking at the extracted word lists, we can learn what kinds of topics are discussed in the given document set. Our research also provides a clue as to which documents are closely related to which word lists, so we can find the documents relevant to the word lists we choose. Although our research adopts an existing method called topic modeling, we propose a new application and a new implementation of it.
Free Research Field	Data mining
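
The outline above describes two outputs of topic modeling: word lists that summarize a document set, and a link from each document to the word lists it is related to. The following is a minimal sketch of that idea, assuming a textbook latent Dirichlet allocation model fitted by collapsed Gibbs sampling. It is not the implementation proposed in this project, which the outline does not describe. The function names, the hyperparameters `alpha` and `beta`, and the toy documents are all hypothetical.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative, not the project's method).

    docs is a list of token lists. Returns per-document topic counts and
    per-topic word counts.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # tokens of word w in topic k
    topic_total = [0] * num_topics                       # tokens in topic k
    assign = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token from its current topic.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """Return up to n most frequent words per topic: the 'word lists'."""
    return [[w for w, c in tw.most_common(n) if c > 0] for tw in topic_word]


def doc_topic_proportions(doc_topic, alpha=0.1):
    """Smoothed share of each topic in each document."""
    num_topics = len(doc_topic[0])
    return [
        [(c + alpha) / (sum(row) + num_topics * alpha) for c in row]
        for row in doc_topic
    ]


# Hypothetical toy corpus: two baseball-like and two finance-like documents.
docs = [
    "game hit pitcher trade game pitcher".split(),
    "pitcher hit game inning trade".split(),
    "market stock trade price stock".split(),
    "price market stock investor price".split(),
]
doc_topic, topic_word = fit_lda(docs, num_topics=2)
print(top_words(topic_word))
print(doc_topic_proportions(doc_topic))
```

The first printed value corresponds to the extracted word lists, and the second shows how strongly each document is tied to each list. On a corpus this small the result depends on the random seed, so the sketch is meant only to show the shape of the outputs.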