Tiny data mining: reconstruction of large scale data with probability distributions as bases
Project/Area Number | 26330256
Research Category | Grant-in-Aid for Scientific Research (C)
Allocation Type | Multi-year Fund
Section | General
Research Field | Intelligent informatics
Research Institution | Nagasaki University
Principal Investigator |
Project Period (FY) | 2014-04-01 – 2017-03-31
Project Status | Completed (Fiscal Year 2016)
Budget Amount | ¥4,810,000 (Direct Cost: ¥3,700,000, Indirect Cost: ¥1,110,000)
Fiscal Year 2016: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000)
Fiscal Year 2015: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
Fiscal Year 2014: ¥2,600,000 (Direct Cost: ¥2,000,000, Indirect Cost: ¥600,000)
Keywords | Topic Models / Machine Learning / Bayesian Inference / Data Mining / Text Mining / Probabilistic Models / Knowledge Discovery / Natural Language Processing
Outline of Final Research Achievements | The aim of our research is to produce an efficient and effective summary of a large set of documents such as news articles, academic papers, and novels. When the number of given documents is very large, we can read only a small portion of them, and we may therefore miss documents containing the topics we are interested in. Our research thus aims to extract word lists from the given document set as a summary. For example, if one of the extracted word lists is "game, hit, pitcher, and trade," we know that some of the documents discuss baseball. In this manner, by looking at the extracted word lists, we can see what kinds of topics are discussed in the given document set. Furthermore, our research also provides a clue for finding which documents are closely related to which word lists, so we can retrieve the documents relevant to the word lists we choose. While an existing method called topic modeling is adopted in our research, we propose a new application and a new implementation of it.
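The workflow described above (extracting word lists as topics and linking documents to them) is the general topic-modeling pipeline. The following is a minimal sketch of that pipeline using off-the-shelf latent Dirichlet allocation from scikit-learn, not the project's own implementation; the toy corpus, topic count, and parameters are assumptions chosen only for illustration.

# Minimal sketch of the general workflow described above: extract word lists
# (topics) from a document set with LDA and link documents to those lists.
# Uses standard scikit-learn LDA, NOT the project's own implementation;
# the toy corpus and parameters are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

documents = [
    "the pitcher threw a no hitter and the team won the game",
    "a trade sent the star pitcher to another team before the game",
    "the new phone has a larger screen and a faster processor",
    "the processor upgrade makes the phone render the screen faster",
]

# Bag-of-words counts are the usual input representation for LDA.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(documents)

# Fit a small LDA model; n_components is the number of word lists (topics).
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(counts)  # document-topic proportions
vocab = vectorizer.get_feature_names_out()

# Each topic is summarized by its highest-probability words: the "word lists".
for k, topic in enumerate(lda.components_):
    top_words = [vocab[i] for i in topic.argsort()[::-1][:4]]
    print(f"topic {k}: {', '.join(top_words)}")

# Document-topic proportions show which documents relate to which word list.
for d, props in enumerate(doc_topic):
    print(f"doc {d}: most related to topic {props.argmax()} ({props.max():.2f})")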
Report (4 results)
Research Products (11 results)