2001 Fiscal Year Final Research Report Summary
Computational Methodology for Knowledge Discovery
Project/Area Number |
10143101
|
Research Category |
Grant-in-Aid for Scientific Research on Priority Areas (A)
|
Allocation Type | Single-year Grants |
Research Institution | Tohoku University |
Principal Investigator |
MARUOKA Akira Tohoku Univ., Graduate School of Information Sciences, Professor (50005427)
|
Co-Investigator(Kenkyū-buntansha) |
SHINOHARA Ayumi Kyushu Univ., Graduate School of Information Science and Electrical Engineering, Dept. of Informatics, Associate Professor (00226151)
IMAI Hiroshi Univ. of Tokyo, Graduate School of Science, Dept. of Information Science, Associate Professor (80183010)
ABE Naoki IBM Thomas J. Watson Research Center, Researcher
WATANABE Osamu Tokyo Institute of Technology, Graduate School of Information Science and Engineering, Dept. of Math. and Comp. Science, Professor (80158617)
TAKASU Atsuhiro National Institute of Informatics, Software Research Division, Data Engineering Research, Associate Professor (90216648)
|
Project Period (FY) |
1998 – 2000
|
Keywords | learning / sampling / boosting / linear classifier / search for subsequence patterns / text categorization / MDL-based compression / semi-structured data |
Research Abstract |
The amount of data collected from various fields is growing exponentially, and the task of analyzing that data to extract the useful information behind it is becoming correspondingly more difficult. To extract useful information from data, there must be appropriate interaction between the extraction process and the data. Through this interaction, various processes are performed, such as memorizing certain information, learning, evolution, and possibly discovering knowledge. The major hurdle to automatically extracting knowledge from huge amounts of data is the limitation on computational resources. Group A03 aims to propose and develop computational models and methodologies for knowledge discovery. To this end, we explore various topics, including algorithms for heterogeneous data that may be strongly or poorly structured. Among the results of this project, those concerning computational mechanisms for efficiently finding effective rules in very large databases are as follows: Efficient mining from large databases by query learning; A modification of AdaBoost for adaptive sampling methods; Tree-based boosting using linear classifiers; The minimax strategy for Gaussian density estimation. Furthermore, algorithms for solving certain concrete problems were developed: A practical algorithm to find the best subsequence patterns; Biological sequence compression algorithms - Learning via compression schemes; Effect of sample size in text categorization; Knowledge discovery by using both experimental and theoretical methods; Discovery of commonality among definition sentences by MDL-based compression.
|
Research Products
(13 results)