Computational Methodology for Knowledge Discovery

Research Project

Project/Area Number	10143101
Research Category	Grant-in-Aid for Scientific Research on Priority Areas (A)
Allocation Type	Single-year Grants
Research Institution	Tohoku University
Principal Investigator	MARUOKA Akira, Tohoku Univ., Graduate School of Information Sciences, Professor (50005427)
Co-Investigator (Kenkyū-buntansha)	SHINOHARA Ayumi, Kyushu Univ., Graduate School of Information Science and Electrical Engineering, Dept. of Informatics, Associate Professor (00226151); IMAI Hiroshi, Univ. of Tokyo, Graduate School of Science, Dept. of Information Science, Associate Professor (80183010); ABE Naoki, IBM Thomas J. Watson Research Center, Researcher; WATANABE Osamu, Tokyo Institute of Technology, Graduate School of Information Science and Engineering, Dept. of Math. and Comp. Science, Professor (80158617); TAKASU Atsuhiro, National Institute of Informatics, Software Research Division, Data Engineering Research, Associate Professor (90216648)
Project Period (FY)	1998 – 2000
Project Status	Completed (Fiscal Year 2001)
Budget Amount	¥79,700,000 (Direct Cost: ¥79,700,000); Fiscal Year 2000: ¥21,800,000 (Direct Cost: ¥21,800,000); Fiscal Year 1999: ¥21,600,000 (Direct Cost: ¥21,600,000); Fiscal Year 1998: ¥36,300,000 (Direct Cost: ¥36,300,000)
Keywords	learning / sampling / boosting / linear classifier / search for subsequence patterns / text categorization / MDL-based compression / semi-structured data / geometric structure of feature space / learnability / expert online model / decision lists / adaptive sampling / query learning / active learning / clustering / pruning / direction selectivity / reinforcement learning
Research Abstract	The amount of data collected in various fields is growing exponentially, and analyzing it to extract the useful information behind it is becoming correspondingly more difficult. Extracting useful information from data requires appropriate interaction between the extraction process and the data. Through this interaction, various processes are carried out, such as memorizing information, learning, evolution, and possibly the discovery of knowledge. The major hurdle to automatically extracting knowledge from huge amounts of data is the limitation on computational resources. Group A03 aims to propose and develop computational models and methodologies for knowledge discovery. To this end, we explore various topics, including algorithms for heterogeneous data that may be strongly or poorly structured. Among the results of this project, those concerning computational mechanisms for efficiently finding effective rules in very large databases are as follows: efficient mining from large databases by query learning; a modification of AdaBoost for adaptive sampling methods; tree-based boosting using linear classifiers; and the minimax strategy for Gaussian density estimation. Furthermore, algorithms for certain concrete problems were developed: a practical algorithm to find the best subsequence patterns; biological sequence compression algorithms (learning via compression schemes); the effect of sample size in text categorization; knowledge discovery using both experimental and theoretical methods; and discovery of commonality among definition sentences by MDL-based compression.

Report

(4 results)

Research Products
(31 results)

All Publications (31 results)

[Publications] A. Maruoka: "Predicting nearly as well as the best pruning of a decision tree through dynamic programming scheme" Theoretical Computer Science. 261(1). 179-209 (2001)
- Description
  From the Final Research Report Summary (Japanese version)
- Related Report
  2001 Final Research Report Summary
[Publications] N. Abe: "Efficient mining from large databases by query learning" The 17th International Conference on Machine Learning. 17. 575-582 (2000)
- Description
  From the Final Research Report Summary (Japanese version)
- Related Report
  2001 Final Research Report Summary
[Publications] H. Imai: "Variance-Based k-Clustering Algorithms by Voronoi Diagrams and Randomization" IEICE Trans. Information and Systems. E83-D. 1199-1206 (2000)
- Description
  From the Final Research Report Summary (Japanese version)
- Related Report
  2001 Final Research Report Summary
[Publications] A. Shinohara: "A Practical Algorithm to Find the Best Subsequence Patterns" Proc. 3rd International Conference on Discovery Science (DS2000). LNAI 1967. 141-154 (2000)
- Description
  From the Final Research Report Summary (Japanese version)
- Related Report
  2001 Final Research Report Summary
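[Publications] A. Takasu: "An Approximate Matching Method for Bibliographic Data in Academic Article Images" IPSJ Transactions on Databases. 42, SIG-1. 148-158 (2001)
- Description
  From the Final Research Report Summary (Japanese version)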
- Related Report
  2001 Final Research Report Summary
[Publications] O. Watanabe: "Adaptive Sampling Methods for Scaling Up Knowledge Discovery Algorithms" Data Mining and Knowledge Discovery. 6(2) (to appear). (2002)
- Description
  From the Final Research Report Summary (Japanese version)
- Related Report
  2001 Final Research Report Summary
[Publications] A. Maruoka, E. Takimoto: "Encyclopedia of Computer Science and Technology Vol.45" Marcel Dekker, Inc. 448 (2002)
- Description
  From the Final Research Report Summary (Japanese version)
- Related Report
  2001 Final Research Report Summary
[Publications] A. Maruoka: "Predicting nearly as well as the best pruning of a decision tree through dynamic programming scheme" Theoretical Computer Science. 261(1). 179-209 (2001)
- Description
  From the Final Research Report Summary (English version)
- Related Report
  2001 Final Research Report Summary
[Publications] N. Abe: "Efficient mining from large databases by query learning" The 17th International Conference on Machine Learning. 17. 575-582 (2000)
- Description
  From the Final Research Report Summary (English version)
- Related Report
  2001 Final Research Report Summary
[Publications] H. Imai: "Variance-Based k-Clustering Algorithms by Voronoi Diagrams and Randomization" IEICE Trans. Information and Systems. E83-D. 1199-1206 (2000)
- Description
  From the Final Research Report Summary (English version)
- Related Report
  2001 Final Research Report Summary
[Publications] A. Shinohara: "A Practical Algorithm to Find the Best Subsequence Patterns" Proc. 3rd International Conference on Discovery Science (DS2000). LNAI 1967. 141-154 (2000)
- Description
  From the Final Research Report Summary (English version)
- Related Report
  2001 Final Research Report Summary
[Publications] A. Takasu: "An Approximate Matching Method for Bibliographic Data in Academic Article Images" IPSJ Transactions on Databases. Vol.42, No.SIG01. 148-158 (2001)
- Description
  From the Final Research Report Summary (English version)
- Related Report
  2001 Final Research Report Summary
[Publications] O. Watanabe: "Adaptive Sampling Methods for Scaling Up Knowledge Discovery Algorithms" Data Mining and Knowledge Discovery. Vol.6, No.2 (to appear). (2002)
- Description
  From the Final Research Report Summary (English version)
- Related Report
  2001 Final Research Report Summary
[Publications] Maruoka Akira: "On-line Estimation of Hidden Markov Model Parameters" Lecture Notes in Artificial Intelligence. 1967. 155-169 (2000)
- Related Report
  2000 Annual Research Report
[Publications] Abe Naoki: "Efficient mining from large databases by query learning" The 17th International Conference on Machine Learning. Vol.17. 575-582 (2000)
- Related Report
  2000 Annual Research Report
[Publications] Imai Hiroshi: "Variance-Based k-Clustering Algorithms by Voronoi Diagrams and Randomization" IEICE Trans. Information and Systems. Vol.E83-D. 1199-1206 (2000)
- Related Report
  2000 Annual Research Report
[Publications] Shinohara Ayumi: "A practical algorithm to find the best subsequence patterns" Proc. 3rd International Conference on Discovery Science. LNAI 1967. 141-154 (2000)
- Related Report
  2000 Annual Research Report
[Publications] Takasu Atsuhiro: "An Approximate Matching Method for Bibliographic Data in Academic Article Images" IPSJ Transactions on Databases. Vol.42. 148-158 (2001)
- Related Report
  2000 Annual Research Report
[Publications] Watanabe Osamu: "MadaBoost: A modification of AdaBoost" Proc. of the 13th Conference on Computational Learning Theory. Vol.13. 180-189 (2000)
- Related Report
  2000 Annual Research Report
[Publications] Maruoka Akira: "Proper Learning Algorithm for Functions of k Terms under Smooth Distributions" Information and Computation. 152. 188-204 (1999)
- Related Report
  1999 Annual Research Report
[Publications] Abe Naoki: "Associative Reinforcement Learning with Linear Probabilistic Concepts" Proceedings of the 16th International Conference on Machine Learning. 3-11 (1999)
- Related Report
  1999 Annual Research Report
[Publications] Imai Hiroshi: "Finding Meaningful Regions Containing Given Keywords from Large Text Collections" Lecture Notes in Artificial Intelligence. 1721. 353-354 (1999)
- Related Report
  1999 Annual Research Report
[Publications] Shinohara Ayumi: "Shift-And approach to pattern matching in LZW compressed text" Lecture Notes in Computer Science. 1645. 1-13 (1999)
- Related Report
  1999 Annual Research Report
[Publications] Takasu Atsuhiro: "Music Structure Analysis and Its Application to Theme Phrase Extraction" Proceedings of the Third European Conference on Research and Advanced Technology for Digital Libraries. 92-105 (1999)
- Related Report
  1999 Annual Research Report
[Publications] Watanabe Osamu: "From computational learning theory to discovery science" Lecture Notes in Computer Science. 1644. 134-148 (1999)
- Related Report
  1999 Annual Research Report
[Publications] Maruoka Akira: "Structured Weight-Based Prediction Algorithms" Lecture Notes in Artificial Intelligence. 1501. 127-142 (1998)
- Related Report
  1998 Annual Research Report
[Publications] Abe Naoki: "Empirical Comparison of Competing Query Learning Strategies" Lecture Notes in Artificial Intelligence. 1532. 387-388 (1998)
- Related Report
  1998 Annual Research Report
[Publications] Imai Hiroshi: "Geometric Clustering Models in Feature Space" Lecture Notes in Artificial Intelligence. 1532. 421-422 (1998)
- Related Report
  1998 Annual Research Report
[Publications] Shinohara Ayumi: "Uniform Characterizations of Polynomial-query Learnabilities" Lecture Notes in Artificial Intelligence. 1532. 84-92 (1998)
- Related Report
  1998 Annual Research Report
[Publications] Takasu Atsuhiro: "On the number of clusters in cluster analysis" Lecture Notes in Artificial Intelligence. 1532. 419-420 (1998)
- Related Report
  1998 Annual Research Report
[Publications] Watanabe Osamu: "A Role of Constraint in Self-Organization" Proceedings of the 2nd International Workshop. 307-318 (1998)
- Related Report
  1998 Annual Research Report
