Bigdata scalable feature analysis based on consistency measures
Project/Area Number |
16K12491
|
Research Category |
Grant-in-Aid for Challenging Exploratory Research
|
Allocation Type | Multi-year Fund |
Research Field |
Intelligent informatics
|
Research Institution | Gakushuin University (2019), University of Hyogo (2016-2018) |
Principal Investigator |
Shin Kilho Gakushuin University, Affiliated Research Institute, Professor (60523587)
|
Project Period (FY) |
2016-04-01 – 2020-03-31
|
Project Status |
Completed (Fiscal Year 2019)
|
Budget Amount |
¥3,380,000 (Direct Cost: ¥2,600,000, Indirect Cost: ¥780,000)
Fiscal Year 2018: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Fiscal Year 2017: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Fiscal Year 2016: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
|
Keywords | feature selection / supervised learning / unsupervised learning / algorithm / machine learning / clustering / intrusion detection / big data / classification |
Outline of Final Research Achievements |
This research project developed two practical feature selection algorithms, BornFS and UFVS, whose time efficiency is high enough to scale to big data. BornFS, a feature selection algorithm for supervised learning, evaluates three measures: relevance, feature count, and noise, the last being a new measure of feature selection performance introduced in our research. It is capable of selecting features with an optimal balance among the values of these three measures. UFVS, a feature selection algorithm for unsupervised learning, outperforms the known algorithms in the literature in time efficiency. Feature selection in the unsupervised setting is known to be significantly difficult in principle, and as a result the known algorithms were very slow. In contrast, UFVS is efficient enough to scale to big data. In the experiments, UFVS selected small numbers of effective features for datasets that have class labels, without using those labels.
|
Academic Significance and Societal Importance of the Research Achievements |
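Feature selection is one of the central problems of machine learning and also plays an important role in practice. For example, the problem of determining from DNA sequences which bases cause a particular disease is, from the viewpoint of bioinformatics, nothing other than an application of feature selection. Likewise, in detecting packets that have intruded into a network, feature selection makes it possible to determine which packet header field values serve as evidence. It is also widely known that performing feature selection before machine learning improves both accuracy and speed. In real-world problems, labeling data is not easy, and this research is also significant in that it pioneered practical feature selection for unsupervised learning.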
|
Report
(5 results)
Research Products
(6 results)