2009 Fiscal Year Final Research Report
MATHEMATICAL STATISTICS FOR DATA ANALYSIS IN HIGH DIMENSION, LOW SAMPLE SIZE CONTEXT AND ITS APPLICATIONS
Project/Area Number |
18300092
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Statistical science
|
Research Institution | University of Tsukuba |
Principal Investigator |
AOSHIMA Makoto University of Tsukuba, Graduate School of Pure and Applied Sciences, Professor (90246679)
|
Co-Investigator(Kenkyū-buntansha) |
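AKAHIRA Masafumi University of Tsukuba, Vice President (70017424)
KOIKE Ken-ichi University of Tsukuba, Graduate School of Pure and Applied Sciences, Associate Professor (90260471)
OHYAUCHI Nao University of Tsukuba, Graduate School of Pure and Applied Sciences, Assistant Professor (40375374)
TASAKI Hiroyuki University of Tsukuba, Graduate School of Pure and Applied Sciences, Associate Professor (30179684)
KAWAMURA Kazuhiro University of Tsukuba, Graduate School of Pure and Applied Sciences, Associate Professor (40204771)
TAKAHASHI Hideto University of Tsukuba, Graduate School of Comprehensive Human Sciences, Associate Professor (80261808)
MINAMI Nariyuki Keio University, School of Medicine, Professor (10183964)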
|
Project Period (FY) |
2006 – 2009
|
Keywords | multivariate analysis / machine learning / pattern recognition / model selection / noise / bioinformatics and life informatics / microarray / high-dimensional data |
Research Abstract |
We developed a high-dimensional asymptotic theory for High Dimension, Low Sample Size (HDLSS) data under general settings that include non-Gaussian distributions. We identified several geometric structures of HDLSS data and showed that naive PCA is inconsistent in the HDLSS context. We proposed two effective inference methods: (1) the noise-reduction methodology and (2) the cross-data-matrix methodology. Using these methodologies, we gave consistent estimators of the intrinsic dimensionality, the eigenvalues (together with their limiting distributions), the PC directions, and the PC scores in the HDLSS context. We applied the methodologies to discriminant analysis and cluster analysis of HDLSS data from a microarray study of prostate cancer.
|
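The inflation of naive PCA eigenvalues in the HDLSS setting can be illustrated with a small simulation. The sketch below is not taken from the report. It assumes a Gaussian model with a single spiked eigenvalue, and it assumes the noise-reduction correction takes the form in which the average of the remaining dual eigenvalues is subtracted from each leading eigenvalue. The function names, the model, and all parameter values are hypothetical choices made for illustration only.

```python
import numpy as np


def dual_eigenvalues(X):
    """Eigenvalues of the n x n dual sample covariance matrix, descending.

    X is n x p (rows are observations). The nonzero eigenvalues coincide
    with those of the usual p x p sample covariance matrix, so naive PCA
    eigenvalues can be computed cheaply when p is much larger than n.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)           # center each variable
    S_dual = Xc @ Xc.T / (n - 1)      # n x n instead of p x p
    return np.linalg.eigvalsh(S_dual)[::-1]


def noise_reduced(eigs, n):
    """Noise-reduced eigenvalues for j = 1, ..., n - 2.

    Assumed form (not stated in the report):
    lam_j - (trace - sum of the first j eigenvalues) / (n - 1 - j).
    """
    total = eigs.sum()
    out = []
    for j in range(1, n - 1):
        rest = total - eigs[:j].sum()  # mass left after the first j eigenvalues
        out.append(eigs[j - 1] - rest / (n - 1 - j))
    return np.array(out)


def simulate(n=20, p=2000, spike=200.0, seed=0):
    """Return (naive, noise-reduced) estimates of the largest eigenvalue.

    Hypothetical model: independent standard normal variables, except that
    the first variable has variance `spike`.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    X[:, 0] *= np.sqrt(spike)
    eigs = dual_eigenvalues(X)
    return eigs[0], noise_reduced(eigs, n)[0]


if __name__ == "__main__":
    naive, reduced = simulate()
    print(f"naive: {naive:.1f}, noise-reduced: {reduced:.1f}, true: 200.0")
```

Under this model the naive estimate carries an additive noise term of roughly p / (n - 1), which the correction removes. The remaining sampling variability of the spike itself is untouched, so a single run need not land close to the true value.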