Project/Area Number |
22K12144
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Multi-year Fund |
Section | General |
Review Section |
Basic Section 61030: Intelligent informatics-related
|
Research Institution | University of Tsukuba |
Principal Investigator |
叶 秀彩, University of Tsukuba, Institute of Systems and Information Engineering, Associate Professor (60814001)
|
Project Period (FY) |
2022-04-01 – 2026-03-31
|
Project Status |
Granted (Fiscal Year 2023)
|
Budget Amount |
¥4,160,000 (Direct Cost: ¥3,200,000, Indirect Cost: ¥960,000)
Fiscal Year 2025: ¥910,000 (Direct Cost: ¥700,000, Indirect Cost: ¥210,000)
Fiscal Year 2024: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
Fiscal Year 2023: ¥910,000 (Direct Cost: ¥700,000, Indirect Cost: ¥210,000)
Fiscal Year 2022: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
|
Keywords | feature learning / feature selection / active feature selection / intermediate representation / data collaboration |
Outline of Research at the Start |
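As data acquisition has become easier in recent years, data has become large-scale and distributed. Data held under distributed management is difficult to share because of confidentiality and similar concerns, and shortages or biases in the number of samples make it hard to learn important features such as risk factors. To learn important features, this research develops a distributed collaborative feature selection algorithm that integrates data through intermediate representations, without directly sharing the distributed data. Specifically, each institution independently abstracts its own original data, the abstracted data (intermediate representations) are projected into a common latent space, and a feature selection model is built on the integrated data. Demonstration experiments on real data will be conducted to show the effectiveness of the developed feature selection method.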
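The integration step described in this outline can be illustrated with a minimal sketch. It is not the project's algorithm: it assumes linear private maps and a shared, non-sensitive anchor dataset used only for alignment, neither of which is stated in this record. The function names (`local_intermediate`, `align_maps`, `integrate`) are hypothetical, and the downstream feature selection model is not shown.

```python
import numpy as np

def local_intermediate(X, F):
    """Party side: abstract raw data X with a private linear map F (assumed linear)."""
    return X @ F

def align_maps(anchor_reprs, latent_dim):
    """Coordinator side: one alignment matrix G_i per party.

    anchor_reprs[i] is party i's intermediate representation of the shared
    anchor data (an assumption of this sketch). The common target Z is taken
    from the leading left singular vectors of the stacked representations.
    """
    stacked = np.hstack(anchor_reprs)
    U, _, _ = np.linalg.svd(stacked, full_matrices=False)
    Z = U[:, :latent_dim]
    # Least-squares solution of A_i @ G_i ~= Z for each party.
    return [np.linalg.pinv(A_i) @ Z for A_i in anchor_reprs]

def integrate(local_reprs, G_list):
    """Map every party's intermediate representation into the common space and stack."""
    return np.vstack([R_i @ G_i for R_i, G_i in zip(local_reprs, G_list)])

# Toy run with two parties; all data here is synthetic.
rng = np.random.default_rng(0)
n_features = 4
anchor = rng.normal(size=(50, n_features))        # shared anchor data (assumed)
F1 = rng.normal(size=(n_features, n_features))    # private map of party 1
F2 = rng.normal(size=(n_features, n_features))    # private map of party 2
X1 = rng.normal(size=(5, n_features))             # raw data, never shared
X2 = rng.normal(size=(7, n_features))

G1, G2 = align_maps(
    [local_intermediate(anchor, F1), local_intermediate(anchor, F2)],
    latent_dim=n_features,
)
integrated = integrate(
    [local_intermediate(X1, F1), local_intermediate(X2, F2)], [G1, G2]
)
```

Only the intermediate representations and the anchor representations leave each party in this sketch; the raw matrices `X1` and `X2` and the private maps `F1` and `F2` stay local. With reduced-dimension maps the alignment is only approximate.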
|
Outline of Annual Research Achievements |
This year, we focused on feature learning-based methods and their application in bioinformatics. To address the challenges of limited samples and data imbalance, we present a novel feature learning framework for analyzing complex data structures, which is applied to predict antibiotic activity and improve the efficiency of antibiotic discovery. To learn effectively from complex data, we use contrastive learning to extract important features from complex structures. By integrating data augmentation and a pre-trained RoBERTa model, our method accurately predicts hERG blockers.
|
Current Status of Research Progress |
2: Research has progressed on the whole more than it was originally planned.
Reason
To address the challenges posed by limited and imbalanced samples, we introduced methods that enhance feature learning and analyze complex data structures. Our approaches employ contrastive learning to capture critical features from complex datasets and integrate data augmentation with a pre-trained RoBERTa model to predict hERG blockers with high precision, thereby contributing to antibiotic discovery.
|
Strategy for Future Research Activity |
In the next step, we will address data that is distributed across different locations. We will propose methods that learn jointly from such data without compromising privacy, using techniques such as federated learning to learn important features. We also plan to apply these methods to data that comes from multiple perspectives (multi-view data) to improve how such data is combined and used.
|