Project/Area Number |
23KF0063
|
Research Category |
Grant-in-Aid for JSPS Fellows
|
Allocation Type | Multi-year Fund |
Application Category | Foreign |
Review Section |
Basic Section 62010: Life, health and medical informatics-related
|
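Research Institution | Nagoya University |
Principal Investigator |
Yoshihiro Yamanishi (山西 芳裕), Nagoya University, Graduate School of Informatics, Professor (60437267)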
|
Co-Investigator |
LI CHEN, Nagoya University, Graduate School of Informatics, JSPS International Research Fellow
|
Project Period (FY) |
2023-04-25 – 2025-03-31
|
Project Status |
Granted (FY2023)
|
Budget Amount *Note |
2,000 thousand yen (Direct Cost: 2,000 thousand yen)
FY2024: 1,000 thousand yen (Direct Cost: 1,000 thousand yen)
FY2023: 1,000 thousand yen (Direct Cost: 1,000 thousand yen)
|
Keywords | Generative AI Model / Deep Learning / Drug Discovery / Molecular Generation / Property Optimization |
Outline of Research at the Start |
In this study, I propose a new method that combines a transformer and a GAN to generate realistic molecules. Specifically, I propose a property-optimized GAN built only from transformer encoders that generates molecules with the desired chemical properties.
|
Outline of Annual Research Achievements |
To tackle the challenges of generating molecular representations (SMILES) with GANs, namely the complexity of the task, the non-uniqueness of SMILES representations, and the instability of GAN training, I proposed a de novo molecular generative model. To better capture features within molecular SMILES representations, a transformer and its variants were used as the generator and discriminator of the GAN. In addition, the model was trained on variant SMILES, exploiting the fact that a single molecule can be written as multiple distinct SMILES strings. Molecular chemical properties were used as rewards within a reinforcement learning-based framework, and these rewards guide the updates of the generator. To address the challenge of preserving molecular scaffold integrity in de novo molecular generation, a functional group generative model was also introduced. This model generates functional groups for a given molecular scaffold while simultaneously optimizing molecular properties. Unlike the traditional transformer, this model uses a reverse transformer with a first-decoder-then-encoder architecture to achieve GAN functionality.
|
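The non-uniqueness of SMILES mentioned above can be illustrated with a minimal sketch. It assumes a deliberately tiny SMILES subset (unbranched, acyclic chains of a few atoms with optional double or triple bonds), for which writing the chain from the other end yields a different string for the same molecule. The function names are hypothetical and this is not the augmentation procedure used in the project; a real pipeline would more likely rely on a cheminformatics toolkit.

```python
import re
from typing import List

# Tokens for a tiny SMILES subset (an assumption of this sketch):
# two-letter halogens, one-letter organic-subset atoms, and the
# symmetric bond symbols '=' and '#'. Branches, rings, charges,
# aromatic atoms, and stereo marks are intentionally unsupported.
_TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[=#]")


def tokenize_linear_smiles(smiles: str) -> List[str]:
    """Split a linear SMILES string into tokens, rejecting anything else."""
    tokens = _TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported SMILES for this sketch: {smiles!r}")
    return tokens


def reversed_variant(smiles: str) -> str:
    """Write the same unbranched chain starting from its other end."""
    return "".join(reversed(tokenize_linear_smiles(smiles)))


def variants(smiles: str) -> List[str]:
    """Return the distinct spellings this sketch can produce (one or two)."""
    return sorted({smiles, reversed_variant(smiles)})


if __name__ == "__main__":
    print(variants("CCO"))     # ethanol written from either end
    print(variants("ClCC=O"))  # chloroacetaldehyde written from either end
```

Both spellings of a molecule can then be added to the training set, which is the general idea behind training on variant SMILES.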
Current Status of Research Progress (Category) |
1: Progressing beyond the original plan
Reason
The research project has progressed well beyond my original plans. At the intersection of AI and bioinformatics, my efforts have focused on shortening the long development phases and reducing the substantial trial-and-error costs inherent in conventional drug discovery. The main contributions are de novo molecular generative models [1], functional group generation based on molecular scaffolds [2], and drug candidate generation based on gene expression profiles [3].
[1] C. Li and Y. Yamanishi (2024), “TenGAN: Pure transformer encoders make an efficient discrete GAN for de novo molecular generation,” in Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024), a top AI conference.
[2] C. Li and Y. Yamanishi (2023), “SpotGAN: A reverse-transformer GAN generates scaffold-constrained molecules with property optimization,” in Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2023).
[3] C. Li and Y. Yamanishi (2024), “GxVAEs: Two joint VAEs generate hit molecules from gene expression profiles,” 38th AAAI Conference on Artificial Intelligence (AAAI 2024), a top AI conference; one of the three outstanding papers among the 2,342 accepted papers.
|
Strategy for Future Research Activity |
In future work, I aim to use advanced deep learning techniques to generate novel molecules with desired chemical properties. Moreover, considering the rich biological information available in gene expression profiles, I aim to generate molecules in combination with gene expression profiles.
Moment Soft-Actor-Critic Reinforcement Learning Driven GAN for De Novo Molecule Generation
My previous work [1,2] generated molecules with desired chemical properties using a Monte Carlo tree search (MCTS) reinforcement learning algorithm. MCTS is a powerful tool for molecular generation, but its computational complexity is a potential drawback. Because molecular generation involves a large and complex search space, MCTS can require substantial computational resources, which makes it both time-consuming and expensive to run. In addition, the discriminators in past studies must evaluate entire SMILES strings, even though a molecule that the discriminator rejects as fake is often rejected because of only a few unsuitable atoms. In future work, I aim to propose a GAN based on soft actor-critic reinforcement learning with a discriminator that evaluates the generated SMILES strings in a stepwise manner.
|
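The cost argument above can be made concrete with a rough sketch. It assumes a rollout scheme in which every intermediate prefix of a generated string is completed `n_rollouts` times and each completion is scored once by the discriminator, and it compares this with a stepwise discriminator that scores each token position once. The per-step scores, function names, and the discounted reward-to-go are illustrative assumptions; this is not the proposed soft actor-critic method, which has not been specified here.

```python
from typing import List, Sequence


def rollout_discriminator_calls(seq_len: int, n_rollouts: int) -> int:
    """Discriminator evaluations for one sequence under the assumed
    rollout scheme: n_rollouts per intermediate prefix, plus one call
    for the finished sequence."""
    if seq_len < 1 or n_rollouts < 1:
        raise ValueError("seq_len and n_rollouts must be positive")
    return (seq_len - 1) * n_rollouts + 1


def stepwise_discriminator_calls(seq_len: int) -> int:
    """Discriminator evaluations when each position is scored exactly once."""
    if seq_len < 1:
        raise ValueError("seq_len must be positive")
    return seq_len


def reward_to_go(step_scores: Sequence[float], gamma: float = 1.0) -> List[float]:
    """Discounted sum of per-step scores from each position to the end,
    so a low score at one atom mainly penalizes that step and earlier ones."""
    returns = [0.0] * len(step_scores)
    running = 0.0
    for t in range(len(step_scores) - 1, -1, -1):
        running = step_scores[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    print(rollout_discriminator_calls(seq_len=10, n_rollouts=16))
    print(stepwise_discriminator_calls(seq_len=10))
    # Hypothetical per-token scores for a three-token string.
    print(reward_to_go([1.0, 0.0, 2.0], gamma=0.5))
```

Under these assumptions the number of discriminator calls grows with the product of sequence length and rollout count in the first case and only with sequence length in the second, which is the motivation for stepwise evaluation stated above.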