Verification and improvement of Large Language Models utilizing structured medical data, and development of a method for explaining model outputs

Research Project

Project/Area Number	24K15166
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 62010: Life, health and medical informatics-related
Research Institution	Kumamoto University
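Principal Investigator	NOHARA Yasunobu, Kumamoto University, Faculty of Advanced Science and Technology (Engineering), Associate Professor (30624829)
Co-Investigator(Kenkyū-buntansha)	MATSUMOTO Kotaro, Kyushu University, Faculty of Medical Sciences, Assistant Professor (60932217)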
Project Period (FY)	2024-04-01 – 2028-03-31
Project Status	Granted (Fiscal Year 2024)
Budget Amount	¥4,810,000 (Direct Cost: ¥3,700,000, Indirect Cost: ¥1,110,000)
	Fiscal Year 2027: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
	Fiscal Year 2026: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
	Fiscal Year 2025: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
	Fiscal Year 2024: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
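Keywords	Large language models / Clinical pathways / Clinical text / Machine learning / Interpretation methods
Outline of Research at the Start	Large language models (LLMs) excel at natural language processing tasks such as reading comprehension and text generation, and are expected to play an important role in medicine, but they still face challenges such as reliability. Training and validating LLMs requires large amounts of correctly labeled data, yet labeling medical data requires manual work by domain experts, which makes collecting such labels especially laborious. In this research, we leverage electronic clinical pathways, a high-quality structured medical data infrastructure that we maintain, to collect large numbers of ground-truth labels efficiently. We will use these labels to continuously verify and improve LLMs, and we aim to release the resulting labeled data as open data.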