Research Project/Area Number | 23K11262 |
Research Category | Grant-in-Aid for Scientific Research (C) |
Allocation Type | Multi-year Fund |
Application Category | General |
Review Section | Basic Section 61040: Soft computing-related |
Research Institution | Shinshu University |
Principal Investigator | Arnold Solvi, Shinshu University, Faculty of Engineering, Associate Professor (specially appointed) (80764935) |
Co-Investigators | Takaya Arita, Nagoya University, Graduate School of Informatics, Professor (40202759) |
| Reiji Suzuki, Nagoya University, Graduate School of Informatics, Associate Professor (20362296) |
Project Period (FY) | 2023-04-01 – 2026-03-31 |
Project Status | Granted (FY2023) |
Budget Amount *Note | Total: ¥4,160,000 (Direct: ¥3,200,000; Indirect: ¥960,000) |
| FY2025: ¥650,000 (Direct: ¥500,000; Indirect: ¥150,000) |
| FY2024: ¥2,990,000 (Direct: ¥2,300,000; Indirect: ¥690,000) |
| FY2023: ¥520,000 (Direct: ¥400,000; Indirect: ¥120,000) |
Keywords | neural networks / artificial intelligence / machine learning / learning algorithms / evolution of learning / meta-learning / Baldwin effect / artificial life |
Outline of Research at the Start |
Learning is a key aspect of intelligence. In contrast to current AI, humans can learn efficiently from limited experience: human cognition is evolutionarily specialised to acquire important skills (e.g. learning to walk, acquiring language) rapidly, and this specialisation is hypothesised to be a core factor in cognitive evolution. We computationally model the evolution of learning ability in neural networks, with a focus on such specialisation. Our goals are 1) to explore how AI learning can be made more human-like, and 2) to gain new insights into the evolution of intelligence in nature.
|
Outline of Annual Research Achievements |
The main goal for this year was a proof-of-concept implementation of the hypothesised evolutionary scenario. We developed a model for the evolution of neural networks with mechanisms for both reward-driven learning (an existing Reinforcement Learning algorithm) and direct synaptic weight modification via neuromodulation (a novel implementation of the neuromodulation concept that takes columnar neural structures into account). We designed 2D and 3D task domains consisting of navigation tasks that require individual learning to solve, let neural network populations evolve on these domains, and analysed how their learning abilities evolved. The resulting evolutionary dynamics are consistent with our theory: reward-driven learning appears first, and non-reward information is then gradually integrated into the learning process via the neuromodulation mechanism, improving learning efficiency. On the present task domains, evolution eventually eliminates the need for reward signals altogether, enabling reward-agnostic learning of the tasks. A quantitative comparison with a representative non-evolutionary Reinforcement Learning algorithm showed that the learning abilities evolved in our model learn over 300 times faster on this task domain. These results support our theory and indicate its potential for improving learning ability in neural networks. We prepared a conference paper discussing our theory and results.
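As an illustration only (this is not the project's model or code), the scenario above can be reduced to a toy loop: a genome encodes initial synaptic weights plus per-weight neuromodulation gains, lifetime learning applies gain-scaled weight updates driven by a non-reward feedback signal, and selection acts on post-learning performance. All names, parameters, and the update rule below (`TARGET`, `develop`, etc.) are hypothetical stand-ins.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical "optimal" weights for a stand-in task (not the project's
# actual 2D/3D navigation domains).
TARGET = [0.5, -0.3, 0.8]

def new_genome():
    # A genome encodes initial synaptic weights plus a per-weight
    # neuromodulation gain that scales lifetime weight updates.
    return {"w": [random.uniform(-1.0, 1.0) for _ in TARGET],
            "g": [random.uniform(0.0, 1.0) for _ in TARGET]}

def develop(genome, steps=10, rate=0.2):
    # Lifetime learning: a feedback signal (here simply the task error,
    # standing in for non-reward sensory information) drives weight
    # changes, scaled by the evolved gains. No reward signal is consulted.
    w = list(genome["w"])
    for _ in range(steps):
        for i in range(len(w)):
            signal = TARGET[i] - w[i]
            w[i] += genome["g"][i] * rate * signal
    return w

def fitness(genome):
    # Selection acts on post-learning performance (negative squared error).
    w = develop(genome)
    return -sum((t - wi) ** 2 for t, wi in zip(TARGET, w))

def evolve(pop_size=30, generations=40, sigma=0.1):
    pop = [new_genome() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # truncation selection
        children = []
        for parent in survivors:
            child = {key: [v + random.gauss(0.0, sigma) for v in vals]
                     for key, vals in parent.items()}
            child["g"] = [max(0.0, g) for g in child["g"]]  # gains stay non-negative
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
print(f"post-learning error of best genome: {-fitness(best):.4f}")
```

The point of the sketch is the division of labour: evolution shapes *how* the network learns (the gains), while the within-lifetime update does the actual task adaptation without any reward term.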
|
Current Status of Research Progress (Category) |
2: Progressing rather smoothly
Reason |
The project is mostly proceeding as planned. We switched from a locomotion task to a navigation task for the initial proof of concept because the latter is computationally lightweight, allowing us to experiment more effectively with various neural network implementations during the early stages of the project. Results on the navigation task exceeded expectations. The originally planned locomotion task has also been developed, and we plan to run experiments on this task in FY2024.
|
Strategy for Future Research Activity |
In FY2024 so far, we have submitted a conference paper on the theory and our first results, and made it publicly available as a preprint. The main research direction for FY2024 is to diversify the tasks we apply the system to. Theoretically, this should help clarify the role of the hypothesised evolutionary dynamic in biological evolution; practically, it will clarify which kinds of tasks the system solves well and which will require further development. We also plan to release the source code so that others can experiment with the approach.
|