Socially Acceptable Robot Navigation in Scenarios with Multiple Pedestrians

Research Project

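Project/Area Number	23KJ0580
Research Category	Grant-in-Aid for JSPS Fellows
Allocation Type	Multi-year Fund
Section	Domestic
Review Section	Basic Section 61050: Intelligent robotics-related
Research Institution	The University of Tokyo
Principal Investigator	呉家旭, The University of Tokyo, Graduate School of Engineering, JSPS Research Fellow (DC2)
Project Period (FY)	2023-04-25 – 2025-03-31
Project Status	Granted (FY2023)
Budget Amount	¥1,800,000 (direct costs: ¥1,800,000); FY2024: ¥900,000 (direct costs: ¥900,000); FY2023: ¥900,000 (direct costs: ¥900,000)
Keywords	robot learning / reinforcement learning / autonomous navigation / human-robot interaction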
Outline of Research at the Start	This research focuses on realizing socially aware navigation for mobile robots. A novel reinforcement learning framework is proposed that trains a mobile robot to navigate efficiently through crowds and encourages it to cooperate politely with pedestrians to avoid collisions.
Outline of Annual Research Achievements	This year, a reinforcement learning (RL) framework based on deep neural networks (DNNs) was established for socially aware navigation in human-populated environments, built around three key points. For state representation, two alternatives, attention networks and graph convolutional networks, were compared. For reward design, a new entropy-based reward was proposed that aims to maximize the route choices available to surrounding pedestrians. Results in both simulation and real-world experiments demonstrated that the socially aware navigation policy induced by the proposed reward outperformed previous work. For risk sensitivity, this research proposed an algorithm called the LDM-based risk detector, which detects early the tendency of the RL navigation policy to fall into risky areas.
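The idea of an entropy-based reward can be illustrated with a minimal sketch. The record does not give the actual formulation, so everything below is an assumption made for illustration: each pedestrian is given a fixed set of candidate headings, a heading counts as feasible if one step along it stays outside a safety radius around the robot, and the reward is the Shannon entropy of a uniform distribution over the feasible headings, averaged over pedestrians. The function names and parameters (`feasible_headings`, `n_headings`, `step`, `safety_radius`) are hypothetical.

```python
import math


def feasible_headings(ped_pos, robot_pos, n_headings=16, step=1.0, safety_radius=0.5):
    """Return the candidate headings (radians) along which a pedestrian can take
    one step of length `step` without ending up closer than `safety_radius`
    to the robot. The discretisation is an illustrative assumption."""
    feasible = []
    for k in range(n_headings):
        theta = 2.0 * math.pi * k / n_headings
        next_x = ped_pos[0] + step * math.cos(theta)
        next_y = ped_pos[1] + step * math.sin(theta)
        if math.hypot(next_x - robot_pos[0], next_y - robot_pos[1]) >= safety_radius:
            feasible.append(theta)
    return feasible


def route_choice_entropy(ped_pos, robot_pos, **kwargs):
    """Shannon entropy (in nats) of a uniform distribution over the feasible
    headings, i.e. log(number of feasible headings); 0.0 if none remain."""
    n = len(feasible_headings(ped_pos, robot_pos, **kwargs))
    return math.log(n) if n > 0 else 0.0


def entropy_reward(pedestrians, robot_pos, **kwargs):
    """Average route-choice entropy over all pedestrians. A robot position that
    blocks fewer pedestrian headings yields a higher reward."""
    if not pedestrians:
        return 0.0
    total = sum(route_choice_entropy(p, robot_pos, **kwargs) for p in pedestrians)
    return total / len(pedestrians)


if __name__ == "__main__":
    pedestrian = [(0.0, 0.0)]
    print(entropy_reward(pedestrian, robot_pos=(100.0, 0.0)))  # robot far away
    print(entropy_reward(pedestrian, robot_pos=(1.0, 0.0)))    # robot in the way
```

Under these assumptions, a robot standing directly in a pedestrian's path removes some of that pedestrian's headings and therefore earns a lower reward than a robot that keeps its distance.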
Current Status of Research Progress	1: Research has progressed more than originally planned. Reason: Sim-to-real transfer was achieved in an indoor navigation test. Although the initial plan was to conduct real-world experiments in 2024, hardware design and system integration were completed earlier than planned, which made the test possible. The success of the indoor navigation test, and the experience gained from it, provide a strong foundation for larger-scale outdoor experiments, which are closer to the real-world applications of this research. Benchmarking of previous navigation methods has also progressed more smoothly than planned. This is important for evaluating the proposed method and for discussing design alternatives. Although the initial plan was to train all previous methods in 2024, over half of the benchmark targets were trained in 2023 thanks to an automated train-test pipeline.
Strategy for Future Research Activity	To train a navigation policy that can adapt to given rules without tedious retraining, Successor Features (SF) and Maximum Diffusion (MaxDiff) will be introduced into the current reinforcement learning framework. MaxDiff will be used to explore different types of robot-crowd interaction, and SF will be used to summarize those interactions into a task-conditioned policy. Furthermore, the navigation policy trained with this approach will be tested in larger and more complex scenarios than last year: navigation tests on campus with start-goal distances over 100 meters and crowds of more than 30 people will be conducted.
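How successor features can yield a task-conditioned policy without retraining can be sketched as follows. In the standard formulation, the reward is assumed to be linear in some features, r = φ · w, so the action value of a policy is Q(s, a) = ψ(s, a) · w, where ψ is that policy's successor feature vector. Given several previously learned policies, generalized policy improvement picks the action that maximizes the best Q-value across them for a new task vector w. The sketch below is not the project's implementation; the two features (goal progress, pedestrian comfort), the two policies, and all numbers are hypothetical.

```python
def q_values(psi, w):
    """Compute Q(s, a) = psi(s, a) . w for every action.

    psi: list with one successor-feature vector per action (current state).
    w:   task weight vector of the same length as each feature vector.
    """
    return [sum(p * wi for p, wi in zip(psi_a, w)) for psi_a in psi]


def gpi_action(psis, w):
    """Generalized policy improvement: for each action take the best Q-value
    over all stored policies, then return the index of the best action.

    psis: list with one psi table (as accepted by q_values) per policy.
    """
    per_policy_q = [q_values(psi, w) for psi in psis]
    n_actions = len(per_policy_q[0])
    best_q = [max(q[a] for q in per_policy_q) for a in range(n_actions)]
    return max(range(n_actions), key=best_q.__getitem__)


if __name__ == "__main__":
    # Hypothetical features: [goal progress, pedestrian comfort].
    # Hypothetical actions:  0 = go straight, 1 = yield.
    psi_fast = [[1.0, 0.2], [0.3, 0.6]]
    psi_polite = [[0.8, 0.3], [0.4, 0.9]]
    policies = [psi_fast, psi_polite]

    print(gpi_action(policies, w=[1.0, 0.1]))  # task that favours efficiency
    print(gpi_action(policies, w=[0.1, 1.0]))  # task that favours politeness
```

Changing only the weight vector w switches the selected behavior, which is the sense in which the policy is conditioned on the task rather than retrained for it.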