2005 Fiscal Year Final Research Report Summary
Model-based reinforcement learning: brain implementation and engineering applications
Project/Area Number | 15300102 |
Research Category | Grant-in-Aid for Scientific Research (B) |
Allocation Type | Single-year Grants |
Section | General |
Research Field | Bioinformatics/Life informatics |
Research Institution | Nara Institute of Science and Technology |
Principal Investigator |
ISHII Shin, Nara Institute of Science and Technology, Graduate School of Information Science, Professor (90294280)
|
Co-Investigator (Kenkyū-buntansha) |
SHIBATA Tomohiro, Nara Institute of Science and Technology, Graduate School of Information Science, Associate Professor (40359873)
YOSHIDA Wako, Nara Institute of Science and Technology, Graduate School of Information Science, Researcher (30379599)
|
Project Period (FY) | 2003 – 2005 |
Keywords | reinforcement learning / prefrontal cortex / computational neuroscience / robot control / Bayesian inference / non-invasive brain activity measurement / system identification |
Research Abstract |
[On-line Bayesian learning schemes] We devised an on-line Bayesian learning algorithm that applies to Gaussian stochastic processes and can estimate the system dimensionality and detect changes in the target dynamics (Hirayama et al., 2004). We also devised a sequential Monte-Carlo-based method applicable to non-Gaussian stochastic processes and applied it to visual tracking problems (Bando et al., in press).
[Applications of model-based reinforcement learning and on-line learning] We enabled a biped robot simulator to walk autonomously by combining a central pattern generator with reinforcement learning, and later extended this approach to incorporate policy-gradient-based reinforcement learning. By further introducing an on-line model identification method, the simulator's autonomous learning was accelerated (Nakamura et al., 2005). Our reinforcement learning scheme for a switching controller succeeded in swinging up and stabilizing an underactuated real robot, the acrobot. An autonomous training scheme combining model-based reinforcement learning with on-line model learning constructed an agent for a multi-agent card game that plays as strongly as a human expert (Ishii et al., 2005).
[Reward-related prefrontal neural activities of primates] An electrophysiological study with a primate memory-based sensorimotor processing task revealed that reward expectation significantly enhanced the selectivity of sensory working memory but not that of motor memory (Amemori et al., 2005).
[Neuropsychological study of human prefrontal information processing] We developed a model of the information processing that occurs while a human performs a Markov decision process, and evaluated the model's plausibility through neuropsychological studies with functional magnetic resonance imaging, finding engagement of the dorsolateral prefrontal cortex (Yoshida et al., 2005). When the Markov decision environment involves uncertainty, its resolution may be performed in the frontopolar prefrontal cortex (Yoshida et al., in press).
|
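The sequential Monte Carlo method mentioned in the abstract can be illustrated with a generic bootstrap particle filter on a toy 1-D tracking problem. This is a minimal sketch of the general technique, not the algorithm of Bando et al.; the random-walk transition model, noise levels, and problem setup are all illustrative assumptions.

```python
import numpy as np

def particle_filter(observations, n_particles=500, seed=0):
    """Bootstrap particle filter for a 1-D random-walk state observed
    through Gaussian noise (generic sketch; all parameters illustrative)."""
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, 1.0, n_particles)  # samples from the prior
    estimates = []
    for y in observations:
        # predict: propagate each particle through the transition model
        particles = particles + rng.normal(0.0, 0.5, n_particles)
        # weight: Gaussian observation likelihood p(y | x)
        w = np.exp(-0.5 * (y - particles) ** 2)
        w /= w.sum()
        # estimate: posterior mean under the weighted particle set
        estimates.append(float(w @ particles))
        # resample: draw particles in proportion to their weights
        idx = rng.choice(n_particles, n_particles, p=w)
        particles = particles[idx]
    return estimates

# usage: track a slowly drifting signal from noisy observations
true_states = np.linspace(0.0, 3.0, 30)
obs = true_states + np.random.default_rng(1).normal(0.0, 0.3, 30)
est = particle_filter(obs)
```

Because resampling discards low-weight particles, the filter handles non-Gaussian posteriors that a Kalman filter cannot represent, which is what makes this family of methods suitable for visual tracking.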
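The core loop of model-based reinforcement learning with on-line model learning, as applied above to the acrobot and the card game, can be sketched in its simplest tabular form: estimate a transition model from experience, then plan on the learned model. The 5-state chain MDP below is a hypothetical toy task, not the report's robots or card game.

```python
import numpy as np

def learn_model(transitions, n_states, n_actions):
    """Estimate P(s'|s,a) and R(s,a) from observed (s, a, r, s') tuples."""
    counts = np.zeros((n_states, n_actions, n_states))
    rewards = np.zeros((n_states, n_actions))
    for s, a, r, s2 in transitions:
        counts[s, a, s2] += 1
        rewards[s, a] += r
    visits = counts.sum(axis=2, keepdims=True)
    # unvisited (s, a) pairs fall back to a uniform transition guess
    P = np.where(visits > 0, counts / np.maximum(visits, 1), 1.0 / n_states)
    R = rewards / np.maximum(visits[:, :, 0], 1)
    return P, R

def value_iteration(P, R, gamma=0.95, iters=200):
    """Plan on the learned model: V(s) <- max_a [R(s,a) + gamma * sum P V]."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        V = (R + gamma * P @ V).max(axis=1)
    return V

# usage: a deterministic 5-state chain; action 0 moves left, 1 moves
# right, and entering state 4 yields reward 1
transitions = []
for s in range(5):
    for a in (0, 1):
        s2 = max(s - 1, 0) if a == 0 else min(s + 1, 4)
        transitions.append((s, a, 1.0 if s2 == 4 else 0.0, s2))
P, R = learn_model(transitions, 5, 2)
V = value_iteration(P, R)
```

Interleaving the two steps on-line, so that planning always uses the latest model estimate, is what accelerates learning relative to model-free methods that must experience each outcome many times.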
Research Products
(68 results)