FY2018 Annual Research Report

Research on Adaptive Power Control of IoT Systems Using Reinforcement Learning

Research Project

Project/Area Number	18J20946
Research Institution	The University of Tokyo
Principal Investigator	SHRESTHAMALI SHASWOT The University of Tokyo, Graduate School of Information Science and Technology, JSPS Research Fellow (DC1)
Project Period (FY)	2018-04-25 – 2021-03-31
Keywords	Reinforcement Learning / Internet of Things / Wireless Sensor Networks / Machine Learning / Edge Intelligence / Distributed Reinforcement Learning / Energy Harvesting Wireless Sensor Nodes / Deep Q-Learning
Outline of Annual Research Achievements	During the 2018-2019 academic year, my research grant allowed me to participate in four international conferences: DAC (Design Automation Conference), ICML (International Conference on Machine Learning), MICRO (International Symposium on Microarchitecture) and SC (Supercomputing). At these conferences, I gained better insight into current research challenges and exchanged ideas with other researchers in the field. I found the conference tutorials especially helpful. In these tutorials, I learned the fundamentals of high-performance computing (HPC), computer microarchitecture and machine learning (ML), and also got hands-on exposure to the latest research and technology trends. A major achievement of the past academic year was building a codebase for my research and conducting preliminary experiments. I investigated different Deep RL methods and architectures and, in doing so, developed the basic modules I will use for further experimentation and research. I was able to achieve this by taking various courses related to HPC and ML, both at the university and online. The books I bought with my research grant were very helpful for this. In my experiments, I achieved speedups of an order of magnitude in Reinforcement Learning (RL) for the Internet of Things (IoT) by using distributed RL techniques. Based on these experimental outcomes, I submitted a technical paper to an international conference on embedded devices (EMSOFT), where it is now under peer review.
Current Status of Research Progress (Category)	2: Progressing largely as planned. Reason: I was able to implement an original distributed Reinforcement Learning (RL) architecture for Wireless Sensor Nodes (WSNs) and their networks. This architecture can learn sophisticated control policies without powerful computing resources, which makes it suitable for low-power, low-end WSNs. This was achieved through a novel integration of tabular RL with function approximation methods using Deep Q-Networks (DQN). I implemented original methods to minimize quantization errors and to coordinate exploration. These strategies resulted in rapid and robust learning, with speedups of up to an order of magnitude over conventional methods. My results are now under peer review at the ACM SIGBED International Conference on Embedded Software (EMSOFT). My research efforts this year will be directed towards implementing such systems in various WSN applications, where I would like to apply my proposed architecture and learning method and evaluate their effectiveness. Systems employing RL in WSNs must be scalable and robust to dynamic variations in their working environment. To increase my knowledge of and exposure to ongoing research on scaling RL for WSNs, I also plan to attend academic conferences and summer schools. I also hope to publish in multiple high-ranking academic conferences and journals. I am now working on building a theoretical basis for my proposed system. I plan to develop mathematical foundations that describe the relations between approximated Q-functions and their discretized counterparts.
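As a rough illustration of what a relation between an approximated Q-function and its discretized counterpart can look like, the sketch below tabulates a smooth Q-function on a grid of state bins and measures the worst-case quantization error. It is not the architecture described in this report: `q_approx` is a hypothetical closed-form stand-in for a trained network, and the one-dimensional state (say, a normalized battery level in [0, 1]), the three actions and all function names are assumptions made only for this example.

```python
import math


def q_approx(state, action):
    """Hypothetical smooth Q-function standing in for a trained network.

    state is a float in [0, 1]; action is an integer index (0, 1 or 2).
    """
    return math.sin(3.0 * state) * (action + 1) - 0.5 * action


def state_to_bin(state, n_bins):
    """Map a continuous state in [0, 1] to a bin index in 0..n_bins-1."""
    return min(int(state * n_bins), n_bins - 1)


def build_q_table(n_bins, n_actions):
    """Tabulate q_approx at the centre of every state bin."""
    table = []
    for b in range(n_bins):
        centre = (b + 0.5) / n_bins
        table.append([q_approx(centre, a) for a in range(n_actions)])
    return table


def max_quantization_error(n_bins, n_actions=3, n_samples=1000):
    """Largest |Q(s, a) - table[bin(s)][a]| over a uniform grid of states."""
    table = build_q_table(n_bins, n_actions)
    worst = 0.0
    for i in range(n_samples + 1):
        s = i / n_samples
        row = table[state_to_bin(s, n_bins)]
        for a in range(n_actions):
            worst = max(worst, abs(q_approx(s, a) - row[a]))
    return worst


if __name__ == "__main__":
    # A finer grid gives a smaller error at the cost of a larger table.
    for n in (4, 16, 64):
        print(n, max_quantization_error(n))
```

For this particular stand-in function, the slope with respect to the state is at most 9 in magnitude, so the error with `n_bins` bins cannot exceed 9 / (2 * n_bins). This is the kind of simple bound that links table resolution to approximation quality, and it makes the memory versus accuracy trade-off on a low-end node explicit.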
Strategy for Future Research Activity	At the moment, my efforts are directed towards gaining practical and mathematical insights into the scaling of Distributed Reinforcement Learning (RL) in order to improve my proposed system. Once that is sufficiently achieved, I plan to apply my system not only to Wireless Sensor Nodes (WSNs) but also to other areas, typically large and/or complex optimization problems. These may include the optimization of HPC systems, chip design/automation and control engineering. I hope to show the advantages of RL-based methods for solving these problems. Furthermore, I intend to show that these problems can be solved with limited time and computational resources by using my Distributed RL system. I hope to demonstrate the efficacy of my proposed system with a strong practical working example in at least one of these application areas. Ideally, I would like my proposed methods to accelerate RL for learning complex policies, within practical time and resource constraints, without compromising the robustness and sophistication of those policies. I hope that by the first quarter of the next academic year my research will have progressed enough for me to start preparing my doctoral thesis. I plan to spend most of the last year of my PhD completing my research projects and writing my doctoral thesis.