2019 Fiscal Year Annual Research Report
Adaptive Power Management of IoT Systems by Reinforcement Learning
Research Institution: The University of Tokyo
Principal Investigator: SHRESTHAMALI SHASWOT, The University of Tokyo, Graduate School of Information Science and Technology, JSPS Research Fellow (DC1)
Project Period (FY): 2018-04-25 – 2021-03-31
Keywords: Reinforcement Learning / Internet of Things / Wireless Sensor Nodes / Distributed Learning / Deep Q Networks
Outline of Annual Research Achievements
My research grant for this fiscal year was spent on acquiring a new computer and related accessories and on attending an international machine learning conference, ICML 2019.
With the new computer, I was able to run a new set of experiments and publish my findings at an international conference, ICCD 2019, in Abu Dhabi. At this conference, I presented a distributed implementation of deep reinforcement learning for the power management of energy harvesting wireless sensor nodes. My method accelerated learning by a factor of almost 50.
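For illustration only, the skeleton below shows the general shape of such a distributed setup: several worker processes, each simulating an energy harvesting node, stream transitions to a central learner. All environment dynamics, rewards, and names here are placeholder assumptions and do not reproduce the implementation published at ICCD 2019.

```python
# Illustrative sketch: parallel experience collection for deep RL.
# The node dynamics, rewards, and learner below are placeholders.
import multiprocessing as mp
import random

def node_worker(node_id, queue, n_steps=1000):
    """Simulate one energy-harvesting sensor node and stream transitions."""
    state = random.random()  # placeholder battery level in [0, 1]
    for _ in range(n_steps):
        action = random.randint(0, 3)      # placeholder duty-cycle choice
        harvested = random.random() * 0.1  # placeholder energy intake
        next_state = min(1.0, max(0.0, state + harvested - 0.02 * action))
        reward = action if next_state > 0.0 else -10.0  # placeholder reward
        queue.put((node_id, state, action, reward, next_state))
        state = next_state
    queue.put(None)  # sentinel: this worker is done

def learner(queue, n_workers):
    """Central learner consuming transitions from all nodes."""
    done, replay = 0, []
    while done < n_workers:
        item = queue.get()
        if item is None:
            done += 1
            continue
        replay.append(item)  # in practice: store and train a DQN here
    print(f"collected {len(replay)} transitions from {n_workers} nodes")

if __name__ == "__main__":
    q = mp.Queue()
    workers = [mp.Process(target=node_worker, args=(i, q)) for i in range(4)]
    for w in workers:
        w.start()
    learner(q, len(workers))
    for w in workers:
        w.join()
```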
I also participated in ICML 2019 (USA), a top-tier machine learning conference, where I met and talked with leading machine learning researchers and attended excellent tutorials and workshops. Following this conference, I have several new research ideas that I would like to explore in the coming year.
Apart from my core research, I have also been working on projects related to feature learning for end-to-end reinforcement learning (RL) in video games. I also investigated the different experience replay methods employed in RL, and I am currently looking into offline model-based RL. Additionally, I am exploring ways to use RL for resource scheduling, especially in connection with Internet of Things (IoT) devices.
Current Status of Research Progress
2: Research has progressed on the whole more than it was originally planned.
At present, I am working on five different tasks: i) solving resource scheduling problems using reinforcement learning (RL), ii) a survey paper on experience replay in RL, iii) model-based offline RL, iv) a journal paper that summarizes my research on power management of IoT devices using RL, and v) my PhD thesis.
Currently, I am focusing on using feudal RL-like methods to solve multi-objective optimization problems. This is the first step towards tackling resource scheduling with RL. My approach is to decompose a complex problem into simpler sub-problems in a feasible way so as to reduce the computational complexity. I am in the process of conducting experiments and writing a paper.
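To make the decomposition concrete, here is a deliberately simplified sketch in the spirit of feudal RL; all class names, objectives, and dynamics are illustrative assumptions, not my experimental setup. A high-level "manager" policy learns which sub-objective is worth pursuing, while per-objective "worker" policies handle the simpler sub-problems.

```python
# Illustrative sketch of a feudal-style decomposition for
# multi-objective optimization. Everything here is a placeholder.
import random

class Worker:
    """Low-level policy specialized for one sub-objective (placeholder)."""
    def __init__(self, objective):
        self.objective = objective

    def rollout(self):
        # A trained sub-policy would interact with an environment here;
        # we return a noisy objective-dependent score for brevity.
        mean = {"energy": 1.0, "throughput": 0.5}[self.objective]
        return random.gauss(mean, 0.3)

class Manager:
    """High-level policy that picks which sub-objective to pursue next."""
    def __init__(self, workers, eps=0.1):
        self.workers = workers
        self.value = [0.0] * len(workers)  # running mean return per sub-goal
        self.count = [0] * len(workers)
        self.eps = eps

    def select(self):
        if random.random() < self.eps:
            return random.randrange(len(self.workers))
        return max(range(len(self.workers)), key=self.value.__getitem__)

    def update(self, i, ret):
        self.count[i] += 1
        self.value[i] += (ret - self.value[i]) / self.count[i]

manager = Manager([Worker("energy"), Worker("throughput")])
for _ in range(500):
    i = manager.select()  # manager picks one simpler sub-problem at a time
    manager.update(i, manager.workers[i].rollout())
print("learned sub-goal values:", [round(v, 2) for v in manager.value])
```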
I have completed a preliminary literature review for the survey paper, but it still requires a second, deeper pass. There are some complex mathematical concepts I need to familiarize myself with before I can start writing the survey.
I have begun preliminary experiments on offline model-based RL, where I am trying to replicate the results of some state-of-the-art papers in this area.
I am writing code and designing experiments concurrently so that I can reuse them for my journal paper and PhD thesis. I want to improve upon the research I published at ICCD with stronger mathematical support and publish it in a journal. Meanwhile, I am also writing and running experiments for my PhD thesis; while its theme has been decided, the contents are still tentative.
Strategy for Future Research Activity
My future research plan has two major tracks: i) resource scheduling in the Internet of Things (IoT) and ii) off-policy model-based reinforcement learning. I intend to look into different methods that use RL to solve non-trivial scheduling problems such as the job-shop scheduling problem or the knapsack problem. The main challenges in applying RL to these problems, as far as I understand, are defining the problem correctly as a Markov Decision Process (MDP) and finding a near-optimal policy in a potentially large solution space. I would like to investigate whether machines are capable of deducing meaningful search policies to arrive at near-optimal solutions.
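To illustrate the MDP-definition challenge, the sketch below casts the 0/1 knapsack problem as a sequential decision process: the state is (next item, remaining capacity), the actions are take or skip, and the reward is the value gained. The environment structure and toy numbers are my own illustrative assumptions, not a benchmark.

```python
# Illustrative sketch: casting the 0/1 knapsack problem as an MDP
# so that an RL agent could be trained on it.
import random

class KnapsackEnv:
    """State: (next item index, remaining capacity). Actions: 0=skip, 1=take."""
    def __init__(self, weights, values, capacity):
        self.weights, self.values, self.capacity = weights, values, capacity

    def reset(self):
        self.i, self.remaining = 0, self.capacity
        return (self.i, self.remaining)

    def step(self, action):
        reward = 0.0
        if action == 1 and self.weights[self.i] <= self.remaining:
            self.remaining -= self.weights[self.i]
            reward = self.values[self.i]  # value gained by taking item i
        self.i += 1
        done = self.i >= len(self.weights)
        return (self.i, self.remaining), reward, done

# Random-policy rollout, just to show the episode structure.
env = KnapsackEnv(weights=[3, 4, 2, 5], values=[4, 5, 3, 8], capacity=7)
state, done, total = env.reset(), False, 0.0
while not done:
    state, r, done = env.step(random.randint(0, 1))
    total += r
print("episode return:", total)
```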
The second theme concerns offline model-based RL. The motivation behind it is to understand how deep learning and RL work together. This is best exemplified by the deadly triad, which is responsible for the instability and intractability of RL in real-world applications: deep RL is unstable due to the combination of bootstrapping, function approximation, and off-policy learning. My research into model-based off-policy RL, with a focus on feature extraction and experience replay, aims to understand and determine the theoretical and practical limits imposed by the deadly triad.
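To make the triad concrete, the snippet below spells out the standard textbook semi-gradient Q-learning update with linear function approximation, with each of the three ingredients marked in the comments. The features and constants are arbitrary illustrative choices.

```python
# The deadly triad in one update rule: semi-gradient Q-learning
# with linear function approximation (textbook form, for illustration).
import numpy as np

def q(w, phi_sa):
    """Linear function approximation: q(s, a) = w . phi(s, a)."""
    return w @ phi_sa

def semi_gradient_update(w, phi_sa, r, phi_next_best, alpha=0.1, gamma=0.99):
    # Bootstrapping: the target uses the current estimate of the next value.
    target = r + gamma * q(w, phi_next_best)
    # Off-policy learning: (s, a) may come from a behavior policy (e.g. a
    # replay buffer), while phi_next_best follows the greedy target policy.
    td_error = target - q(w, phi_sa)
    # Function approximation: the update moves shared weights, so it also
    # shifts the values of other, unrelated state-action pairs.
    return w + alpha * td_error * phi_sa

w = np.zeros(4)
w = semi_gradient_update(w, np.array([1.0, 0, 0, 1.0]), r=1.0,
                         phi_next_best=np.array([0, 1.0, 0, 1.0]))
print(w)
```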
More concretely, I intend to invest my research funds in acquiring the necessary computational and experimental equipment, in addition to the necessary literature. I expect to spend the bulk of the research grant on research trips to international conferences.
Research Products (1 result)