2016 Fiscal Year Research-status Report
A machine learning based system for storing and processing big spatial-temporal data
Project/Area Number |
16K16038
|
Research Institution | The University of Aizu |
Principal Investigator |
李 鵬 (Li Peng), The University of Aizu, School of Computer Science and Engineering, Associate Professor (30735915)
|
Project Period (FY) |
2016-04-01 – 2018-03-31
|
Keywords | big data processing / cloud |
Outline of Annual Research Achievements |
In FY2016, we developed an intelligent software platform for storing and processing big spatial-temporal data. First, we constructed a hierarchical indexing structure based on a 3-dimensional R-tree and distributed the R-tree and its associated data across multiple nodes. Second, we proposed a traffic-aware task placement scheme to minimize the completion time of MapReduce jobs on Spark. We developed an optimization framework that jointly considers data placement and task placement in the MapReduce model. Finally, we studied the randomness in MapReduce job execution and proposed a novel optimization framework to guarantee predictable job completion time.
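The idea of a hierarchical index spread over several nodes can be illustrated with a minimal sketch. This is not the project's implementation and not a full R-tree: it replaces the tree with a simplified two-level structure, a global directory of bounding boxes over (x, y, t) plus one local point list per node. The names `Box3D`, `DistributedIndex`, and the time-ordered partitioning rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # (x, y, t): two spatial axes plus time


@dataclass(frozen=True)
class Box3D:
    """Axis-aligned minimum bounding box over (x, y, t)."""
    lo: Point
    hi: Point

    def intersects(self, other: "Box3D") -> bool:
        # Two boxes overlap only if their extents overlap on every axis.
        return all(self.lo[i] <= other.hi[i] and other.lo[i] <= self.hi[i]
                   for i in range(3))

    def contains(self, p: Point) -> bool:
        return all(self.lo[i] <= p[i] <= self.hi[i] for i in range(3))


def bounding_box(points: List[Point]) -> Box3D:
    """Smallest box enclosing a non-empty list of points."""
    lo = tuple(min(p[i] for p in points) for i in range(3))
    hi = tuple(max(p[i] for p in points) for i in range(3))
    return Box3D(lo, hi)


class DistributedIndex:
    """Two-level index: a global directory of per-node boxes and local data."""

    def __init__(self, points: List[Point], num_nodes: int) -> None:
        # Assumed partitioning rule: sort by time, cut into equal-sized chunks.
        ordered = sorted(points, key=lambda p: p[2])
        chunk = -(-len(ordered) // num_nodes)  # ceiling division
        self.local: Dict[int, List[Point]] = {}
        self.directory: Dict[int, Box3D] = {}
        for node in range(num_nodes):
            part = ordered[node * chunk:(node + 1) * chunk]
            if part:
                self.local[node] = part
                self.directory[node] = bounding_box(part)

    def nodes_for(self, query: Box3D) -> List[int]:
        """Upper level: prune nodes whose box cannot contain any result."""
        return [n for n, box in self.directory.items() if box.intersects(query)]

    def range_query(self, query: Box3D) -> List[Point]:
        """Lower level: scan only the nodes selected by the directory."""
        hits: List[Point] = []
        for n in self.nodes_for(query):
            hits.extend(p for p in self.local[n] if query.contains(p))
        return sorted(hits)
```

In this sketch the directory plays the role of the upper tree levels: a range query first consults it to decide which nodes to contact, then filters points locally. A real R-tree would additionally keep the local data in a balanced tree of nested boxes rather than a flat list.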
|
Current Status of Research Progress |
1: Research has progressed more than it was originally planned.
Reason
In FY2016, we achieved our research goals by addressing many challenges in system performance optimization. Initially, we built the system and it worked well on a small-scale cluster (fewer than 10 machines). However, when we deployed it on a larger cluster, its performance was unsatisfactory. We therefore devoted substantial effort to performance optimization, proposing new algorithms for job scheduling, traffic management, and storage management.
|
Strategy for Future Research Activity |
In FY2017, we will continue to study the ML-engine as proposed in the original research plan. First, we will apply machine learning to identify and predict data skew. Specifically, we will classify the space into several clusters according to the data skew observed in historical data. Second, we will apply deep learning to extract data access patterns from activity traces collected at the service/application layer. Specifically, we will create a multi-layer artificial neural network model. We will then use a bottom-up pass to obtain activation probabilities for all hidden units in the network, followed by a top-down pass to obtain good initial weights with minimum error. Finally, we will integrate all components by defining clean interfaces and optimizing the message flow among them.
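The first step of this plan, grouping regions of space by historical skew, could be sketched as follows. The report does not specify the clustering method, so this example assumes a uniform grid over a 2-dimensional space and a plain 1-dimensional k-means on per-cell record counts. The function names `cell_counts`, `kmeans_1d`, and `classify_cells` and the parameter `cell_size` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def cell_counts(points: List[Tuple[float, float]],
                cell_size: float) -> Dict[Cell, int]:
    """Count historical records per grid cell (assumed uniform grid)."""
    return dict(Counter((int(x // cell_size), int(y // cell_size))
                        for x, y in points))


def kmeans_1d(values: List[float], k: int, iters: int = 20) -> List[float]:
    """Lloyd iterations on scalars; centroids start evenly spread over the range."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / max(k - 1, 1) for i in range(k)]
    for _ in range(iters):
        groups: List[List[float]] = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            groups[nearest].append(v)
        # An empty group keeps its previous centroid.
        centroids = [sum(g) / len(g) if g else centroids[i]
                     for i, g in enumerate(groups)]
    return centroids


def classify_cells(counts: Dict[Cell, int], k: int = 2) -> Dict[Cell, int]:
    """Assign each cell the label of its nearest centroid.

    Label 0 is initialised at the lowest count and label k-1 at the highest.
    """
    centroids = kmeans_1d([float(c) for c in counts.values()], k)
    return {cell: min(range(k), key=lambda c: abs(n - centroids[c]))
            for cell, n in counts.items()}
```

For example, with eight historical records falling in one cell and one record in each of two neighbouring cells, `classify_cells(..., k=2)` separates the dense cell from the two sparse ones. A skew-aware placement policy could then split or replicate the dense cluster; that policy is outside the scope of this sketch.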
|