2014 Fiscal Year Final Research Report
A Study of a Highly Available Distributed Scheduling Scheme for Post-petascale Computing Environments
Project/Area Number |
25871199
|
Research Category |
Grant-in-Aid for Young Scientists (B)
|
Allocation Type | Multi-year Fund |
Research Field |
Software
High performance computing
|
Research Institution | National Institute of Advanced Industrial Science and Technology |
Principal Investigator |
TAKEFUSA Atsuko National Institute of Advanced Industrial Science and Technology, Information Technology Research Institute, Senior Researcher (70345411)
|
Project Period (FY) |
2013-04-01 – 2015-03-31
|
Keywords | Parallel and distributed processing / Fault tolerance / Post-petascale computing / Scheduler / Resource management |
Outline of Final Research Achievements |
Fault resilience is an important issue for post-petascale computing environments. To make application programs running on such systems resilient to faults, we proposed and designed a highly available distributed scheduler, developed a prototype system, and investigated its performance characteristics. We first designed a highly available distributed self-scheduler, developed a Java-based prototype using Apache ZooKeeper, and demonstrated its availability and scalability. We then implemented the proposed scheduler in the C-based falanx middleware developed using User Level Failure Mitigation (ULFM) MPI. Experiments demonstrated the feasibility of the proposed scheduler.
|
Free Research Field |
Parallel and distributed processing
|