A Study of a Highly Available Distributed Scheduling Scheme for Post-petascale Computing Environments
Project/Area Number |
25871199
|
Research Category |
Grant-in-Aid for Young Scientists (B)
|
Allocation Type | Multi-year Fund |
Research Field |
Software
High performance computing
|
Research Institution | National Institute of Advanced Industrial Science and Technology |
Principal Investigator |
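TAKEFUSA Atsuko National Institute of Advanced Industrial Science and Technology (AIST), Information Technology Research Institute, Senior Researcher (70345411)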
|
Project Period (FY) |
2013-04-01 – 2015-03-31
|
Project Status |
Completed (Fiscal Year 2014)
|
Budget Amount |
¥3,640,000 (Direct Cost: ¥2,800,000, Indirect Cost: ¥840,000)
Fiscal Year 2014: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
Fiscal Year 2013: ¥2,340,000 (Direct Cost: ¥1,800,000, Indirect Cost: ¥540,000)
|
Keywords | Parallel and distributed processing / Fault tolerance / Post-petascale computing / Scheduler / Resource management / Post-petascale |
Outline of Final Research Achievements |
Fault resiliency is an important issue for post-petascale computing environments. To make application programs running on such systems fault resilient, we proposed, designed, and developed a prototype of a highly available distributed scheduler and investigated its performance characteristics. We first designed a highly available distributed self-scheduler, developed a Java-based prototype using Apache ZooKeeper, and demonstrated its availability and scalability. We then implemented the proposed scheduler in the C-based falanx middleware, which is developed using User Level Failure Mitigation (ULFM) MPI. Experiments showed that the proposed scheduler is feasible.
|
Report
(3 results)
Research Products
(8 results)