2016 Fiscal Year Final Research Report
Fault tolerant computing based on a multi-SPMD programming/execution environment
Project/Area Number |
26730064
|
Research Category |
Grant-in-Aid for Young Scientists (B)
|
Allocation Type | Multi-year Fund |
Research Field |
High performance computing
|
Research Institution | Institute of Physical and Chemical Research |
Principal Investigator |
Tsuji Miwako RIKEN (National Research and Development Agency), Advanced Institute for Computational Science, Researcher (80466466)
|
Project Period (FY) |
2014-04-01 – 2017-03-31
|
Keywords | Fault tolerance / Programming model |
Outline of Final Research Achievements |
In this research, we added fault tolerance features to a multi-SPMD programming/execution environment, in which the tasks of a workflow are executed in parallel across distributed resources. To achieve scalability, the programming environment combines multiple programming methodologies across architectural levels: NUMA core groups within a node, nodes within a cluster, and clusters of clusters. To provide fault tolerance and resilience without any modification of the application's source code, we developed middleware that detects errors in remote programs and extended the workflow scheduler to provide fault resilience.
|
Free Research Field |
High performance computing
|