Fault tolerant computing based on a multi-SPMD programming/execution environment
Project/Area Number | 26730064 |
Research Category | Grant-in-Aid for Young Scientists (B) |
Allocation Type | Multi-year Fund |
Research Field | High performance computing |
Research Institution | Institute of Physical and Chemical Research |
Principal Investigator | Tsuji Miwako, RIKEN (National Research and Development Agency), Advanced Institute for Computational Science, Researcher (80466466) |
Project Period (FY) | 2014-04-01 – 2017-03-31 |
Project Status | Completed (Fiscal Year 2016) |
Budget Amount |
¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Fiscal Year 2015: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000)
Fiscal Year 2014: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000)
|
Keywords | Fault tolerance / Programming model / Workflow / International information exchange (USA) / International information exchange (France) |
Outline of Final Research Achievements |
In this research, we added fault-tolerance support to a multi-SPMD programming/execution environment in which the tasks of a workflow are executed in a distributed, parallel manner. To achieve scalability, the environment applies different programming methodologies at multiple architectural levels, such as groups of NUMA cores within a node, nodes within a cluster, and clusters of clusters. To provide fault tolerance and resilience without modifying the application's source code, we developed middleware that detects errors in remote programs and extended the workflow scheduler to provide fault resilience.
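The outline does not describe how the middleware and scheduler are implemented. The Python sketch below only illustrates the general division of labor it names: error detection wraps each remote task, and the workflow scheduler, not the application, decides to re-execute a failed task. The names `Task`, `detect_error`, `schedule`, and `make_flaky`, and the simple retry-on-failure policy, are hypothetical assumptions for illustration, not the project's actual middleware or scheduler.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    # Hypothetical task record: a name, the names of tasks it depends on, and
    # an unmodified application callable standing in for a remote SPMD program.
    name: str
    deps: List[str]
    run: Callable[[], object]
    attempts: int = 0


class TaskFailed(Exception):
    """Raised by the hypothetical error detector when a task does not finish."""


def detect_error(task: Task) -> object:
    # Stand-in for error-detection middleware: any exception raised by the
    # remote program is converted into a TaskFailed signal for the scheduler.
    # The application code itself is not changed.
    try:
        return task.run()
    except Exception as exc:
        raise TaskFailed(f"{task.name}: {exc}") from exc


def schedule(tasks: Dict[str, Task], max_retries: int = 2) -> Dict[str, object]:
    # Assumed resilience policy: run tasks whose dependencies have completed,
    # and re-submit a failed task up to max_retries additional times.
    done: Dict[str, object] = {}
    pending = dict(tasks)
    while pending:
        ready = [t for t in pending.values() if all(d in done for d in t.deps)]
        if not ready:
            raise RuntimeError("unsatisfiable dependencies: " + ", ".join(pending))
        for task in ready:
            while True:
                task.attempts += 1
                try:
                    done[task.name] = detect_error(task)
                    break
                except TaskFailed:
                    if task.attempts > max_retries:
                        raise  # give up: retries exhausted
            del pending[task.name]
    return done


def make_flaky(fail_times: int) -> Callable[[], str]:
    # Hypothetical helper that simulates a remote program failing
    # fail_times times (e.g. a lost node) before succeeding.
    state = {"calls": 0}

    def run() -> str:
        state["calls"] += 1
        if state["calls"] <= fail_times:
            raise RuntimeError("simulated node failure")
        return "ok"

    return run


if __name__ == "__main__":
    workflow = {
        "a": Task("a", [], lambda: 1),
        "b": Task("b", ["a"], make_flaky(1)),
        "c": Task("c", ["a", "b"], lambda: "merged"),
    }
    print(schedule(workflow))  # {'a': 1, 'b': 'ok', 'c': 'merged'}
```

In this sketch task `b` fails once and is transparently re-executed, so the workflow completes without any change to the task bodies. A real system would also need to handle partial outputs and node reallocation, which this toy scheduler ignores.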
|
Report | 4 results |
Research Products | 7 results |