2015 Fiscal Year Final Research Report
Fault Tolerant Infrastructure Toward Billion of Parallelization and Exa-scale Supercomputer
Project/Area Number |
23220003
|
Research Category |
Grant-in-Aid for Scientific Research (S)
|
Allocation Type | Single-year Grants |
Research Field |
Computer system/Network
|
Research Institution | Tokyo Institute of Technology |
Principal Investigator |
Matsuoka Satoshi Tokyo Institute of Technology, Global Scientific Information and Computing Center, Professor (20221583)
|
Co-Investigator(Kenkyū-buntansha) |
Hideyuki Jitsumoto Tokyo Institute of Technology, Global Scientific Information and Computing Center, Assistant Professor (00545311)
|
Co-Investigator(Renkei-kenkyūsha) |
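Toshio Endo Tokyo Institute of Technology, Global Scientific Information and Computing Center, Associate Professor (80396788)
Hitoshi Sato Tokyo Institute of Technology, Global Scientific Information and Computing Center, Specially Appointed Assistant Professor (00550633)
Naoya Maruyama RIKEN, Advanced Institute for Computational Science, Team Leader (60532801)
Shinichiro Takizawa RIKEN, Advanced Institute for Computational Science, Researcher (80550483)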
Kento Sato Lawrence Livermore National Laboratory, Postdoctoral Research Staff (50739696)
|
Research Collaborator |
Leonardo Bautista Gomez Barcelona Supercomputing Center, Senior Researcher
Jens Domke Technische Universität Dresden, ZIH, Research Associate
|
Project Period (FY) |
2011-04-01 – 2016-03-31
|
Keywords | High-performance computing / Exascale computing / Fault tolerance technology / Data compression / Checkpoint/restart / Burst buffer |
Outline of Final Research Achievements |
Fault tolerance is recognized as an indispensable technique for exascale computing as supercomputers grow toward billion-way parallelism. For future exascale supercomputers, we proposed an advanced fault-tolerant infrastructure comprising a scalable checkpoint/restart library, a fault-tolerant messaging interface, and a highly resilient burst buffer architecture. We validated its effectiveness using mathematical statistics. We also released the software, which has had an impact on the community.
|
Free Research Field |
High-performance computing
|