2015 Fiscal Year Final Research Report

Fault Tolerant Infrastructure Toward Billion of Parallelization and Exa-scale Supercomputer

Research Project

Project/Area Number 23220003
Research Category

Grant-in-Aid for Scientific Research (S)

Allocation Type Single-year Grants
Research Field Computer system/Network
Research Institution Tokyo Institute of Technology

Principal Investigator

Matsuoka Satoshi  Tokyo Institute of Technology, Global Scientific Information and Computing Center, Professor (20221583)

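Co-Investigator (Kenkyū-buntansha) Hideyuki Jitsumoto  Tokyo Institute of Technology, Global Scientific Information and Computing Center, Assistant Professor (00545311)
Co-Investigator (Renkei-kenkyūsha) Toshio Endo  Tokyo Institute of Technology, Global Scientific Information and Computing Center, Associate Professor (80396788)
Hitoshi Sato  Tokyo Institute of Technology, Global Scientific Information and Computing Center, Specially Appointed Assistant Professor (00550633)
Naoya Maruyama  RIKEN, Advanced Institute for Computational Science, Team Leader (60532801)
Shinichiro Takizawa  RIKEN, Advanced Institute for Computational Science, Researcher (80550483)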
Kento Sato  Lawrence Livermore National Laboratory, Postdoctoral Research Staff (50739696)
Research Collaborator Leonardo Bautista Gomez  Barcelona Supercomputing Center, Senior Researcher
Jens Domke  Technische Universität Dresden, ZIH, Research Associate
Project Period (FY) 2011-04-01 – 2016-03-31
Keywords High-performance computing / Exascale computing / Fault tolerance techniques / Data compression / Checkpoint/restart / Burst buffer
Outline of Final Research Achievements

Fault tolerance is recognized as an indispensable technique for exascale computing as supercomputers grow toward billion-way parallelism. For future exascale supercomputers, we proposed an advanced fault-tolerant infrastructure comprising a scalable checkpoint/restart library, a fault-tolerant messaging interface, and a highly resilient burst buffer architecture. We validated the effectiveness of these components using mathematical statistics. We also released the software, which has had an impact on the community.

Free Research Field

High-performance computing

Published: 2017-05-10  