2020 Fiscal Year Annual Research Report
ExaPath: Hierarchical Routing for Next-Gen Supercomputers and Beyond
Project/Area Number |
19H04119
|
Research Institution | Institute of Physical and Chemical Research |
Principal Investigator |
Domke Jens, RIKEN, Center for Computational Science, Researcher (70815480)
|
Co-Investigator(Kenkyū-buntansha) |
Endo Toshio, Tokyo Institute of Technology, Global Scientific Information and Computing Center, Professor (80396788)
|
Project Period (FY) |
2019-04-01 – 2024-03-31
|
Keywords | HPC interconnects |
Outline of Annual Research Achievements |
In FY2020, the second year of the ExaPath project, we conducted two distinct studies on routing in HPC interconnects. The first paper published this FY is a survey of data center and supercomputer networks, which investigates how multipathing is implemented in those systems, what type of routing they deploy, and how to utilize them effectively under heavy communication loads. The survey, titled "High-Performance Routing with Multipathing and Path Diversity in Supercomputers and Data Centers", was published in the journal IEEE Transactions on Parallel and Distributed Systems. The second published work, a peer-reviewed poster presented at the 3rd R-CCS International Symposium, is based on the Bachelor's thesis of our intern from Tokyo Tech. The thesis and poster addressed the fault resiliency of lossless interconnects and how to reroute the network while preserving certain properties, such as deadlock-freedom. Furthermore, we collaborated with researchers at ETH Zurich to build a real Slim Fly testbed and to deploy on it the routing we developed in the previous FY. In parallel, together with a colleague from ETH, we co-supervised a second Bachelor's thesis on routing for low-diameter topologies. Lastly, we disseminated our research findings through invited talks at the ISC High Performance conference (ISC'20) in a focus session on 'Photonics & Interconnects', and discussed our work and related routing and network topics with colleagues from academia and industry at various meetings and conferences.
|
Current Status of Research Progress |
3: Progress in research has been slightly delayed.
Reason
The original plan is slightly delayed because the COVID-19 pandemic caused major disruptions in the research community and to conference schedules. Hence, there were fewer opportunities than expected to seek new collaborators and to discuss and disseminate our research findings.
|
Strategy for Future Research Activity |
The future direction of the research will largely follow the plan outlined in the project proposal. We will try to establish more international and domestic collaborations to develop a suitable HPC routing library, which we hope can be interfaced with the OpenFabrics Management Framework (OFMF) and other interconnect management frameworks. We also plan to develop new routing algorithms ourselves, and to assist through collaborations in the development of others, for current and future HPC installations.
|