
2020 Fiscal Year Annual Research Report

ExaPath: Hierarchical Routing for Next-Gen Supercomputers and Beyond

Research Project

Project/Area Number 19H04119
Research Institution: Institute of Physical and Chemical Research

Principal Investigator

Jens Domke  RIKEN, Center for Computational Science, Researcher (70815480)

Co-Investigator (Kenkyū-buntansha) Toshio Endo  Tokyo Institute of Technology, Global Scientific Information and Computing Center, Professor (80396788)
Project Period (FY) 2019-04-01 – 2024-03-31
Keywords: HPC interconnects
Outline of Annual Research Achievements

In FY2020, the second year of the ExaPath project, we conducted two distinct studies for routing in HPC interconnects.
The first paper published this FY is a survey of data center and supercomputer networks. It investigates how multipathing is implemented in those systems, what type of routing they deploy, and how to utilize them effectively under heavy communication loads. The survey, titled "High-Performance Routing With Multipathing and Path Diversity in Ethernet and HPC Networks", was published in the journal IEEE Transactions on Parallel and Distributed Systems.
The second published work, a peer-reviewed poster presented at the 3rd R-CCS International Symposium, is based on the Bachelor's thesis of our intern from Tokyo Tech. The thesis and poster tackled the fault resiliency of lossless interconnects and how to reroute the network while preserving certain properties, such as deadlock-freedom.
Furthermore, we collaborated with researchers at ETH Zurich to develop a real Slim Fly testbed and to deploy on it the routing we developed in the previous FY. At the same time, together with a colleague from ETH, we co-supervised a second Bachelor's thesis on routing in low-diameter topologies.
Lastly, we disseminated our research findings through invited talks at the ISC High Performance conference (ISC'20), in a focus session on 'Photonics & Interconnects', and discussed our work and related routing and network topics with colleagues from academia and industry at various meetings and conferences.

Current Status of Research Progress

3: Progress in research has been slightly delayed.

Reason

The original plan is slightly delayed because the COVID-19 pandemic caused major disruptions in the research community and to conference schedules. Hence, we had fewer opportunities than expected to seek new collaborators and to discuss and disseminate our research findings.

Strategy for Future Research Activity

Future research will largely follow the plan outlined in the project proposal. We will try to establish more international and domestic collaborations to develop a suitable HPC routing library, which can hopefully be interfaced with the OpenFabrics Management Framework (OFMF) and other interconnect management frameworks. We also plan to develop new routing algorithms for current and future HPC installations, both on our own and through collaborations.

  • Research Products

    (6 results)


All Journal Article (2 results) (of which Int'l Joint Research: 2 results,  Peer Reviewed: 2 results) Presentation (3 results) (of which Int'l Joint Research: 3 results) Remarks (1 results)

  • [Journal Article] High-Performance Routing With Multipathing and Path Diversity in Ethernet and HPC Networks (2021)

    • Author(s)
      Maciej Besta, Jens Domke, Marcel Schneider, Marek Konieczny, Salvatore Di Girolamo, Timo Schneider, Ankit Singla, Torsten Hoefler
    • Journal Title

      IEEE Transactions on Parallel and Distributed Systems

      Volume: 32 Pages: 1-14

    • DOI

      10.1109/TPDS.2020.3035761

    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] Improved failover for HPC interconnects through localised routing restoration (2021)

    • Author(s)
      Ivan R. Ivanov, Jens Domke, Akihiro Nomura, Toshio Endo
    • Journal Title

      The 3rd R-CCS International Symposium (RCCS-IS3)

      Volume: 0 Pages: -

    • Peer Reviewed / Int'l Joint Research
  • [Presentation] MocCUDA: Running CUDA codes on Fugaku (2021)

    • Author(s)
      Jens Domke
    • Organizer
      12th JLESC Workshop
    • Int'l Joint Research
  • [Presentation] Improved failover for HPC interconnects through localised routing restoration (2021)

    • Author(s)
      Ivan R. Ivanov
    • Organizer
      The 3rd R-CCS International Symposium (RCCS-IS3)
    • Int'l Joint Research
  • [Presentation] The Bright Future for HPC Interconnects -- Opportunities, Challenges, and Misconceptions in Deployment and Management of Large-Scale Networks (2020)

    • Author(s)
      Jens Domke
    • Organizer
      Focus Session: Leveraging Silicon Photonics in HPC to Meet Future Exascale Needs in 36th ISC High Performance (ISC ’21)
    • Int'l Joint Research
  • [Remarks] Repo for thesis of localised routing restoration:

    • URL

      https://gitlab.com/ivanradanov/localisedrerouting


Published: 2022-12-28  
