
ExaPath: Hierarchical Routing for Next-Gen Supercomputers and Beyond

Research Project

Project/Area Number 19H04119
Research Category

Grant-in-Aid for Scientific Research (B)

Allocation Type Single-year Grants
Section General
Review Section Basic Section 60090: High performance computing-related
Research Institution Institute of Physical and Chemical Research

Principal Investigator

Domke Jens  RIKEN (National Research and Development Agency), Center for Computational Science, Team Leader (70815480)

Co-Investigator (Kenkyū-buntansha) Endo Toshio  Tokyo Institute of Technology, Global Scientific Information and Computing Center, Professor (80396788)
Project Period (FY) 2019-04-01 – 2024-03-31
Project Status Completed (Fiscal Year 2023)
Budget Amount
¥17,160,000 (Direct Cost: ¥13,200,000, Indirect Cost: ¥3,960,000)
Fiscal Year 2023: ¥3,250,000 (Direct Cost: ¥2,500,000, Indirect Cost: ¥750,000)
Fiscal Year 2022: ¥3,250,000 (Direct Cost: ¥2,500,000, Indirect Cost: ¥750,000)
Fiscal Year 2021: ¥3,510,000 (Direct Cost: ¥2,700,000, Indirect Cost: ¥810,000)
Fiscal Year 2020: ¥3,250,000 (Direct Cost: ¥2,500,000, Indirect Cost: ¥750,000)
Fiscal Year 2019: ¥3,900,000 (Direct Cost: ¥3,000,000, Indirect Cost: ¥900,000)
Keywords HPC interconnects / routing algorithms / network design / artificial intelligence / message passing / routing / hierarchical / supercomputing
Outline of Research at the Start

The research objective is the invention and development of a novel class of
algorithms that calculate the communication paths within supercomputer
networks. These algorithms will be hierarchical in order to overcome the
scalability challenges of existing algorithms, which are insufficient for
future systems.

Outline of Final Research Achievements

Modern society seeks ever-increasing compute power to serve fields such as artificial intelligence (e.g., ChatGPT), weather forecasting, airplane and supernova simulation, and drug discovery. Physical limitations prevent us from making individual computer chips much faster, so supercomputers and other tightly coupled compute systems have started to scale out to larger and larger systems. The backbone of all these architectures is the interconnection network, which must be "routed" correctly for the entire system to be effective. This process is similar to Google Maps telling us which roads to drive: in a supercomputer, the routing tells each message which path to take. However, state-of-the-art routing algorithms cannot keep pace with future scale-out and hardware trends, and hence new algorithms have to be invented. Our project aims to design novel routing approaches for highly complex networks that connect thousands or millions of computers.

Academic Significance and Societal Importance of the Research Achievements

The routing algorithms we developed, together with our methods for making supercomputer interconnects faster, will help other scientists accelerate their workflows. With optimal routing, supercomputers can complete more scientific simulations, and hence scientists can obtain more results in less time.

Report

(6 results)
  • 2023 Annual Research Report   Final Research Report ( PDF )
  • 2022 Annual Research Report
  • 2021 Annual Research Report
  • 2020 Annual Research Report
  • 2019 Annual Research Report
  • Research Products

    (25 results)


All Journal Article (11 results) (of which Int'l Joint Research: 10 results, Peer Reviewed: 11 results, Open Access: 1 result) Presentation (11 results) (of which Int'l Joint Research: 7 results, Invited: 3 results) Remarks (3 results)

  • [Journal Article] A High-Performance Design, Implementation, Deployment, and Evaluation of The Slim Fly Network (2024)

    • Author(s)
      N. Blach, M. Besta, D.D. Sensi, J. Domke, H. Harake, S. Li, P. Iff, M. Konieczny, K. Lakhotia, A. Kubicek, M. Ferrari, F. Petrini, T. Hoefler
    • Journal Title

      21st USENIX Symposium on Networked Systems Design and Implementation (NSDI '24)

      Volume: 0 Pages: 1025-1044

    • Related Report
      2023 Annual Research Report
    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] Myths and legends in high-performance computing (2023)

    • Author(s)
      Matsuoka Satoshi、Domke Jens、Wahib Mohamed、Drozd Aleksandr、Hoefler Torsten
    • Journal Title

      The International Journal of High Performance Computing Applications

      Volume: 37 Issue: 3-4 Pages: 245-259

    • DOI

      10.1177/10943420231166608

    • Related Report
      2023 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs (2023)

    • Author(s)
      Moses William S.、Ivanov Ivan R.、Domke Jens、Endo Toshio、Doerfert Johannes、Zinenko Oleksandr
    • Journal Title

      28th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '23

      Volume: 0 Pages: 119-134

    • DOI

      10.1145/3572848.3577475

    • Related Report
      2022 Annual Research Report
    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] Parallel Optimizations and Transformations of GPU Kernels Using a High-Level representation in MLIR/Polygeist (2023)

    • Author(s)
      I.R. Ivanov, W.S. Moses, J. Domke, T. Endo
    • Journal Title

      IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2023

      Volume: 0 Pages: 1-1

    • Related Report
      2022 Annual Research Report
    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs (2022)

    • Author(s)
      W.S. Moses, I.R. Ivanov, J. Domke, T. Endo, J. Doerfert, O. Zinenko
    • Journal Title

      2022 LLVM Developers' Meeting

      Volume: 0 Pages: 1-1

    • Related Report
      2022 Annual Research Report
    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] Automatic translation of CUDA code into high performance CPU code using LLVM IR transformations (2022)

    • Author(s)
      I.R. Ivanov, J. Domke, T. Endo
    • Journal Title

      The 4th R-CCS International Symposium (RCCS-IS4)

      Volume: 0 Pages: 1-1

    • Related Report
      2021 Annual Research Report
    • Peer Reviewed
  • [Journal Article] A64FX - Your Compiler You Must Decide! (2021)

    • Author(s)
      J. Domke
    • Journal Title

      2021 IEEE International Conference on Cluster Computing (CLUSTER), EAHPC Workshop

      Volume: 0 Pages: 1-5

    • Related Report
      2021 Annual Research Report
    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] High-Performance Routing With Multipathing and Path Diversity in Ethernet and HPC Networks (2021)

    • Author(s)
      Maciej Besta, Jens Domke, Marcel Schneider, Marek Konieczny, Salvatore Di Girolamo, Timo Schneider, Ankit Singla, Torsten Hoefler
    • Journal Title

      IEEE Transactions on Parallel and Distributed Systems

      Volume: 32 Issue: 4 Pages: 1-14

    • DOI

      10.1109/tpds.2020.3035761

    • Related Report
      2020 Annual Research Report
    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] Improved failover for HPC interconnects through localised routing restoration (2021)

    • Author(s)
      Ivan R. Ivanov, Jens Domke, Akihiro Nomura, Toshio Endo
    • Journal Title

      The 3rd R-CCS International Symposium (RCCS-IS3)

      Volume: 0

    • Related Report
      2020 Annual Research Report
    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] HyperX Topology: First at-scale Implementation and Comparison to the Fat-Tree (2019)

    • Author(s)
      Domke Jens、Matsuoka Satoshi、Ivanov Ivan R.、Tsushima Yuki、Yuki Tomoya、Nomura Akihiro、Miura Shin'ichi、McDonald Nic、Floyd Dennis L.、Dube Nicolas
    • Journal Title

      Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

      Volume: SC'19 Pages: 1-23

    • DOI

      10.1145/3295500.3356140

    • Related Report
      2019 Annual Research Report
    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] The First Supercomputer with HyperX Topology: A Viable Alternative to Fat-Trees? (2019)

    • Author(s)
      Domke Jens、Matsuoka Satoshi、Radanov Ivan、Tsushima Yuki、Yuki Tomoya、Nomura Akihiro、Miura Shin'ichi、McDonald Nic、Floyd Dennis Lee、Dube Nicolas
    • Journal Title

      2019 IEEE Symposium on High-Performance Interconnects (HOTI)

      Volume: HOTI'26 Pages: 4-4

    • DOI

      10.1109/hoti.2019.00013

    • Related Report
      2019 Annual Research Report
    • Peer Reviewed / Int'l Joint Research
  • [Presentation] Advanced Architecture "Playgrounds" Past Lessons and Future Accesses of Testbeds ... an update by RIKEN R-CCS (2023)

    • Author(s)
      J. Domke
    • Organizer
      International Conference for High Performance Computing, Networking, Storage and Analysis (SC '23)
    • Related Report
      2023 Annual Research Report
    • Invited
  • [Presentation] Working with Proxy-Applications: Interesting Findings, Lessons Learned, and Future Directions (2022)

    • Author(s)
      J. Domke
    • Organizer
      Benchmarking in the Data Center: Expanding to the Cloud (workshop) held in conjunction with PPoPP 2022: Principles and Practice of Parallel Programming 2022
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Octopodes A candidate to replace Mini Apps and Motifs? (2022)

    • Author(s)
      J. Domke
    • Organizer
      14th JLESC Workshop
    • Related Report
      2022 Annual Research Report
  • [Presentation] MocCUDA: Running CUDA Codes on Fugaku (2022)

    • Author(s)
      J. Domke
    • Organizer
      SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP ’22)
    • Related Report
      2021 Annual Research Report
    • Int'l Joint Research
  • [Presentation] MocCUDA: Running CUDA codes on Fugaku (2021)

    • Author(s)
      Jens Domke
    • Organizer
      12th JLESC Workshop
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Improved failover for HPC interconnects through localised routing restoration (2021)

    • Author(s)
      Ivan R. Ivanov
    • Organizer
      The 3rd R-CCS International Symposium (RCCS-IS3)
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] The Bright Future for HPC Interconnects -- Opportunities, Challenges, and Misconceptions in Deployment and Management of Large-Scale Networks (2020)

    • Author(s)
      Jens Domke
    • Organizer
      Focus Session: Leveraging Silicon Photonics in HPC to Meet Future Exascale Needs in 36th ISC High Performance (ISC ’21)
    • Related Report
      2020 Annual Research Report
    • Int'l Joint Research
  • [Presentation] HyperX Topology: First at-scale Implementation and Comparison to the Fat-Tree (2019)

    • Author(s)
      Domke Jens
    • Organizer
      International Conference for High Performance Computing, Networking, Storage and Analysis (SC'19)
    • Related Report
      2019 Annual Research Report
    • Int'l Joint Research
  • [Presentation] The First Supercomputer with HyperX Topology: A Viable Alternative to Fat-Trees? (2019)

    • Author(s)
      Domke Jens
    • Organizer
      2019 IEEE Symposium on High-Performance Interconnects
    • Related Report
      2019 Annual Research Report
    • Int'l Joint Research
  • [Presentation] The First Supercomputer with HyperX Topology: A Viable Alternative to Fat-Trees? (2019)

    • Author(s)
      Domke Jens
    • Organizer
      The 179th R-CCS Cafe
    • Related Report
      2019 Annual Research Report
    • Invited
  • [Presentation] First At-Scale HyperX Implementation: A Compelling Alternative to Fat-Trees? (2019)

    • Author(s)
      Domke Jens
    • Organizer
      High Performance Consortium for Advanced Scientific and Technical Computing (HP-CAST 32)
    • Related Report
      2019 Annual Research Report
    • Invited
  • [Remarks] MocCUDA

    • URL

      https://gitlab.com/domke/MocCUDA

    • Related Report
      2022 Annual Research Report
  • [Remarks] Repository for the thesis on localised routing restoration

    • URL

      https://gitlab.com/ivanradanov/localisedrerouting

    • Related Report
      2020 Annual Research Report
  • [Remarks] TSUBAME2 HyperX experiment

    • URL

      https://gitlab.com/domke/t2hx

    • Related Report
      2019 Annual Research Report

Published: 2019-04-18   Modified: 2025-01-30  
