2022 Fiscal Year Annual Research Report

ExaPath: Hierarchical Routing for Next-Gen Supercomputers and Beyond

Research Project

Project/Area Number 19H04119
Research Institution Institute of Physical and Chemical Research

Principal Investigator

Jens Domke  RIKEN, Center for Computational Science, Team Leader (70815480)

Co-Investigator (Kenkyū-buntansha) Toshio Endo  Tokyo Institute of Technology, Global Scientific Information and Computing Center, Professor (80396788)
Project Period (FY) 2019-04-01 – 2024-03-31
Keywords HPC interconnects
Outline of Annual Research Achievements

In FY2022, the fourth year of the ExaPath project, we worked predominantly on enhancing the MocCUDA approach to aid and speed up the large-scale execution of deep learning frameworks on Fugaku, which are bottlenecked by the network as well as by shortcomings in code portability from CUDA to A64FX. Thanks to our previous publications, we were able to establish new international collaborations with researchers from MIT, Google, and Argonne National Laboratory. The outcome of this productive collaboration was published as "High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs" in the proceedings of the 28th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '23, and disseminated in multiple peer-reviewed posters. Furthermore, the internship student who assisted this research successfully defended his Master's thesis and moved on to a PhD program. We also established a connection to the team at Rockport Networks in order to evaluate their novel interconnection network technology; these research outcomes will contribute towards our project goal. The third collaboration, with ETH Zurich around routing for their Slim Fly proof-of-concept, is still ongoing and will likely yield a peer-reviewed publication in FY2023. Lastly, we disseminated our research findings via talks at the JLESC workshop and the Benchmarking in the Data Center: Expanding to the Cloud workshop, and discussed our work and related routing and network topics with colleagues at various online meetings and conferences.

Current Status of Research Progress
2: Research has progressed on the whole more than it was originally planned.

Reason

Most of the COVID-related backlog and slowdowns in the R&D have been resolved over time and operations are returning to normal; the project status can therefore be considered on track.

Strategy for Future Research Activity

In the fifth fiscal year, the PI will continue the research into novel hierarchical, adaptive routing for near-term, large-scale interconnect deployments that use emerging technologies such as Rockport Networks, RoCE, CXL, BXI, or Slingshot. This research will be performed with the assistance of a co-investigator, two internship students, and a JRA. The PI, with the assistance of the two internship students, will study the SparCML approach to accelerate Large Language Models (LLMs) on Supercomputer Fugaku. The interns will collaborate with RIKEN's AI teams to investigate current LLM communication patterns and to implement a lossy and/or lossless variant of MPI_Allreduce in the MPI library, taking into account topology placement and novel routing approaches. Network-related research results and development tasks will be summarized, and the PI will disseminate these documents among the broader network research community.

  • Research Products

    (6 results)

All Journal Article (3 results) (of which Int'l Joint Research: 3 results,  Peer Reviewed: 3 results) Presentation (2 results) (of which Int'l Joint Research: 1 results) Remarks (1 results)

  • [Journal Article] High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs (2023)

    • Author(s)
      Moses William S., Ivanov Ivan R., Domke Jens, Endo Toshio, Doerfert Johannes, Zinenko Oleksandr
    • Journal Title

      28th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '23

      Volume: 0 Pages: 119-134

    • DOI

      10.1145/3572848.3577475

    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] Parallel Optimizations and Transformations of GPU Kernels Using a High-Level Representation in MLIR/Polygeist (2023)

    • Author(s)
      I.R. Ivanov, W.S. Moses, J. Domke, T. Endo
    • Journal Title

      IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2023

      Volume: 0 Pages: 1

    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs (2022)

    • Author(s)
      W.S. Moses, I.R. Ivanov, J. Domke, T. Endo, J. Doerfert, O. Zinenko
    • Journal Title

      2022 LLVM Developers' Meeting

      Volume: 0 Pages: 1

    • Peer Reviewed / Int'l Joint Research
  • [Presentation] Working with Proxy-Applications: Interesting Findings, Lessons Learned, and Future Directions (2022)

    • Author(s)
      J. Domke
    • Organizer
      Benchmarking in the Data Center: Expanding to the Cloud (workshop) held in conjunction with PPoPP 2022: Principles and Practice of Parallel Programming 2022
    • Int'l Joint Research
  • [Presentation] Octopodes A candidate to replace Mini Apps and Motifs? (2022)

    • Author(s)
      J. Domke
    • Organizer
      14th JLESC Workshop
  • [Remarks] MocCUDA

    • URL

      https://gitlab.com/domke/MocCUDA

Published: 2023-12-25  
