Outline of Annual Research Achievements
In FY2022, the fourth year of the ExaPath project, we worked predominantly on enhancements to the MocCUDA approach, which aids and speeds up the large-scale execution of deep learning frameworks on Fugaku that are bottlenecked by the network as well as by shortcomings in code portability from CUDA to A64FX. Thanks to our previous publications, we were able to establish new international collaborations with researchers from MIT, Google, and Argonne National Laboratory. The outcome of this productive collaboration was published as "High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs" in the proceedings of the 28th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '23), and was further disseminated in multiple peer-reviewed posters. Furthermore, the internship student who assisted with this research successfully defended his Master's thesis and moved on to a PhD program. We also established a connection with the Rockport Networks team in order to evaluate their novel interconnection network technology; these research outcomes will contribute towards our project goal. The third collaboration, with ETH Zurich, on routing for their Slimfly proof-of-concept is still ongoing and will likely yield a peer-reviewed publication in FY2023. Lastly, we disseminated our research findings via talks at the JLESC workshop and the "Benchmarking in the Data Center: Expanding to the Cloud" workshop, and discussed our work and related routing and network topics with colleagues at various online meetings and conferences.
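To make the GPU-to-CPU transpilation idea concrete, the following is a minimal, illustrative sketch of how a CUDA-style kernel can be lowered onto CPU threads via OpenMP; the SAXPY kernel and the function name saxpy_cpu are hypothetical examples for exposition, not output of the actual transpiler described in the PPoPP '23 paper.

```cpp
// Hypothetical CUDA kernel, as it might appear in a deep-learning
// framework's GPU backend:
//
//   __global__ void saxpy(int n, float a, const float* x, float* y) {
//       int i = blockIdx.x * blockDim.x + threadIdx.x;
//       if (i < n) y[i] = a * x[i] + y[i];
//   }

#include <omp.h>

// One plausible CPU lowering: the grid/block index space collapses into a
// single parallel loop over all logical thread indices, which the OpenMP
// runtime then maps onto the available A64FX cores.
void saxpy_cpu(int n, float a, const float* x, float* y) {
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```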
Strategy for Future Research Activity
In the fifth fiscal year, the PI will continue the research into a novel hierarchical, adaptive routing scheme for near-term, large-scale interconnect deployments that use emerging technologies such as Rockport's network, RoCE, CXL, BXI, or Slingshot. This research will be performed with the assistance of a co-investigator, two internship students, and a JRA. The PI, with the assistance of the two internship students, will study the SparCML approach to accelerate Large Language Models (LLMs) on the supercomputer Fugaku. The interns will collaborate with RIKEN's AI teams to investigate the communication patterns of current LLMs, and will implement a lossy and/or lossless variant of MPI_Allreduce in the MPI library, taking into consideration topology placement and novel routing approaches. Network-related research results and development tasks will be summarized, and the PI will disseminate these documents among the broader network research community.
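As a rough illustration of the lossy-allreduce direction, the sketch below shows a top-k sparsified allreduce in the spirit of SparCML: each rank keeps only its k largest-magnitude gradient entries, exchanges the sparse (index, value) pairs, and densifies the sum locally. The names sparse_allreduce and Entry, the fixed-k assumption, and the use of a plain MPI_Allgather (rather than the recursive sparse collectives of SparCML proper, or an MPI-internal, topology-aware implementation) are all simplifications for exposition.

```cpp
#include <mpi.h>
#include <algorithm>
#include <cmath>
#include <vector>

// Sparse (index, value) pair exchanged between ranks.
struct Entry { int idx; float val; };

// Lossy allreduce sketch: assumes k <= grad.size() and identical k on all ranks.
void sparse_allreduce(std::vector<float>& grad, int k, MPI_Comm comm) {
    const int n = static_cast<int>(grad.size());

    // Lossy compression step: select the k largest-magnitude entries.
    std::vector<int> order(n);
    for (int i = 0; i < n; ++i) order[i] = i;
    std::partial_sort(order.begin(), order.begin() + k, order.end(),
        [&](int a, int b) { return std::abs(grad[a]) > std::abs(grad[b]); });

    std::vector<Entry> local(k);
    for (int j = 0; j < k; ++j) local[j] = {order[j], grad[order[j]]};

    int size;
    MPI_Comm_size(comm, &size);
    std::vector<Entry> global(static_cast<size_t>(k) * size);

    // Exchange sparse contributions; each POD Entry is sent as raw bytes.
    MPI_Allgather(local.data(),  k * static_cast<int>(sizeof(Entry)), MPI_BYTE,
                  global.data(), k * static_cast<int>(sizeof(Entry)), MPI_BYTE,
                  comm);

    // Densify: sum all received contributions back into the gradient buffer.
    std::fill(grad.begin(), grad.end(), 0.0f);
    for (const Entry& e : global) grad[e.idx] += e.val;
}
```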