2022 Fiscal Year Annual Research Report
Automated, Scalable, and Machine Learning-Driven Approach for Generating and Optimizing Scientific Application Codes
Project/Area Number | 22H03600 |
Allocation Type | Single-year Grants |
Research Institution | Institute of Physical and Chemical Research |
Principal Investigator | WAHIB MOHAMED, RIKEN Center for Computational Science, Team Leader (00650037) |
Co-Investigator (Kenkyū-buntansha) | DROZD ALEKSANDR, RIKEN Center for Computational Science, Researcher (90740126) |
Project Period (FY) | 2022-04-01 – 2026-03-31 |
Keywords | Code Generation / GPUs / Numerical Methods |
Outline of Annual Research Achievements |
In this fiscal year we developed a method for auto-generating iterative memory-bound solvers, which commonly occur in HPC codes. Typical GPU implementations have a loop on the host side that invokes the GPU kernel as many times as there are time/algorithm steps. The termination of each kernel implicitly acts as the barrier required after advancing the solution at every time step. We propose an execution model for running memory-bound iterative GPU kernels: PERsistent KernelS (PERKS). In this model, the time loop is moved inside a persistent kernel, and device-wide barriers are used for synchronization. We then reduce the traffic to device memory by caching a subset of the output of each time step in the otherwise-unused registers and shared memory. PERKS can be generalized to any iterative solver, since the model is largely independent of the solver's implementation. We demonstrated the effectiveness of PERKS for a wide range of iterative 2D/3D stencil benchmarks (geomean speedups of 2.12x for 2D stencils and 1.24x for 3D stencils over state-of-the-art libraries) and for a Krylov subspace conjugate gradient solver (geomean speedups of 4.86x on smaller SpMV datasets from SuiteSparse and 1.43x on larger SpMV datasets over a state-of-the-art library). All PERKS-based implementations are available at: https://github.com/neozhang307/PERKS
We believe auto-generated PERKS kernels will be widely used in GPU programming in the future.
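The execution-model change can be illustrated with a minimal 1D Jacobi sketch (hypothetical kernel names; the actual implementations are in the repository above). The conventional scheme launches one kernel per time step, so kernel termination provides the inter-step barrier; a PERKS-style persistent kernel instead keeps the time loop on the device and synchronizes with a grid-wide barrier:

```cuda
#include <cooperative_groups.h>
namespace cg = cooperative_groups;

// Conventional scheme: the host loops over time steps and launches this
// kernel once per step; kernel termination acts as the inter-step barrier.
__global__ void jacobi_step(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i > 0 && i < n - 1)
        out[i] = 0.5f * (in[i - 1] + in[i + 1]);
}

// PERKS-style persistent kernel: the time loop moves inside the kernel,
// and a device-wide barrier (grid.sync) replaces kernel termination.
// It must be launched with cudaLaunchCooperativeKernel so that
// grid.sync() is legal. (The full PERKS scheme additionally caches part
// of the solution in otherwise-unused registers and shared memory to
// cut device-memory traffic; that is omitted here for brevity.)
__global__ void jacobi_perks(float *a, float *b, int n, int steps) {
    cg::grid_group grid = cg::this_grid();
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    for (int t = 0; t < steps; ++t) {
        const float *in  = (t & 1) ? b : a;  // double-buffer swap
        float       *out = (t & 1) ? a : b;
        if (i > 0 && i < n - 1)
            out[i] = 0.5f * (in[i - 1] + in[i + 1]);
        grid.sync();  // device-wide barrier between time steps
    }
}
```

Note that a cooperative launch requires all thread blocks of the grid to be resident on the device at once, which is also what leaves surplus registers and shared memory available for the caching that PERKS performs.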
|
Current Status of Research Progress |
2: Research has progressed on the whole more than it was originally planned.
Reason
The project is progressing as expected. We published several high-impact papers.
|
Strategy for Future Research Activity |
Our plan for the next fiscal year is to incorporate our PERKS GPU kernel execution method into a polyhedral compiler toolchain. In particular, we will use the polyhedral model for auto-generating code: first, we analyze the algorithm to be implemented and express it as affine loop nests. Next, we apply polyhedral transformations to these loop nests to optimize for factors such as parallelism, locality, and vectorization. Then, we use the transformed loop nests to generate code targeting the desired architecture, leveraging tools such as the Polyhedral Compilation Infrastructure (PCI). Finally, we validate the generated code through testing and performance profiling, iterating as necessary to refine both the model and the generated code for efficiency and correctness.
|
Research Products
(6 results)