FY2022 Annual Research Report

Automated, Scalable, and Machine Learning-Driven Approach for Generating and Optimizing Scientific Application Codes

Research Project

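Project/Area Number	22H03600
Allocation Type	Single-year Grants
Research Institution	RIKEN
Principal Investigator	WAHIB MOHAMED, RIKEN, Center for Computational Science, Team Leader (00650037)
Co-Investigator	DROZD Aleksandr, RIKEN, Center for Computational Science, Research Scientist (90740126)
Project Period (FY)	2022-04-01 – 2026-03-31
Keywords	Code Generation / GPUs / Numerical Methods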
Outline of Annual Research Achievements	In this fiscal year we developed a method for auto-generating GPU kernels for iterative memory-bound solvers, which commonly occur in HPC codes. Typical GPU implementations have a loop on the host side that invokes the GPU kernel once per time/algorithm step. The termination of each kernel implicitly acts as the barrier required after advancing the solution by one time step. We propose an execution model for running memory-bound iterative GPU kernels: PERsistent KernelS (PERKS). In this model, the time loop is moved inside a persistent kernel, and device-wide barriers are used for synchronization. We then reduce the traffic to device memory by caching a subset of each time step's output in otherwise unused registers and shared memory. PERKS can be generalized to any iterative solver: it is largely independent of the solver's implementation. We demonstrated the effectiveness of PERKS for a wide range of iterative 2D/3D stencil benchmarks (geomean speedup of 2.12x for 2D stencils and 1.24x for 3D stencils over state-of-the-art libraries), and for a Krylov subspace conjugate gradient solver (geomean speedup of 4.86x on smaller SpMV datasets from SuiteSparse and 1.43x on larger SpMV datasets over a state-of-the-art library). All PERKS-based implementations are available at: https://github.com/neozhang307/PERKS We believe auto-generated PERKS kernels will be widely used in GPU programming in the future.
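The execution model described above can be illustrated with a small CPU-side sketch in Python. This is not the project's CUDA implementation: the function names (`host_loop`, `persistent_kernel`), the 1D 3-point stencil, and the use of Python threads with `threading.Barrier` to stand in for thread blocks and a device-wide barrier are all assumptions made for illustration. Each worker keeps its chunk of the domain in a local list (a stand-in for registers and shared memory) across time steps and publishes only its two edge cells to the shared buffers.

```python
import threading


def step_reference(u):
    """One step of a 1D 3-point averaging stencil with fixed end values."""
    n = len(u)
    return [u[i] if i in (0, n - 1) else (u[i - 1] + u[i] + u[i + 1]) / 3.0
            for i in range(n)]


def host_loop(u, steps):
    """Baseline: the host invokes one 'kernel' per time step.
    The end of each call plays the role of the implicit barrier."""
    for _ in range(steps):
        u = step_reference(u)
    return list(u)


def persistent_kernel(u, steps, workers=4):
    """Persistent-kernel style: the time loop runs inside each worker.
    Assumes len(u) >= workers so that every chunk is non-empty."""
    n = len(u)
    bufs = [list(u), list(u)]              # double buffer in 'device memory'
    barrier = threading.Barrier(workers)   # stand-in for a device-wide barrier

    def worker(w):
        lo, hi = w * n // workers, (w + 1) * n // workers
        local = bufs[0][lo:hi]             # chunk cached 'on chip'
        for t in range(steps):
            src, dst = bufs[t % 2], bufs[(t + 1) % 2]
            # Only the halo cells are read from device memory.
            left = src[lo - 1] if lo > 0 else None
            right = src[hi] if hi < n else None
            new = []
            for k in range(hi - lo):
                i = lo + k
                if i == 0 or i == n - 1:
                    new.append(local[k])   # fixed boundary value
                else:
                    l = local[k - 1] if k > 0 else left
                    r = local[k + 1] if k < hi - lo - 1 else right
                    new.append((l + local[k] + r) / 3.0)
            local = new
            # Only the edge cells that neighbours need are written back.
            dst[lo] = local[0]
            dst[hi - 1] = local[-1]
            barrier.wait()                 # all workers finish step t
        bufs[steps % 2][lo:hi] = local     # single full write-back at the end

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return bufs[steps % 2]
```

Calling `persistent_kernel(u, steps)` and `host_loop(u, steps)` on the same input should give identical results, since both evaluate the same expression in the same order. The difference is that the persistent version touches the shared buffers only for halo cells during the time loop.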
Current Status of Research Progress (Category)	2: Research is progressing mostly smoothly. Reason: The project is progressing as expected. We were able to publish several high-impact papers.
Strategy for Future Research Activity	Our plan for the next fiscal year is to incorporate our PERKS GPU kernel execution method into a polyhedral compiler toolchain. In particular, we will use a polyhedral model for auto-generating code. First, we analyze the algorithm to be implemented and express it as affine loop nests. Next, we apply polyhedral transformations to these loop nests to optimize for factors such as parallelism, locality, and vectorization. Then, we use the transformed loop nests to generate code targeting the desired architecture, leveraging tools like the Polyhedral Compilation Infrastructure (PCI). Finally, we validate the generated code through testing and performance profiling, iterating as necessary to refine both the model and the generated code for efficiency and correctness.
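The steps of this plan (affine loop nest, transformation, code generation, validation) can be sketched on a toy example. The code below is written by hand in Python and does not use any polyhedral tool. The function names, the choice of rectangular tiling as the transformation, and the integer matrix-vector product used for validation are all assumptions made for illustration.

```python
def original_schedule(n, m):
    """Iteration domain {(i, j) : 0 <= i < n, 0 <= j < m} in lexicographic order."""
    return [(i, j) for i in range(n) for j in range(m)]


def tiled_schedule(n, m, ti, tj):
    """Same domain, visited tile by tile (ti x tj tiles) to improve locality."""
    order = []
    for ii in range(0, n, ti):                      # loops over tiles
        for jj in range(0, m, tj):
            for i in range(ii, min(ii + ti, n)):    # loops within one tile
                for j in range(jj, min(jj + tj, m)):
                    order.append((i, j))
    return order


def matvec(a, x, schedule):
    """Execute the statement y[i] += a[i][j] * x[j] in the given order."""
    y = [0] * len(a)
    for i, j in schedule:
        y[i] += a[i][j] * x[j]
    return y


def validate(n, m, ti, tj):
    """Check that tiling visits the same points and yields the same result.
    Integer data is used so that reordering the sum cannot change the value."""
    a = [[(3 * i + j) % 5 for j in range(m)] for i in range(n)]
    x = [j + 1 for j in range(m)]
    base = original_schedule(n, m)
    tiled = tiled_schedule(n, m, ti, tj)
    same_points = sorted(tiled) == base
    same_result = matvec(a, x, tiled) == matvec(a, x, base)
    return same_points and same_result
```

In a real toolchain the transformed schedule would be produced and checked for legality by the compiler rather than written out by hand. The validation step here only mirrors the testing stage named in the plan.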

Research Products
(6 results)

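Journal Article (4 results) (of which Int'l Joint Research: 4, Peer Reviewed: 4, Open Access: 4); Presentation (2 results) (of which Int'l Academic Conference: 2, Invited: 2)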

[Journal Article] PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications (2023)
- Author(s)/Presenter(s)
  Zhang Lingqi, Wahib Mohamed, Chen Peng, Meng Jintao, Wang Xiao, Endo Toshio, Matsuoka Satoshi
- Journal Title
  ICS '23: Proceedings of the 37th International Conference on Supercomputing
  Volume: 1  Pages: 167-179
- DOI
  10.1145/3577193.3593705
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Revisiting Temporal Blocking Stencil Optimizations (2023)
- Author(s)/Presenter(s)
  Zhang Lingqi, Wahib Mohamed, Chen Peng, Meng Jintao, Wang Xiao, Endo Toshio, Matsuoka Satoshi
- Journal Title
  ICS '23: Proceedings of the 37th International Conference on Supercomputing
  Volume: 1  Pages: 251-263
- DOI
  10.1145/3577193.3593716
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Multi-GPU Communication Schemes for Iterative Solvers: When CPUs are Not in Charge (2023)
- Author(s)/Presenter(s)
  Ismayilov Ismayil, Baydamirli Javid, Sagbili Dogan, Wahib Mohamed, Unat Didem
- Journal Title
  ICS '23: Proceedings of the 37th International Conference on Supercomputing
  Volume: 1  Pages: 192-202
- DOI
  10.1145/3577193.3593713
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads (2023)
- Author(s)/Presenter(s)
  Domke Jens, Vatai Emil, Gerofi Balazs, Kodama Yuetsu, Wahib Mohamed, Podobas Artur, Mittal Sparsh, Pericas Miquel, Zhang Lingqi, Chen Peng, Drozd Aleksandr, Matsuoka Satoshi
- Journal Title
  ACM Transactions on Architecture and Code Optimization
  Volume: 20  Pages: 1-26
- DOI
  10.1145/3629520
- Peer Reviewed / Open Access / Int'l Joint Research
[Presentation] Challenges of Scaling Deep Learning on HPC Systems (2024)
- Author(s)/Presenter(s)
  Mohamed Wahib
- Organizer
  Challenges of Scaling Deep Learning on HPC Systems
- Int'l Academic Conference / Invited
[Presentation] High Performance Imaging Applications: At the Intersection of HPC and AI (2024)
- Author(s)/Presenter(s)
  Mohamed Wahib
- Organizer
  Electronic Imaging'24
- Int'l Academic Conference / Invited
