A Study on Acceleration by Temporal Blocking for Real-world Applications

Research Project

Project/Area Number	22K17898
Research Category	Grant-in-Aid for Early-Career Scientists
Allocation Type	Multi-year Fund
Review Section	Basic Section 60090: High performance computing-related
Research Institution	Nagoya University
Principal Investigator	Hoshino Tetsuya Nagoya University, Information Technology Center, Associate Professor (40775946)
Project Period (FY)	2022-04-01 – 2024-03-31
Project Status	Completed (Fiscal Year 2023)
Budget Amount	¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000) Fiscal Year 2023: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000) Fiscal Year 2022: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000)
Keywords	High-performance computing / Stencil computation / Spatio-temporal blocking / Auto-tuning / Temporal blocking / Performance model
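Outline of Research at the Start	The latest generation of CPUs installed in supercomputers have large shared caches. Temporal blocking, an optimization technique known to use such caches efficiently, accelerates the stencil computations that appear frequently in scientific and engineering simulations. However, because temporal blocking demands cumbersome programming, it has seen little use in real-world applications. This project focuses on the overlapped variant of temporal blocking, which can be implemented through relatively simple code transformations yet executes with high efficiency when a large shared cache is available. The project models its performance on a variety of CPUs and verifies its effectiveness in real applications.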
Outline of Final Research Achievements	A stencil computation is the characteristic computation pattern on a grid discretized in time and space that arises when differential equations are solved numerically. It is an important kernel that appears frequently in various fluid simulations. Accelerating stencil computations has been studied extensively, and temporal blocking is one such technique, but it has rarely been applied to real applications because it requires very complicated programming. Furthermore, because the performance of temporal blocking depends heavily on the performance parameters of the processor that runs it, tuning the blocking by hand is not realistic. This study therefore carried out, on state-of-the-art CPUs, the performance modeling required for auto-tuning of temporal blocking.
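Academic Significance and Societal Importance of the Research Achievements	A major value of this research is that performance modeling was carried out mainly on the latest CPUs equipped with High Bandwidth Memory (HBM): the A64FX of the Fugaku supercomputer and Intel Xeon CPUs of the Sapphire Rapids generation. By its nature, the performance of temporal blocking depends strongly on the ratio between main-memory performance and last-level-cache performance. The arrival of HBM has changed this ratio substantially compared with earlier CPUs, and clarifying that effect through a performance model is a meaningful result for the field of high-performance computing. Clarifying the effect of instruction latency, which was not anticipated at the outset, is also significant.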
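
For illustration only, the overlapped style of temporal blocking described in the outline can be sketched on a toy problem. The kernel below is an assumption chosen for brevity (a 1D three-point averaging stencil with fixed boundary values), not the kernels or applications studied in the project, and the names `jacobi_naive`, `jacobi_overlapped`, `block`, and `tb` are hypothetical. Each spatial block is copied together with a halo of `tb` cells per side and advanced `tb` time steps locally; the halo cells are recomputed redundantly by neighboring blocks, which is the price paid for touching main memory once per `tb` steps instead of once per step.

```python
import numpy as np


def jacobi_naive(u, steps):
    """Reference: sweep the whole grid once per time step (boundaries fixed)."""
    u = u.copy()
    for _ in range(steps):
        nxt = u.copy()
        nxt[1:-1] = (u[:-2] + u[1:-1] + u[2:]) / 3.0
        u = nxt
    return u


def jacobi_overlapped(u, steps, block=64, tb=4):
    """Overlapped temporal blocking on the same toy stencil (assumed kernel).

    Each block of `block` interior cells is extended by a halo of up to `tb`
    cells per side, advanced `tb` steps on a private copy, and only the
    block's own cells are written back.
    """
    u = u.copy()
    n = u.size
    done = 0
    while done < steps:
        t = min(tb, steps - done)          # time steps fused in this pass
        out = u.copy()                     # keeps the fixed boundary cells
        for start in range(1, n - 1, block):
            stop = min(start + block, n - 1)
            lo = max(start - t, 0)         # halo, clamped at the grid edge
            hi = min(stop + t, n)
            tile = u[lo:hi].copy()         # small enough to stay in cache
            for _ in range(t):
                nxt = tile.copy()
                nxt[1:-1] = (tile[:-2] + tile[1:-1] + tile[2:]) / 3.0
                tile = nxt
            # Stale values creep in one cell per step from the tile edges,
            # so after t steps only the halo is contaminated.
            out[start:stop] = tile[start - lo:stop - lo]
        u = out
        done += t
    return u
```

In this sketch, a larger `tb` reduces main-memory traffic but increases redundant halo work, and a larger `block` amortizes the halo but needs more cache. That trade-off is the kind of choice a performance model and auto-tuning would have to settle for each processor.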

Report

(3 results)

2023 Annual Research Report, Final Research Report (PDF)
2022 Research-status Report

Research Products
(8 results)

All Journal Article (3 results) (of which Peer Reviewed: 3 results) Presentation (5 results) (of which Invited: 1 result)

[Journal Article] Optimize Efficiency of Utilizing Systems by Dynamic Core Binding (2024)
- Author(s)
  Masatoshi Kawai, Akihiro Ida, Toshihiro Hanawa, Tetsuya Hoshino
- Journal Title
  
  HPCAsia '24 Workshops: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Workshops
  
  Volume: none Pages: 77-82
- Related Report
  2023 Annual Research Report
- Peer Reviewed
[Journal Article] Auto-tuning Mixed-precision Computation by Specifying Multiple Regions (2023)
- Author(s)
  Ren Xuanzhengbo, Kawai Masatoshi, Hoshino Tetsuya, Katagiri Takahiro, Nagai Toru
- Journal Title
  
  2023 Eleventh International Symposium on Computing and Networking (CANDAR)
  
  Volume: none Pages: 175-181
- DOI
  10.1109/candar60563.2023.00031
- Related Report
  2023 Annual Research Report
- Peer Reviewed
[Journal Article] Optimizations of H-matrix-vector Multiplication for Modern Multi-core Processors (2022)
- Author(s)
  Hoshino Tetsuya, Ida Akihiro, Hanawa Toshihiro
- Journal Title
  
  2022 IEEE International Conference on Cluster Computing (CLUSTER)
  
  Volume: 2022 Pages: 462-472
- DOI
  10.1109/cluster51413.2022.00056
- Related Report
  2022 Research-status Report
- Peer Reviewed
[Presentation] GPU Parallelization of an Earthquake Simulation Using OpenACC (2024)
- Author(s)
  百武尚輝, 星野哲也, 小澤創, 伊田明弘, 安藤亮輔, 河合直聡, 永井亨, 片桐孝洋
- Organizer
  IPSJ National Convention
- Related Report
  2023 Annual Research Report
[Presentation] Performance Evaluation of Sapphire Rapids HBM Using HPC Kernel Benchmarks (2024)
- Author(s)
  星野哲也, 河合直聡, 伊田明弘, 塙敏博, 片桐孝洋
- Organizer
  IPSJ SIG Technical Reports, High Performance Computing (HPC)
- Related Report
  2023 Annual Research Report
[Presentation] Optimizations of H-matrix-vector Multiplication for Modern Multi-core Processors (2023)
- Author(s)
  Tetsuya Hoshino, Akihiro Ida, Toshihiro Hanawa
- Organizer
  Japan Geoscience Union Meeting 2023
- Related Report
  2023 Annual Research Report
- Invited
[Presentation] Optimizations of H-matrix-vector Multiplication for Modern Multi-core Processors (2023)
- Author(s)
  Tetsuya Hoshino, Akihiro Ida, Toshihiro Hanawa
- Organizer
  ICIAM
- Related Report
  2023 Annual Research Report
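[Presentation] Performance Evaluation of a Molecular Orbital Calculation Program and a Study of Applying Auto-tuning (2023)
- Author(s)
  満田晴紀, 星野哲也, 望月祐志, 坂倉耕太, 片桐孝洋, 大島聡史, 永井亨, 河合直聡
- Organizer
  IPSJ SIG Technical Reports, High Performance Computing (HPC)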
- Related Report
  2023 Annual Research Report
