2017 Fiscal Year Annual Research Report

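H行列法ライブラリの機能拡張と次世代スパコン向け最適化 [Functional Extension of the H-Matrix Method Library and Optimization for Next-Generation Supercomputers]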

Research Project

Project/Area Number	17H01749
Research Institution	The University of Tokyo
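Principal Investigator	Akihiro Ida, The University of Tokyo, Information Technology Center, Project Associate Professor (80742121)
Co-Investigator(Kenkyū-buntansha)	Rio Yokota, Tokyo Institute of Technology, Global Scientific Information and Computing Center, Associate Professor (20760573); Takeshi Iwashita, Hokkaido University, Information Initiative Center, Professor (30324685); Satoshi Ohshima, Kyushu University, Research Institute for Information Technology, Assistant Professor (40570081); Tasuku Hiraishi, Kyoto University, Academic Center for Computing and Media Studies, Assistant Professor (60528222)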
Project Period (FY)	2017-04-01 – 2020-03-31
Keywords	Approximate computation / Low rank / H-matrices / Library / Algorithms / Parallel computing / High-performance computing (高性能計算 / ハイパフォーマンスコンピューティング)
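Outline of Annual Research Achievements	With the aim of extending the functionality of the H-matrix library HACApK, we carried out research on the following five items.
(1) We worked on improving H-matrix computation in distributed-memory parallel environments. The main challenge was constructing an efficient communication pattern, which we resolved by introducing the regular structure of BLR matrices into H-matrices. The new method (the lattice H-matrix method) combines the high compression ratio of the H-matrix method with the convenience of BLR matrices, and we regard it as a significant advance.
(2) We adapted the kernel-independent FMM (KIFMM) to the kernels handled by HACApK. KIFMM requires equivalent (pseudo) charges to be distributed on a spherical shell circumscribing each FMM cell, which makes high accuracy difficult to achieve; we improved on this drawback.
(3) Starting from the HACApK code developed for SMP clusters, we developed code for GPUs. In particular, we worked on a fast implementation of the BiCGSTAB method with H-matrices and developed fast H-matrix computation codes for both single-GPU and multi-GPU environments. We also obtained some results on H-matrix generation using GPUs.
(4) We worked on improving the thread-parallel H-matrix-vector product. In addition to the existing implementation in the HACApK library, we implemented five schemes, including ones that use dynamic load balancing, and evaluated the performance of each. The results show that thread parallelization of the H-matrix-vector product must take the cache hit rate into account in addition to load balance, and that accessing the submatrices contiguously in the row direction is advantageous.
(5) Within H-matrix generation, we worked on parallelizing the construction of the tree that represents the partitioning of the matrix. Because the tree has an irregular structure, simple approaches such as loop parallelism are unlikely to perform well. We therefore parallelized the construction using the dynamic load balancing of the task-parallel language Cilk Plus, and a preliminary evaluation confirmed that a reasonable degree of parallel performance can be obtained.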
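Items (1) and (4) above both rest on the same basic operation: a matrix-vector product in which far-field blocks are stored as low-rank factors and near-field blocks stay dense. The following is a minimal sketch of that idea on a regular BLR-style grid, assuming NumPy, a truncated SVD as the compression step, and a smooth test kernel. It is not HACApK code: the function names `lowrank_block`, `build_blr`, and `blr_matvec`, the block size, and the rank are all hypothetical choices made for illustration.

```python
import numpy as np

def lowrank_block(a, rank):
    """Truncated-SVD factors (u, v) such that a is approximately u @ v."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank, :]

def build_blr(a, block, rank):
    """Cut a into a regular grid of blocks (BLR-style).
    Diagonal blocks stay dense; off-diagonal blocks are compressed."""
    n = a.shape[0]
    blocks = []
    for i in range(0, n, block):
        for j in range(0, n, block):
            sub = a[i:i + block, j:j + block]
            if i == j:
                blocks.append((i, j, "dense", sub.copy()))
            else:
                blocks.append((i, j, "lowrank", lowrank_block(sub, rank)))
    return blocks

def blr_matvec(blocks, x, n):
    """y = A x, using the dense or low-rank form of each block."""
    y = np.zeros(n)
    for i, j, kind, data in blocks:
        if kind == "dense":
            y[i:i + data.shape[0]] += data @ x[j:j + data.shape[1]]
        else:
            u, v = data
            # Apply v first so the cost is rank * (rows + cols).
            y[i:i + u.shape[0]] += u @ (v @ x[j:j + v.shape[1]])
    return y

# Illustrative kernel 1 / (1 + |x_i - x_j|) on equispaced 1D points.
n = 256
pts = np.linspace(0.0, 1.0, n)
a = 1.0 / (1.0 + np.abs(pts[:, None] - pts[None, :]))
blocks = build_blr(a, block=64, rank=8)
x = np.ones(n)
exact = a @ x
rel_err = np.linalg.norm(blr_matvec(blocks, x, n) - exact) / np.linalg.norm(exact)
print(f"relative error of the compressed product: {rel_err:.2e}")
```

In this sketch every block has the same size, which is what makes the communication pattern regular; an H-matrix would instead use blocks of varying size chosen by an admissibility condition.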
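Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned.
Reason	As described below, the degree of achievement varies across the five research items, but overall the project is largely on track.
(1) We compared the H-matrix method with the BLR method and carried out a series of studies leading to the proposal of the lattice H-matrix method. In numerical experiments on problems with several hundred thousand unknowns using several hundred cores, the lattice H-matrix method achieved a computation time of one tenth or less of that of the conventional H-matrix method (using the HACApK library). The results were presented at two research meetings and in one invited talk, and two papers submitted to peer-reviewed international conferences were accepted.
(2) The Fortran interface of the FMM code is complete; what remains is to match the data structures so that it can be called from HACApK. Another FMM technique that can be transferred to H-matrices is the task-parallel search for admissible blocks using dual tree traversal. We expect this to improve the parallel performance of compressing a dense matrix into an H-matrix.
(3) We implemented fast H-matrix computation code using GPUs and presented the performance evaluation results at research meetings and workshops. In particular, the work on the BiCGSTAB method with H-matrices was accepted at two peer-reviewed international conferences.
(4) For thread parallelization of the H-matrix-vector product, the factors that must be considered have been identified, which gives a clear direction for an algorithm that improves on the conventional method.
(5) For the parallelization of tree construction with dynamic load balancing, the shared-memory implementation is complete, but performance problems remain. As a result, we did not reach an implementation for distributed environments.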
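The dual tree traversal mentioned in item (2) can be sketched as follows. This is an illustrative example only, assuming sorted 1D points, a binary cluster tree, and the common admissibility test min(diam) <= eta * dist; the names `Cluster`, `build_tree`, `admissible`, and `dual_traverse` and the parameters `leaf_size` and `eta` are hypothetical and are not taken from HACApK or any FMM code. Each recursive call works on an independent pair of clusters, which is why the traversal lends itself to task parallelism.

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    lo: int                 # first point index (inclusive)
    hi: int                 # last point index (exclusive)
    children: tuple = ()    # empty for a leaf

def build_tree(lo, hi, leaf_size):
    """Binary cluster tree over the index range [lo, hi)."""
    node = Cluster(lo, hi)
    if hi - lo > leaf_size:
        mid = (lo + hi) // 2
        node.children = (build_tree(lo, mid, leaf_size),
                         build_tree(mid, hi, leaf_size))
    return node

def admissible(s, t, pts, eta):
    """Two clusters are admissible when they are well separated."""
    diam = min(pts[s.hi - 1] - pts[s.lo], pts[t.hi - 1] - pts[t.lo])
    dist = max(pts[t.lo] - pts[s.hi - 1], pts[s.lo] - pts[t.hi - 1], 0.0)
    return dist > 0.0 and diam <= eta * dist

def dual_traverse(s, t, pts, eta, lowrank, dense):
    """Walk the row tree and the column tree together and classify blocks."""
    if admissible(s, t, pts, eta):
        lowrank.append((s.lo, s.hi, t.lo, t.hi))
    elif not s.children and not t.children:
        dense.append((s.lo, s.hi, t.lo, t.hi))
    elif not t.children or (s.children and s.hi - s.lo >= t.hi - t.lo):
        for child in s.children:     # each call could be spawned as a task
            dual_traverse(child, t, pts, eta, lowrank, dense)
    else:
        for child in t.children:
            dual_traverse(s, child, pts, eta, lowrank, dense)

n = 64
pts = [i / (n - 1) for i in range(n)]
root = build_tree(0, n, leaf_size=8)
lowrank, dense = [], []
dual_traverse(root, root, pts, 1.0, lowrank, dense)
covered = sum((r1 - r0) * (c1 - c0) for r0, r1, c0, c1 in lowrank + dense)
print(len(lowrank), "low-rank blocks,", len(dense), "dense blocks")
```

The resulting blocks tile the whole index set exactly once, so `covered` equals `n * n`.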
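Strategy for Future Research Activity	The plans for each of the five research items are as follows.
(1) So far, we have verified the speedup of the lattice H-matrix method over the conventional H-matrix method for low-rank approximate matrix generation and for the matrix-vector product. We will next develop parallel algorithms on lattice H-matrices for more complex operations, such as LU factorization, QR factorization, and approximate inverse computation.
(2) For higher performance, we plan to follow PVFMM, a high-performance implementation of KIFMM, and adopt an approach that stores the translation matrices of the multipole and local expansions and exploits the symmetry of the interactions to reduce the work to GEMM. Once complete, this will not only speed up H-matrix compression but also allow matrix entries that are currently stored redundantly to be stored only once, reducing the memory used by everything other than the dense blocks of the H-matrix to O(1).
(3) We will continue working toward fast H-matrix computation on GPUs. For the work already presented at peer-reviewed international conferences, we will pursue follow-up topics such as further performance improvement and execution in large-scale environments. We also aim to have the matrix generation work accepted at peer-reviewed international conferences. In addition, we will work on a GPU implementation of the lattice H-matrix method.
(4) The FY2017 results showed that, when parallelizing the H-matrix-vector product, it is advantageous for submatrix accesses to be as contiguous as possible in the row direction. Building on this scheme, we will develop, implement, and evaluate an algorithm that also balances the load statically.
(5) We will promptly resolve the performance problems of the shared-memory implementation while proceeding in parallel with an implementation for distributed environments. We will also begin studying and implementing the application of dynamic load balancing to the hybrid FMM/ACA scheme, which was originally planned for FY2018 or later.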
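One way to picture the goal of item (4), row-contiguous access combined with static load balance, is to order the blocks by row and cut that ordered list into contiguous chunks of roughly equal estimated cost. The sketch below is an assumption-laden illustration, not the algorithm the project developed: the cost model (rows * cols for a dense block, rank * (rows + cols) for a low-rank block) and the names `block_cost` and `static_partition` are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate

def block_cost(rows, cols, rank=None):
    """Estimated multiply-add count of one block in a matrix-vector product."""
    return rows * cols if rank is None else rank * (rows + cols)

def static_partition(costs, n_threads):
    """Split a cost sequence into n_threads contiguous chunks of near-equal
    total cost. Returns a list of (start, end) index ranges, end exclusive."""
    prefix = list(accumulate(costs))
    total = prefix[-1]
    bounds = [0]
    for t in range(1, n_threads):
        target = total * t / n_threads
        # First position whose running cost reaches the target, inclusive.
        cut = bisect_left(prefix, target) + 1
        cut = min(max(cut, bounds[-1]), len(costs))
        bounds.append(cut)
    bounds.append(len(costs))
    return list(zip(bounds[:-1], bounds[1:]))

# Blocks as (row_start, col_start, rows, cols, rank); rank None means dense.
blocks = [
    (0, 0, 64, 64, None),
    (0, 64, 64, 64, 8),
    (64, 0, 64, 64, 8),
    (64, 64, 64, 64, None),
]
blocks.sort(key=lambda b: (b[0], b[1]))          # row-major order
costs = [block_cost(r, c, k) for _, _, r, c, k in blocks]
chunks = static_partition(costs, n_threads=2)
print(chunks)
```

Because each thread receives a contiguous run of the row-ordered list, the rows it writes to stay close together, while the prefix-sum cut keeps the estimated work per thread similar without any runtime scheduling.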

Research Products
(23 results)

Journal Article (8 results) (of which Int'l Joint Research: 7 results, Peer Reviewed: 8 results) Presentation (15 results) (of which Int'l Joint Research: 7 results, Invited: 1 result)

[Journal Article] Application of Hierarchical Matrices to Large-Scale Electromagnetic Field Analyses of Coils Wound With Coated Conductors (2018)
- Author(s)
  Tominaga Naoki, Mifune Takeshi, Ida Akihiro, Sogabe Yusuke, Iwashita Takeshi, Amemiya Naoyuki
- Journal Title
  IEEE Transactions on Applied Superconductivity
  Volume: 28 Pages: 1-5
- DOI
  10.1109/TASC.2017.2780821
- Peer Reviewed
[Journal Article] Lattice H-Matrices on Distributed-Memory Systems (2018)
- Author(s)
  Akihiro Ida
- Journal Title
  32nd IEEE International Parallel & Distributed Processing Symposium (IPDPS 2018)
  Volume: in press Pages: in press
- Peer Reviewed / Int'l Joint Research
[Journal Article] Parallel Hierarchical Matrices with Block Low-rank Representation on Distributed Memory Computer Systems (2018)
- Author(s)
  Ida Akihiro, Nakashima Hiroshi, Kawai Masatoshi
- Journal Title
  International Conference on High Performance Computing in Asia-Pacific Region
  Volume: none Pages: 232-240
- DOI
  10.1145/3149457.3149477
- Peer Reviewed / Int'l Joint Research
[Journal Article] Design of Parallel BEM Analyses Framework for SIMD Processors (2018)
- Author(s)
  Tetsuya Hoshino, Akihiro Ida, Toshihiro Hanawa, Kengo Nakajima
- Journal Title
  The International Conference on Computational Science 2018 (ICCS 2018)
  Volume: in press Pages: in press
- Peer Reviewed / Int'l Joint Research
[Journal Article] Analyzing Performance of BiCGStab with Hierarchical Matrix on GPU clusters (2018)
- Author(s)
  Ichitaro Yamazaki, Ahmad Abdelfattah, Akihiro Ida, Satoshi Ohshima, Stanimire Tomov, Rio Yokota, Jack Dongarra
- Journal Title
  32nd IEEE International Parallel & Distributed Processing Symposium (IPDPS 2018)
  Volume: in press Pages: in press
- Peer Reviewed / Int'l Joint Research
[Journal Article] Optimization of Hierarchical Matrix Computation on GPU (2018)
- Author(s)
  Ohshima Satoshi, Yamazaki Ichitaro, Ida Akihiro, Yokota Rio
- Journal Title
  Asian Conference on Supercomputing Frontiers
  Volume: none Pages: 274-292
- DOI
  https://doi.org/10.1007/978-3-319-69953-0_16
- Peer Reviewed / Int'l Joint Research
[Journal Article] Application of Improved H-Matrices in Micromagnetic Simulations of Spin Torque Oscillator (2018)
- Author(s)
  Ida Akihiro, Ataka Tadashi, Takahashi Yasuhito, Mifune Takeshi, Iwashita Takeshi, Furuya Atsushi
- Journal Title
  IEEE Transactions on Magnetics
  Volume: 54 Pages: 1-4
- DOI
  10.1109/TMAG.2017.2763611
- Peer Reviewed / Int'l Joint Research
[Journal Article] Software Framework for Parallel BEM Analyses with H-matrices Using MPI and OpenMP (2017)
- Author(s)
  Iwashita Takeshi, Ida Akihiro, Mifune Takeshi, Takahashi Yasuhito
- Journal Title
  Procedia Computer Science
  Volume: 108 Pages: 2200-2209
- DOI
  https://doi.org/10.1016/j.procs.2017.05.263
- Peer Reviewed / Int'l Joint Research
[Presentation] Lattice H-matrices: A new efficient variant on distributed memory systems (2018)
- Author(s)
  Akihiro Ida
- Organizer
  ATAT in HPC 2018
- Int'l Joint Research / Invited
[Presentation] Efficient Low-rank Solver for Integral Equations on Distributed Memory Systems (2018)
- Author(s)
  Akihiro Ida
- Organizer
  SIAM Conference on Parallel Processing for Scientific Computing 2018 (SIAM PP 18)
- Int'l Joint Research
[Presentation] Performance Evaluations and Optimizations of H-Matrices for Many-Core Processors (2018)
- Author(s)
  Tetsuya Hoshino, Akihiro Ida, Toshihiro Hanawa
- Organizer
  SIAM Conference on Parallel Processing for Scientific Computing 2018 (SIAM PP 18)
- Int'l Joint Research
[Presentation] Accelerating Hierarchical-Matrix Based Linear Solver on a GPU Cluster (2018)
- Author(s)
  Ichitaro Yamazaki, Satoshi Ohshima, Akihiro Ida, Rio Yokota, Jack Dongarra
- Organizer
  SIAM Conference on Parallel Processing for Scientific Computing 2018 (SIAM PP 18)
- Int'l Joint Research
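[Presentation] OpenCLを用いたFPGAによる階層型行列計算 [Hierarchical Matrix Computation on FPGAs Using OpenCL] (2018)
- Author(s)
  Toshihiro Hanawa, Akihiro Ida, Tetsuya Hoshino
- Organizer
  163rd HPC Research Meeting (第163回 HPC研究会)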
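[Presentation] 階層型行列の区分け決定処理のCilk Plusによる並列化 [Parallelization of the Partitioning Process for Hierarchical Matrices Using Cilk Plus] (2018)
- Author(s)
  Zhengyang Bai (白正陽), Tasuku Hiraishi, Akihiro Ida, Hiroshi Nakashima
- Organizer
  20th Workshop on Programming and Programming Languages (PPL2018)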
[Presentation] Dynamic Load Balancing for Construction and Arithmetic of Hierarchical Matrices (2018)
- Author(s)
  Tasuku Hiraishi
- Organizer
  SIAM Conference on Parallel Processing for Scientific Computing 2018 (SIAM PP 18)
- Int'l Joint Research
[Presentation] Application of Improved H-matrices in Micromagnetic Simulations (2017)
- Author(s)
  Ida Akihiro, Ataka Tadashi, Takahashi Yasuhito, Mifune Takeshi, Iwashita Takeshi, Furuya Atsushi
- Organizer
  21st International Conference on the Computation of Electromagnetic Fields (Compumag 2017)
- Int'l Joint Research
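[Presentation] 階層型行列における行列分割法 [Matrix Partitioning Methods for Hierarchical Matrices] (2017)
- Author(s)
  Akihiro Ida, Masatoshi Kawai
- Organizer
  2017 Summer Workshop on Parallel, Distributed and Cooperative Processing in Akita (SWoPP Akita 2017)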
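[Presentation] 階層型行列計算のFPGAへの適用 [Applying Hierarchical Matrix Computation to FPGAs] (2017)
- Author(s)
  Toshihiro Hanawa, Akihiro Ida, Tetsuya Hoshino
- Organizer
  161st HPC Research Meeting (第161回HPC研究発表会)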
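[Presentation] 階層型行列計算のGPU向け最適化 [Optimization of Hierarchical Matrix Computation for GPUs] (2017)
- Author(s)
  Satoshi Ohshima, Ichitaro Yamazaki, Akihiro Ida, Rio Yokota
- Organizer
  Japan Society for Industrial and Applied Mathematics, 2017 Annual Meeting (日本応用数理学会 2017年度年会)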
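[Presentation] 階層型行列法ライブラリHACApKを用いたアプリケーションのメニーコア向け最適化 [Many-Core Optimization of Applications Using the Hierarchical Matrix Library HACApK] (2017)
- Author(s)
  Tetsuya Hoshino, Akihiro Ida, Toshihiro Hanawa, Kengo Nakajima
- Organizer
  SWoPP Akita 2017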
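[Presentation] GPUクラスタ上における階層型行列計算の最適化 [Optimization of Hierarchical Matrix Computation on GPU Clusters] (2017)
- Author(s)
  Satoshi Ohshima, Ichitaro Yamazaki, Akihiro Ida, Rio Yokota
- Organizer
  SWoPP Akita 2017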
[Presentation] Performance Evaluation of Hierarchical Matrix Computation on Various Modern Architectures (2017)
- Author(s)
  Satoshi Ohshima, Ichitaro Yamazaki, Akihiro Ida, Rio Yokota
- Organizer
  SIAM Conference on Parallel Processing for Scientific Computing 2018 (SIAM PP 18)
- Int'l Joint Research
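[Presentation] Ｈ行列ベクトル積のスレッド並列化手法に関する性能評価 [Performance Evaluation of Thread-Parallelization Methods for the H-Matrix-Vector Product] (2017)
- Author(s)
  川村卓人, Takeshi Fukaya, Takeshi Iwashita, Akihiro Ida
- Organizer
  2017 Symposium on High Performance Computing and Computational Science (HPCS2017)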
