2023 Fiscal Year Annual Research Report

Application of Unconventional Linear Algebra Techniques to Continuous Learning in Supergiant Neural Networks

Research Project

Project/Area Number	20K20624
Research Institution	Tokyo Institute of Technology
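Principal Investigator	Rio Yokota, Tokyo Institute of Technology, Global Scientific Information and Computing Center, Professor (20760573)
Co-Investigator(Kenkyū-buntansha)	Khan Emtiyaz, RIKEN, Center for Advanced Intelligence Project, Team Leader (30858022); Satoshi Ohshima, Nagoya University, Information Technology Center, Associate Professor (40570081); Akihiro Ida, Japan Agency for Marine-Earth Science and Technology, Research Institute for Value-Added-Information Generation (Center for Earth Information Science and Technology), Deputy Senior Researcher (80742121)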
Project Period (FY)	2020-07-30 – 2024-03-31
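Keywords	Hierarchical low-rank approximation / Deep learning / Matrix factorization / Tensor cores
Outline of Annual Research Achievements	In deep continual learning, matrix factorization of the Fisher information matrix has been shown to improve performance. However, the Fisher information matrix is a dense matrix with N^2 elements, where N is the number of parameters, so factorizing it directly is difficult. Approximation by Kronecker factorization has previously been proposed to bring the cost down to O(N^1.5); in this research we reduced it to O(N) by using the H^2-matrix, a hierarchical low-rank approximation method. Besides H^2-matrices, hierarchical low-rank approximation methods include H-matrices, which do not share bases, and HSS matrices, which subdivide only the diagonal blocks. With H-matrices, dependencies between operations during factorization lower the parallel efficiency, and with HSS matrices the rank of the off-diagonal blocks grows, so neither readily matches the performance of H^2-matrices. Existing work on HSS matrices has proposed using the ULV factorization to remove these dependencies and process all diagonal blocks in parallel. However, applying the ULV factorization to an H^2-matrix requires updating the shared bases when the fill-in blocks are recompressed, which brings back the same dependency problem as in H-matrices. In this research we proposed a method that precomputes the fill-in blocks and includes them in the shared bases before performing the ULV factorization, so that all diagonal blocks can be processed in parallel as with HSS matrices. We implemented this to run efficiently in a multi-GPU environment and developed a method in which forward and backward substitution also proceed without dependencies between blocks. We further extended the method to the LDL factorization and developed a method that finds the k-th eigenvalue of a dense matrix by bisection with O(N log N) complexity. Finally, we developed a precision-correction method so that ill-conditioned matrices can be factorized even on low-precision arithmetic units such as Tensor Cores.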
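The bisection search for the k-th eigenvalue mentioned in the outline can be illustrated with a small dense sketch. It relies on Sylvester's law of inertia: after factorizing A - sI = L D L^T, the number of negative eigenvalues of D equals the number of eigenvalues of A below s. This is only a minimal illustration and not the project's implementation: it assumes a dense symmetric matrix and uses `scipy.linalg.ldl` in place of an H^2-matrix LDL factorization, so each step costs O(N^3) rather than O(N). The function names `count_eigs_below` and `kth_eigenvalue` are hypothetical.

```python
import numpy as np
from scipy.linalg import ldl


def count_eigs_below(A, shift):
    """Count eigenvalues of symmetric A that are strictly below shift."""
    n = A.shape[0]
    # Dense stand-in for the structured factorization: A - shift*I = L D L^T,
    # where D is block diagonal with 1x1 and 2x2 blocks.
    _, D, _ = ldl(A - shift * np.eye(n))
    # Sylvester's law of inertia: D and A - shift*I have the same number
    # of negative eigenvalues.
    return int(np.sum(np.linalg.eigvalsh(D) < 0))


def kth_eigenvalue(A, k, tol=1e-10):
    """Approximate the k-th smallest eigenvalue (k = 1..n) by bisection."""
    # Gershgorin discs bound the whole spectrum and give the start interval.
    diag = np.diag(A)
    radius = np.sum(np.abs(A), axis=1) - np.abs(diag)
    lo = float(np.min(diag - radius))
    hi = float(np.max(diag + radius))
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if count_eigs_below(A, mid) >= k:
            hi = mid  # at least k eigenvalues lie below mid
        else:
            lo = mid  # fewer than k eigenvalues lie below mid
    return 0.5 * (lo + hi)
```

Each bisection step needs one factorization, and the number of steps grows only with the logarithm of the requested accuracy, so the total cost is dominated by the cost of a single factorization.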

Research Products
(11 results)

Int'l Joint Research (1 result), Journal Article (7 results, of which Int'l Joint Research: 2, Peer Reviewed: 7), Presentation (3 results, of which Int'l Joint Research: 3)

[Int'l Joint Research] University of Tennessee (U.S.A.)
- Country Name
  U.S.A.
- Counterpart Institution
  University of Tennessee
[Journal Article] An Inherently Parallel H^2-ULV Factorization for Solving Dense Linear Systems on GPUs (2024)
- Author(s)
  Qianxiang Ma, Rio Yokota
- Journal Title
  International Journal of High Performance Computing Applications
  Volume: N/A Pages: 1-10
- Peer Reviewed
[Journal Article] DGEMM on Integer Matrix Multiplication Unit (2024)
- Author(s)
  Hiroyuki Ootomo, Katsuhisa Ozaki, Rio Yokota
- Journal Title
  International Journal of High Performance Computing Applications
  Volume: N/A Pages: 1-10
- Peer Reviewed
[Journal Article] Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors (2023)
- Author(s)
  Deshmukh Sameer, Yokota Rio, Bosilca George
- Journal Title
  ACM Transactions on Mathematical Software
  Volume: 49 Pages: 1-29
- DOI
  10.1145/3595178
- Peer Reviewed / Int'l Joint Research
[Journal Article] Parallel QR Factorization of Block Low-rank Matrices (2023)
- Author(s)
  Apriansyah M. Ridwan, Yokota Rio
- Journal Title
  ACM Transactions on Mathematical Software
  Volume: 48 Pages: 1-28
- DOI
  10.1145/3538647
- Peer Reviewed
[Journal Article] Computing the k-th Eigenvalue of Symmetric H2-Matrices (2023)
- Author(s)
  Apriansyah M. Ridwan, Yokota Rio
- Journal Title
  International Conference on Parallel Processing (ICPP)
  Volume: N/A Pages: 1-10
- DOI
  10.1145/3605573.3605607
- Peer Reviewed
[Journal Article] O(N) distributed direct factorization of structured dense matrices using runtime systems (2023)
- Author(s)
  Deshmukh Sameer, Yokota Rio, Bosilca George, Ma Qianxiang
- Journal Title
  International Conference on Parallel Processing (ICPP)
  Volume: N/A Pages: 1-10
- DOI
  10.1145/3605573.3605606
- Peer Reviewed / Int'l Joint Research
[Journal Article] Mixed-Precision Random Projection for RandNLA on Tensor Cores (2023)
- Author(s)
  Ootomo Hiroyuki, Yokota Rio
- Journal Title
  Platform for Advanced Scientific Computing (PASC)
  Volume: N/A Pages: 1-10
- DOI
  10.1145/3592979.3593413
- Peer Reviewed
[Presentation] Computing the k-th Eigenvalue of Symmetric H2-Matrices (2023)
- Author(s)
  Apriansyah M. Ridwan, Yokota Rio
- Organizer
  International Conference on Parallel Processing (ICPP)
- Int'l Joint Research
[Presentation] O(N) distributed direct factorization of structured dense matrices using runtime systems (2023)
- Author(s)
  Deshmukh Sameer, Yokota Rio, Bosilca George, Ma Qianxiang
- Organizer
  International Conference on Parallel Processing (ICPP)
- Int'l Joint Research
[Presentation] Mixed-Precision Random Projection for RandNLA on Tensor Cores (2023)
- Author(s)
  Ootomo Hiroyuki, Yokota Rio
- Organizer
  Platform for Advanced Scientific Computing (PASC)
- Int'l Joint Research
