Application of Unconventional Linear Algebra Techniques to Continuous Learning in Supergiant Neural Networks

Research Project

Project/Area Number	20K20624
Research Category	Grant-in-Aid for Challenging Research (Pioneering)
Allocation Type	Multi-year Fund
Review Section	Medium-sized Section 60: Information science, computer engineering, and related fields
Research Institution	Tokyo Institute of Technology
Principal Investigator	Yokota Rio Tokyo Institute of Technology, Global Scientific Information and Computing Center, Professor (20760573)
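Co-Investigator (Kenkyū-buntansha)	Khan Emtiyaz RIKEN, Center for Advanced Intelligence Project, Team Leader (30858022); Ohshima Satoshi Nagoya University, Information Technology Center, Associate Professor (40570081); Ida Akihiro Japan Agency for Marine-Earth Science and Technology, Research Institute for Value-Added-Information Generation (Center for Earth Information Science and Technology), Deputy Senior Researcher (80742121)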
Project Period (FY)	2020-07-30 – 2024-03-31
Project Status	Completed (Fiscal Year 2023)
Budget Amount	¥25,350,000 (Direct Cost: ¥19,500,000, Indirect Cost: ¥5,850,000) Fiscal Year 2022: ¥7,280,000 (Direct Cost: ¥5,600,000, Indirect Cost: ¥1,680,000) Fiscal Year 2021: ¥7,410,000 (Direct Cost: ¥5,700,000, Indirect Cost: ¥1,710,000) Fiscal Year 2020: ¥10,660,000 (Direct Cost: ¥8,200,000, Indirect Cost: ¥2,460,000)
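Keywords	Hierarchical low-rank approximation method / Deep learning / Matrix factorization / Tensor cores / Fast solvers for dense matrices / Hierarchical low-rank approximation / H-matrix / LU decomposition / Second-order optimization / Continual learning / Kronecker factorization / Distributed deep learning / Linear algebra library / GPU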
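Outline of Research at the Start	In recent years, deep learning has been moving away from many groups redundantly training small models specialized for individual tasks, and toward using large models to learn a variety of tasks in a unified and continual manner. However, little AI research in Japan attempts to compete on the same footing as companies such as GAFA, which hold vast amounts of data, computing resources, and human resources. This research boldly takes on the competition with these companies in training ultra-high-accuracy, ultra-large DNNs. Using a privileged computing environment with access to several of the world's largest supercomputers, together with our original technology, a distributed parallel implementation of second-order optimization, it aims to demonstrate international superiority across a broad range of tasks such as image processing and natural language processing.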
Outline of Final Research Achievements	Matrix factorization of the Fisher information matrix has been shown to improve the performance of continual deep learning. However, factorizing the Fisher information matrix directly is difficult because it is dense, with the number of elements growing as the square of the number of parameters N. In this study, we used the H^2-matrix, a hierarchical low-rank approximation method that can reduce the computational complexity to O(N). Furthermore, we proposed a method that processes all diagonal blocks in parallel by performing the ULV decomposition with the fill-in blocks pre-computed and included in the shared basis. We also developed a method for recovering numerical accuracy when using low-precision arithmetic units such as tensor cores, which allows ill-conditioned matrices to be factorized.
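Academic Significance and Societal Importance of the Research Achievements	The Fisher information matrix is known to be useful for continual learning, model merging, and federated learning, but its computational cost is enormous, and because model sizes have grown rapidly in recent years, methods for accelerating its computation are needed. Approximation by Kronecker factorization has previously been proposed to reduce the complexity to O(N^1.5); it is significant that this research reduced it further to O(N). If this accelerates research on continual learning, model merging, and federated learning, the construction of large-scale generative models, currently the exclusive domain of a small number of large companies, could be divided among and carried out collaboratively by many more researchers.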
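
The two complexity figures above can be illustrated with a small NumPy sketch. It is not the project's code, and every name in it (`random_spd`, `kron_solve`, `lowrank_compress`, the sizes `n` and `m`) is hypothetical. Part 1 assumes a single layer with an n x n weight matrix (so N = n * n parameters) whose curvature matrix is approximated as a Kronecker product of two n x n factors; solving with the factors costs O(n^3) = O(N^1.5), which is one plausible reading of where the O(N^1.5) figure comes from. Part 2 assumes a smooth kernel 1/|y - x| between two separated point clusters as a stand-in for an off-diagonal block that admits a low-rank approximation, the property hierarchical low-rank formats rely on; the shared nested bases that distinguish an H^2-matrix are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spd(n):
    """Random symmetric positive definite matrix (stand-in for a Kronecker factor)."""
    m = rng.standard_normal((n, n))
    return m @ m.T + n * np.eye(n)

def kron_solve(A, G, V):
    """Solve (A kron G) vec(X) = vec(V) using only the small n x n factors.

    With row-major vec, (A kron G) vec(X) = vec(A X G^T), so X = A^{-1} V G^{-T}.
    """
    M = np.linalg.solve(A, V)           # A^{-1} V
    return np.linalg.solve(G, M.T).T    # (G^{-1} M^T)^T = M G^{-T}

def lowrank_compress(block, tol=1e-8):
    """Truncated SVD keeping singular values above tol times the largest one."""
    u, s, vt = np.linalg.svd(block, full_matrices=False)
    k = int(np.sum(s > tol * s[0]))
    return u[:, :k] * s[:k], vt[:k, :]

# Part 1: Kronecker-factored solve for one hypothetical n x n layer (N = n * n).
n = 8
A, G = random_spd(n), random_spd(n)
V = rng.standard_normal((n, n))
X_factored = kron_solve(A, G, V)
# Reference: form the full N x N matrix and solve directly (O(N^3) work).
X_direct = np.linalg.solve(np.kron(A, G), V.reshape(-1)).reshape(n, n)
kron_error = np.max(np.abs(X_factored - X_direct))

# Part 2: off-diagonal block of a smooth kernel between two separated clusters.
m = 256
x = np.linspace(0.0, 1.0, m)            # first cluster
y = np.linspace(2.0, 3.0, m)            # second cluster, well separated from the first
block = 1.0 / np.abs(y[:, None] - x[None, :])
left, right = lowrank_compress(block)
rank = left.shape[1]
rel_err = np.linalg.norm(block - left @ right) / np.linalg.norm(block)

print(f"Kronecker solve, max abs difference from direct solve: {kron_error:.2e}")
print(f"{m}x{m} block: numerical rank {rank}, relative error {rel_err:.2e}")
```

In this sketch the factored solve touches only n x n matrices, while the low-rank factors store 2 * m * rank numbers instead of m * m. Whether a given Fisher information matrix has blocks that compress this well is an empirical question that the sketch does not address.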

Report

(5 results)

2023 Annual Research Report
Final Research Report (PDF)
2022 Research-status Report
2021 Research-status Report
2020 Research-status Report

Research Products
(24 results)

Int'l Joint Research (2 results) / Journal Article (9 results) (of which Int'l Joint Research: 2 results, Peer Reviewed: 9 results) / Presentation (13 results) (of which Int'l Joint Research: 11 results)

[Int'l Joint Research] University of Tennessee (USA)
- Related Report
  2023 Annual Research Report
[Int'l Joint Research] University of Tennessee at Knoxville (USA)
- Related Report
  2022 Research-status Report
[Journal Article] An Inherently Parallel H^2-ULV Factorization for Solving Dense Linear Systems on GPUs (2024)
- Author(s)
  Qianxiang Ma, Rio Yokota
- Journal Title
  
  International Journal of High Performance Computing Applications
  
  Volume: N/A Pages: 1-10
- Related Report
  2023 Annual Research Report
- Peer Reviewed
[Journal Article] DGEMM on Integer Matrix Multiplication Unit (2024)
- Author(s)
  Hiroyuki Ootomo, Katsuhisa Ozaki, Rio Yokota
- Journal Title
  
  International Journal of High Performance Computing Applications
  
  Volume: N/A Pages: 1-10
- Related Report
  2023 Annual Research Report
- Peer Reviewed
[Journal Article] Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors (2023)
- Author(s)
  Deshmukh Sameer, Yokota Rio, Bosilca George
- Journal Title
  
  ACM Transactions on Mathematical Software
  
  Volume: 49 Issue: 3 Pages: 1-29
- DOI
  10.1145/3595178
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Computing the k-th Eigenvalue of Symmetric H2-Matrices (2023)
- Author(s)
  Apriansyah M. Ridwan, Yokota Rio
- Journal Title
  
  International Conference on Parallel Processing (ICPP)
  
  Volume: N/A Pages: 1-10
- DOI
  10.1145/3605573.3605607
- Related Report
  2023 Annual Research Report
- Peer Reviewed
[Journal Article] O(N) distributed direct factorization of structured dense matrices using runtime systems (2023)
- Author(s)
  Deshmukh Sameer, Yokota Rio, Bosilca George, Ma Qianxiang
- Journal Title
  
  International Conference on Parallel Processing (ICPP)
  
  Volume: N/A Pages: 1-10
- DOI
  10.1145/3605573.3605606
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Mixed-Precision Random Projection for RandNLA on Tensor Cores (2023)
- Author(s)
  Ootomo Hiroyuki, Yokota Rio
- Journal Title
  
  Platform for Advanced Scientific Computing (PASC)
  
  Volume: N/A Pages: 1-10
- DOI
  10.1145/3592979.3593413
- Related Report
  2023 Annual Research Report
- Peer Reviewed
[Journal Article] Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors (2023)
- Author(s)
  Sameer Deshmukh, Rio Yokota, George Bosilca
- Journal Title
  
  ACM Transactions on Mathematical Software
  
  Volume: To be determined
- Related Report
  2022 Research-status Report
- Peer Reviewed
[Journal Article] Parallel QR Factorization of Block Low-Rank Matrices (2022)
- Author(s)
  Muhammad Ridwan Apriansyah, Rio Yokota
- Journal Title
  
  ACM Transactions on Mathematical Software
  
  Volume: 48 Issue: 3 Pages: 1-28
- DOI
  10.1145/3538647
- Related Report
  2023 Annual Research Report 2022 Research-status Report
- Peer Reviewed
[Journal Article] Recovering Single Precision Accuracy from Tensor Cores While Surpassing the FP32 Theoretical Peak Performance (2022)
- Author(s)
  Hiroyuki Ootomo, Rio Yokota
- Journal Title
  
  The International Journal of High Performance Computing Applications
  
  Volume: 1 Pages: 1-1
- Related Report
  2021 Research-status Report
- Peer Reviewed
[Presentation] Computing the k-th Eigenvalue of Symmetric H2-Matrices (2023)
- Author(s)
  Apriansyah M. Ridwan, Yokota Rio
- Organizer
  International Conference on Parallel Processing (ICPP)
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] O(N) distributed direct factorization of structured dense matrices using runtime systems (2023)
- Author(s)
  Deshmukh Sameer, Yokota Rio, Bosilca George, Ma Qianxiang
- Organizer
  International Conference on Parallel Processing (ICPP)
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Mixed-Precision Random Projection for RandNLA on Tensor Cores (2023)
- Author(s)
  Ootomo Hiroyuki, Yokota Rio
- Organizer
  Platform for Advanced Scientific Computing (PASC)
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Mixed-Precision Random Projection for RandNLA on Tensor Cores (2023)
- Author(s)
  Hiroyuki Ootomo, Rio Yokota
- Organizer
  Platform for Advanced Scientific Computing (PASC)
- Related Report
  2022 Research-status Report
- Int'l Joint Research
[Presentation] O(N) Factorization of Dense Matrices on GPUs Without Trailing Submatrix Dependencies (2023)
- Author(s)
  Qianxiang Ma, Rio Yokota
- Organizer
  SIAM Conference on Computational Science and Engineering (CSE)
- Related Report
  2022 Research-status Report
- Int'l Joint Research
[Presentation] Parallel QR Factorization of Block Low-Rank Matrices (2023)
- Author(s)
  Muhammad Ridwan Apriansyah, Rio Yokota
- Organizer
  SIAM Conference on Computational Science and Engineering (CSE)
- Related Report
  2022 Research-status Report
- Int'l Joint Research
[Presentation] QR Factorization of Block Low-Rank Matrices on Multi-Instance GPU (2022)
- Author(s)
  Satoshi Ohshima, Akihiro Ida, Rio Yokota and Ichitaro Yamazaki
- Organizer
  The 23rd International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT’22)
- Related Report
  2022 Research-status Report
- Int'l Joint Research
[Presentation] Scalable Linear Time Dense Direct Solver for 3-D Problems Without Trailing Sub-Matrix Dependencies (2022)
- Author(s)
  Qianxiang Ma, Sameer Deshmukh, Rio Yokota
- Organizer
  The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22)
- Related Report
  2022 Research-status Report
- Int'l Joint Research
[Presentation] Acceleration of O(N) Solvers for Large Dense Matrices (2022)
- Author(s)
  Sameer Deshmukh
- Organizer
  Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing (ATAT2022)
- Related Report
  2021 Research-status Report
- Int'l Joint Research
[Presentation] Parallel QR Factorization of Block Low-rank Matrices (2022)
- Author(s)
  Muhammad Ridwan Apriansyah
- Organizer
  Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing (ATAT2022)
- Related Report
  2021 Research-status Report
- Int'l Joint Research
[Presentation] Iterative Refinement with Hierarchical Low-rank Preconditioners Using Mixed Precision (2022)
- Author(s)
  Thomas Spendlhofer
- Organizer
  Conference on Advanced Topics and Auto Tuning in High-Performance Scientific Computing (ATAT2022)
- Related Report
  2021 Research-status Report
- Int'l Joint Research
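[Presentation] Verification of the Generalization Performance of Second-Order Optimization in Deep Learning (2022)
- Author(s)
  石井央, Rio Yokota
- Organizer
  The 84th National Convention of the Information Processing Society of Japan (IPSJ)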
- Related Report
  2021 Research-status Report
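[Presentation] Effect of Batch Size on Generalization Performance in Vision Transformers (2022)
- Author(s)
  中村秋海, Rio Yokota
- Organizer
  The 84th National Convention of the Information Processing Society of Japan (IPSJ)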
- Related Report
  2021 Research-status Report
