2020 Fiscal Year Annual Research Report

Construction of an Automatic Performance Optimization Framework That Accounts for Deep Learning Accuracy

Research Project

Project/Area Number	18J22858
Research Institution	Tokyo Institute of Technology
Principal Investigator	Yosuke Oyama, Tokyo Institute of Technology, School of Computing, Research Fellow (DC1)
Project Period (FY)	2018-04-25 – 2021-03-31
Keywords	Deep learning
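Outline of Annual Research Achievements	We extended the previous fiscal year's joint research with a research group at LLNL in the United States in three directions: 1) performance evaluation with multiple models, 2) performance comparison with prior work, and 3) prediction of the optimal parallelization strategy using a performance model. For 1), in addition to the CosmoFlow network used in the previous year, we evaluated performance with 3D U-Net, a model used for segmentation, and showed that both models scale to roughly 2048 GPUs, which is nearly the entire Lassen supercomputer. For 2), we compared the performance of hybrid-parallel implementations similar to the proposed method and showed that the framework-level parallelization of the proposed method contributes substantially to hiding communication time. For 3), assuming an extremely large-scale parallel environment such as the Fugaku supercomputer, we extended the performance modeling of the one-dimensional partitioning method carried out through the previous year to predict performance when partitioning along multiple dimensions. The results showed that, because the batch size per processor is small in hybrid-parallel training, even simple one-dimensional partitioning parallelizes very effectively. Together, these results provide an outlook for maintaining the scalability of deep learning applications in large-scale parallel environments. This work was submitted to the IEEE TPDS journal and accepted. In addition, the research results obtained through the previous year were compiled and written up as a doctoral dissertation. The dissertation proposes a method that optimizes the intra-GPU and inter-GPU parallelism of convolutional neural networks through performance modeling and provides the optimal parallelization method and computation algorithm for a given model. This makes it possible to predict the optimal computation method even for a completely unseen model, without manual tuning by the user. Regarding inference accuracy, the dissertation proposes a method that speeds up training without risk to accuracy by tuning individual compute kernels and by adopting hybrid parallelization, which greatly reduces the mini-batch size for the same degree of parallelism.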
Research Progress Status	Not entered because FY2020 (Reiwa 2) is the final fiscal year.
Strategy for Future Research Activity	Not entered because FY2020 (Reiwa 2) is the final fiscal year.

Research Products

(4 results)

[Int'l Joint Research] Lawrence Livermore National Laboratory (U.S.A.)
- Country Name
  U.S.A.
- Counterpart Institution
  Lawrence Livermore National Laboratory
[Journal Article] The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism (2021)
- Author(s)
  Yosuke Oyama, Naoya Maruyama, Nikoli Dryden, Erin McCarthy, Peter Harrington, Jan Balewski, Satoshi Matsuoka, Peter Nugent, Brian Van Essen
- Journal Title
  IEEE Transactions on Parallel & Distributed Systems (TPDS)
  Volume: 32 Pages: 1641-1652
- Peer Reviewed / Int'l Joint Research
[Presentation] Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws? (2021)
- Author(s)
  Jens Domke, Emil Vatai, Aleksandr Drozd, Peng Chen, Yosuke Oyama, Lingqi Zhang, Shweta Salaria, Daichi Mukunoki, Artur Podobas, Mohamed Wahib, Satoshi Matsuoka
- Organizer
  The International Parallel and Distributed Processing Symposium (IPDPS 2021)
- Int'l Joint Research
[Remarks] Yosuke Oyama
- URL
  https://oyamay.github.io/
