Development of an Accuracy-Aware Automatic Performance Optimization Framework for Deep Learning

Research Project

Project/Area Number	18J22858
Research Category	Grant-in-Aid for JSPS Fellows
Allocation Type	Single-year Grants
Section	Domestic
Research Field	High performance computing
Research Institution	Tokyo Institute of Technology
Principal Investigator	Yosuke Oyama (大山洋介), Tokyo Institute of Technology, School of Computing, JSPS Research Fellow (DC1)
Project Period (FY)	2018-04-25 – 2021-03-31
Project Status	Completed (Fiscal Year 2020)
Budget Amount	¥2,200,000 (Direct Cost: ¥2,200,000); Fiscal Year 2020: ¥700,000 (Direct Cost: ¥700,000); Fiscal Year 2019: ¥700,000 (Direct Cost: ¥700,000); Fiscal Year 2018: ¥800,000 (Direct Cost: ¥800,000)
Keywords	Deep learning
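Outline of Annual Research Achievements	This year, the joint research with the group at Lawrence Livermore National Laboratory (LLNL, USA) begun in the previous year was extended in three directions: (1) performance evaluation with multiple models, (2) performance comparison with prior work, and (3) prediction of the optimal parallelization strategy with a performance model.

For (1), in addition to the CosmoFlow network used in the previous year, performance was evaluated with 3D U-Net, a network used for segmentation. Both models were shown to scale to roughly 2048 GPUs, which is nearly the entire Lassen supercomputer. For (2), the proposed method was compared with a hybrid-parallel implementation similar to it, showing that the proposed method's parallelization at the framework level contributes substantially to hiding communication time. For (3), targeting extremely large-scale parallel systems such as the Fugaku supercomputer, the performance model for one-dimensional partitioning developed in previous years was extended to predict performance when partitioning along multiple dimensions. The predictions showed that, because the batch size per processor is small in hybrid-parallel training, even simple one-dimensional partitioning parallelizes very effectively. Together, these results offer a perspective on how deep learning applications can maintain their scalability on large-scale parallel systems. This work was submitted to the journal IEEE TPDS and accepted.

In addition, the results obtained through the previous year were compiled into a doctoral dissertation. The dissertation proposes a method that uses performance modeling to optimize the intra-GPU and inter-GPU parallelism of convolutional neural networks and provides the optimal parallelization scheme and computation algorithm for a given model. This makes it possible to predict the best computation method even for an entirely new model, without manual tuning by the user. With respect to inference accuracy, the dissertation proposes accelerating training without putting accuracy at risk, by tuning individual computation kernels and by adopting hybrid parallelization, which greatly reduces the mini-batch size required for a given degree of parallelism.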
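As a rough illustration of two quantities discussed in the outline above (the global mini-batch size needed to keep a given number of GPUs busy, and the communication cost of one-dimensional versus multidimensional partitioning), the following Python sketch computes both under simplifying assumptions. It assumes that spatial parallelism splits each sample's 3D input across GPUs and that convolutions require a halo exchange with neighboring partitions. The function names, the cubic 512³ domain, and the halo width of 1 are hypothetical choices for illustration, and this is not the performance model used in the study.

```python
"""Back-of-envelope sketch of hybrid (data + spatial) parallelism.

Hypothetical helper names and parameters; this is NOT the performance
model used in the study, only a toy illustration of the quantities involved.
"""
from __future__ import annotations


def min_global_batch(num_gpus: int, spatial_ways: int, samples_per_group: int = 1) -> int:
    """Smallest global mini-batch that keeps every GPU busy.

    Assumption: each sample is split across `spatial_ways` GPUs, so the GPUs
    form num_gpus // spatial_ways data-parallel groups, each of which needs at
    least `samples_per_group` samples per step.
    """
    if spatial_ways <= 0 or num_gpus % spatial_ways != 0:
        raise ValueError("spatial_ways must be a positive divisor of num_gpus")
    return (num_gpus // spatial_ways) * samples_per_group


def halo_to_compute_ratio(extent: int, parts: tuple[int, int, int], halo: int = 1) -> float:
    """Halo elements received per locally owned element, for an interior rank.

    Assumption: a cubic extent**3 input is decomposed parts[0] x parts[1] x parts[2]
    ways and each convolution needs `halo` layers from each neighbor. Edge and
    corner halos are ignored. Channels and batch size cancel out of the ratio.
    """
    local = [extent / p for p in parts]          # local sub-domain size per dimension
    volume = local[0] * local[1] * local[2]      # elements owned by this rank
    halo_elems = 0.0
    for d, p in enumerate(parts):
        if p > 1:                                # unsplit dimensions need no halo
            face = volume / local[d]             # area of the face normal to dimension d
            halo_elems += 2 * halo * face        # two opposite faces
    return halo_elems / volume


if __name__ == "__main__":
    # 2048 GPUs, the scale reported above; partition counts are illustrative.
    for ways in (1, 4, 8):
        print(ways, min_global_batch(2048, ways))   # 2048, 512, 256
    print(halo_to_compute_ratio(512, (8, 1, 1)))    # 1D slabs: 0.03125
    print(halo_to_compute_ratio(512, (2, 2, 2)))    # 3D cubes: 0.0234375
```

In this toy model, an 8-way one-dimensional split receives about 3.1% as many halo elements as it owns per layer, versus about 2.3% for a 2×2×2 split, so at small spatial partition counts the simpler scheme gives up relatively little. Real costs also depend on message counts, latency, and overlap with computation, which the sketch ignores.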
Research Progress Status	Not entered, because FY2020 (Reiwa 2) is the final year of the project.
Strategy for Future Research Activity	Not entered, because FY2020 (Reiwa 2) is the final year of the project.

Report

(3 results)

Research Products
(18 results)


Int'l Joint Research (4 results), Journal Article (1 result; of which Int'l Joint Research: 1, Peer Reviewed: 1), Presentation (12 results; of which Int'l Joint Research: 5, Invited: 1), Remarks (1 result)

[Int'l Joint Research] Lawrence Livermore National Laboratory (USA)
- Related Report
  2020 Annual Research Report
[Int'l Joint Research] Lawrence Livermore National Laboratory/Lawrence Berkeley National Laboratory (USA)
- Related Report
  2019 Annual Research Report
[Int'l Joint Research] ETH Zurich (Switzerland)
- Related Report
  2018 Annual Research Report
[Int'l Joint Research] Lawrence Livermore National Laboratory/Lawrence Berkeley National Laboratory/University of Illinois (USA)
- Related Report
  2018 Annual Research Report
[Journal Article] The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism (2021)
- Author(s)
  Yosuke Oyama, Naoya Maruyama, Nikoli Dryden, Erin McCarthy, Peter Harrington, Jan Balewski, Satoshi Matsuoka, Peter Nugent, Brian Van Essen
- Journal Title
  IEEE Transactions on Parallel & Distributed Systems (TPDS)
  Volume: 32, Pages: 1641-1652
- Related Report
  2020 Annual Research Report
- Peer Reviewed / Int'l Joint Research
[Presentation] Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws? (2021)
- Author(s)
  Jens Domke, Emil Vatai, Alexsandr Drozd, Peng Chen, Yosuke Oyama, Lingqi Zhang, Shweta Salaria, Daichi Mukunoki, Artur Podobas, Mohamed Wahib, Satoshi Matsuoka
- Organizer
  The International Parallel and Distributed Processing Symposium (IPDPS 2021)
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
[Presentation] Toward Training a Large 3D Cosmological CNN with Hybrid Parallelization (2019)
- Author(s)
  Yosuke Oyama, Naoya Maruyama, Nikoli Dryden, Peter Harrington, Jan Balewski, Satoshi Matsuoka, Marc Snir, Peter Nugent, Brian Van Essen
- Organizer
  Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP2019)
- Related Report
  2019 Annual Research Report
[Presentation] Classification of Applications by Machine Learning Using Memory Access Data (2019)
- Author(s)
  土川稔生, 遠藤敏夫, 大山洋介, 野村哲弘, 近藤正章, 松岡聡
- Organizer
  Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP2019)
- Related Report
  2019 Annual Research Report
[Presentation] Toward Training a Large 3D Cosmological CNN with Hybrid Parallelization (2019)
- Author(s)
  Yosuke Oyama, Naoya Maruyama, Nikoli Dryden, Peter Harrington, Jan Balewski, Satoshi Matsuoka, Marc Snir, Peter Nugent, Brian Van Essen
- Organizer
  The 1st Workshop on Parallel and Distributed Machine Learning 2019 (PDML’19), in 48th International Conference on Parallel Processing (ICPP 2019)
- Related Report
  2019 Annual Research Report
- Int'l Joint Research / Invited
[Presentation] Toward Training a Large 3D Cosmological CNN with Hybrid Parallelization (2019)
- Author(s)
  Yosuke Oyama, Naoya Maruyama, Nikoli Dryden, Peter Harrington, Jan Balewski, Satoshi Matsuoka, Marc Snir, Peter Nugent, Brian Van Essen
- Organizer
  48th International Conference on Parallel Processing (ICPP 2019)
- Related Report
  2019 Annual Research Report
- Int'l Joint Research
[Presentation] μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batches (2019)
- Author(s)
  Yosuke Oyama, Tal Ben-Nun, Torsten Hoefler, Satoshi Matsuoka
- Organizer
  GPU Technology Conference 2019 (GTC 2019)
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Accelerating Deep Learning Frameworks with Micro-batches (2018)
- Author(s)
  Yosuke Oyama, Tal Ben-Nun, Torsten Hoefler, Satoshi Matsuoka
- Organizer
  IEEE Cluster 2018
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Evaluating the Training Accuracy of Deep Learning with Low-Precision Data Types in Large-Scale Parallel Environments (2018)
- Author(s)
  大山洋介
- Organizer
  JHPCN: The 10th Symposium of the Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures
- Related Report
  2018 Annual Research Report
[Presentation] Automatic Generation of Computer Traces with Machine Learning (2018)
- Author(s)
  土川稔生, 大山洋介, 野村哲弘, 松岡聡
- Organizer
  Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP2018)
- Related Report
  2018 Annual Research Report
[Presentation] Relationship Between Computation Time and Accuracy When Using BatchNormalization in Deep Learning (2018)
- Author(s)
  八島慶汰, 大山洋介, 松岡聡
- Organizer
  Summer United Workshops on Parallel, Distributed and Cooperative Processing (SWoPP2018)
- Related Report
  2018 Annual Research Report
[Presentation] Evaluating the Training Accuracy of Deep Learning with Low-Precision Data Types in Large-Scale Parallel Environments (2018)
- Author(s)
  大山洋介, 野村哲弘, 佐藤育郎, 松岡聡
- Organizer
  Public Symposium "Deep Learning Infrastructure through Co-Design"
- Related Report
  2018 Annual Research Report
[Presentation] μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching (2018)
- Author(s)
  Yosuke Oyama, Tal Ben-Nun, Torsten Hoefler, Satoshi Matsuoka
- Organizer
  Public Symposium "Deep Learning Infrastructure through Co-Design"
- Related Report
  2018 Annual Research Report
[Remarks] Yosuke Oyama
- URL
  https://oyamay.github.io/
- Related Report
  2020 Annual Research Report, 2019 Annual Research Report, 2018 Annual Research Report
