I/O Performance Optimization of Distributed Deep Learning and Its Extension to Next-Generation AI Clouds

Research Project

Project/Area Number	18K11332
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 60090:High performance computing-related
Research Institution	National Institute of Advanced Industrial Science and Technology
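Principal Investigator	SATO Hitoshi, National Institute of Advanced Industrial Science and Technology, Department of Information Technology and Human Factors, Senior Researcher (00550633)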
Project Period (FY)	2018-04-01 – 2019-03-31
Project Status	Discontinued (Fiscal Year 2018)
Budget Amount	¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000)
- Fiscal Year 2020: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
- Fiscal Year 2019: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
- Fiscal Year 2018: ¥1,820,000 (Direct Cost: ¥1,400,000, Indirect Cost: ¥420,000)
Keywords	High-performance computing / Big data / Artificial intelligence / Distributed deep learning / Cloud computing
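Outline of Annual Research Achievements	This project targets distributed deep learning, the most important kernel on AI clouds, i.e., large-scale parallel environments in which supercomputers and clouds converge. It aims to design and develop elemental system-software technologies for accelerating memory and storage I/O on next-generation AI clouds by co-designing 1) a performance analysis of distributed deep learning I/O workloads based on precise performance modeling in real environments, 2) a data lifecycle management algorithm that optimizes I/O performance for fine-grained access patterns while accounting for increasingly deep memory/storage hierarchies, together with its implementation in a distributed I/O framework, and 3) the integration of that distributed I/O framework into existing de facto standard distributed deep learning frameworks. Beyond this, the project aims to clarify requirements and provide design information for Rebooting Computing for Big Data/AI. This fiscal year, as a baseline performance evaluation toward modeling memory and storage I/O performance, we accelerated I/O for the open ImageNet-1K dataset, assuming cutting-edge commodity devices expected in the near future: accelerators such as next-generation NVIDIA GPUs, Intel Nervana and the Fujitsu DLU; HPC-oriented processors such as the ARM processors targeted for the post-K supercomputer; non-volatile memory such as 3D XPoint based on Intel Apache Pass technology; and high-performance networks such as InfiniBand and Omni-Path. Specifically, we distributed the processing through parameter settings, cached the data handled by each process in DRAM, and optimized collective communication with network I/O taken into account. We confirmed on AIST's AI Bridging Cloud Infrastructure (ABCI) that these measures yield a substantial performance improvement, and carried out a basic performance analysis.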
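The per-process DRAM caching step described above can be illustrated with a minimal Python sketch. It assumes a dataset addressed by integer sample indices, a static contiguous split of those indices across `world_size` processes, and a hypothetical `load_from_storage` function standing in for reading one image from a shared file system. The helper names, the sharding scheme and the per-epoch shuffle are illustrative assumptions, not the project's implementation, and the collective-communication optimization is not modeled here.

```python
import random
from typing import Callable, Dict, List


def shard_indices(num_samples: int, rank: int, world_size: int) -> List[int]:
    """Return the contiguous block of sample indices owned by `rank`.

    Remainder samples go to the first ranks, so every sample is owned
    by exactly one process (an assumed, illustrative sharding scheme).
    """
    base, extra = divmod(num_samples, world_size)
    start = rank * base + min(rank, extra)
    size = base + (1 if rank < extra else 0)
    return list(range(start, start + size))


class DramShardCache:
    """Per-process in-memory cache for the samples this rank handles.

    The first access to a sample goes through `loader` (storage I/O);
    later accesses are served from a dict held in DRAM.
    """

    def __init__(self, loader: Callable[[int], bytes]) -> None:
        self._loader = loader
        self._store: Dict[int, bytes] = {}
        self.storage_reads = 0  # count of reads that actually hit storage

    def get(self, index: int) -> bytes:
        if index not in self._store:
            self._store[index] = self._loader(index)
            self.storage_reads += 1
        return self._store[index]


def load_from_storage(index: int) -> bytes:
    # Hypothetical stand-in for reading one encoded image from a file system.
    return f"sample-{index}".encode()


if __name__ == "__main__":
    num_samples, world_size, epochs = 10, 4, 3
    for rank in range(world_size):  # simulate each process in turn
        cache = DramShardCache(load_from_storage)
        shard = shard_indices(num_samples, rank, world_size)
        for epoch in range(epochs):
            order = shard[:]
            random.Random(epoch).shuffle(order)  # shuffle within the shard
            for i in order:
                cache.get(i)
        # Each sample in the shard is read from storage only once.
        print(rank, shard, cache.storage_reads)
```

Because the assignment is static, each process only touches its own shard, so with equally sized samples the DRAM footprint per process is roughly the dataset size divided by the number of processes. A production loader would additionally have to check that this footprint fits in node memory and decide whether to shuffle across shards, which this sketch does not attempt.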

Report

(1 result)

2018 Annual Research Report

Research Products
(3 results)

Journal Article (1 result), Presentation (2 results; of which Int'l Joint Research: 1 result)

[Journal Article] Performance Evaluation of the AI Bridging Cloud Infrastructure ABCI (2018)
- Author(s)
  Hitoshi Sato, Ryu Mizote, Shinichiro Takizawa
- Journal Title
  Research Report on High Performance Computing (HPC)
  Volume: 2018-HPC-166, Pages: 1-6
- Related Report
  2018 Annual Research Report
[Presentation] ABCI: AI Bridging Cloud Infrastructure for Scalable AI/Big Data (2018)
- Author(s)
  Hitoshi Sato
- Organizer
  Fourth International Workshop on Communication Architectures for HPC, Big Data, Deep Learning and Clouds at Extreme Scale, in conjunction with the International Supercomputing Conference (ISC 2018)
- Related Report
  2018 Annual Research Report
- Int'l Joint Research
[Presentation] Convergence of High-Performance Computing and AI/Big Data Processing in the AI Bridging Cloud Infrastructure (ABCI) (2018)
- Author(s)
  Hitoshi Sato
- Organizer
  2nd HPC OPS Workshop
- Related Report
  2018 Annual Research Report
