Research Project/Area Number | 21K17751 |
Research Institution | National Institute of Advanced Industrial Science and Technology (AIST) |
Principal Investigator | Nguyen Truong, AIST, Information Technology and Human Factors, Researcher (60835346) |
Project Period (FY) | 2021-04-01 – 2024-03-31 |
Keywords | Deep Learning / Large Scale / Distributed Computing / Non-IID |
Outline of Annual Research Achievements |
This year, we developed new methods to reduce computing time by eliminating non-important samples during training (submitted to ICML 2023). Through our previous work (IPDPS 2022), we found that local shuffling could not achieve good accuracy in large-scale training due to non-IID data and overfitting. We deal with non-IID data by dynamically assigning impact factors to the models from different workers, and we use knowledge distillation to deal with overfitting. This work was a Best Paper Award finalist at CCGRID 2023. We also studied methods to reduce communication time through a co-design of the collective communication algorithm with the intra-node network architecture (accepted in JPDC, a Q1 journal) and with the inter-node network architecture (poster at HPCA-Asia 2023).
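The dynamic impact-factor idea can be sketched as weighted model aggregation, where workers whose models validate better under non-IID data receive larger weights. This is an illustrative sketch only: the function name, the softmax-over-negative-losses weighting, and the `temperature` parameter are assumptions, not the paper's actual formulation.

```python
import numpy as np

def aggregate_with_impact_factors(worker_params, worker_losses, temperature=1.0):
    """Aggregate per-worker model parameters with dynamic impact factors.

    Workers with lower validation loss (less affected by their non-IID
    shard) receive larger impact factors via a softmax over negative
    losses. worker_params is a list (one entry per worker) of lists of
    parameter arrays.
    """
    losses = np.asarray(worker_losses, dtype=float)
    logits = -losses / temperature
    logits -= logits.max()                     # numerical stability
    weights = np.exp(logits) / np.exp(logits).sum()
    # Weighted average of each parameter tensor across workers.
    return [
        sum(w * params[i] for w, params in zip(weights, worker_params))
        for i in range(len(worker_params[0]))
    ]
```

With equal losses this reduces to plain averaging; as one worker's loss grows, its contribution decays smoothly rather than being dropped outright.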
|
Current Status of Research Progress |
1: Research has progressed more than it was originally planned.
Reason
We expanded our international collaborative research to include Telecom SudParis (France), Hanoi University of Science and Technology (HUST, Vietnam), and the VinUni-Illinois Smart Health Center at VinUniversity (Vietnam). The CCGRID 2023 paper (the PI is the corresponding author) was selected as one of the Best Paper Award finalists (top 4 of 58 accepted papers, from over 275 submissions). In the ICML 2023 paper, empirical results on various large-scale datasets and models for image classification and segmentation show that, while the with-replacement importance sampling algorithm performs poorly on large datasets, our method can reduce total training time by up to 22% while impacting accuracy by only 0.4% compared to the baseline.
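Sample elimination of this kind is often implemented by ranking samples by loss and keeping only the hardest ones. The sketch below illustrates that general pattern under assumed details: the function name, the `keep_fraction` parameter, and the highest-loss selection criterion are illustrative choices, not the submitted paper's exact method.

```python
import numpy as np

def select_important_samples(losses, keep_fraction=0.8):
    """Return indices of the highest-loss samples in a batch.

    Sketch of loss-based sample elimination: samples the model already
    fits well (low loss) are skipped in subsequent passes, shrinking
    the effective epoch and hence the computing time.
    """
    losses = np.asarray(losses, dtype=float)
    n_keep = max(1, int(np.ceil(keep_fraction * losses.size)))
    # argsort ascending, then take the tail (largest losses).
    return np.sort(np.argsort(losses)[-n_keep:])
```

Unlike with-replacement importance sampling, a selection rule like this visits each retained sample at most once per pass, which avoids over-drawing a few samples on large datasets.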
|
Strategy for Future Research Activity |
We will continue to investigate (1) extending our work on I/O to reduce the overhead of partial local shuffling at scale, and (2) extending the methods that reduce computing time by eliminating non-important samples during training. We will also study (3) methods to reduce communication time by overlapping communication with computation.
|
Causes of Carryover |
In the next fiscal year, we will conduct a wide range of large-scale experiments on a supercomputer system. We will pay for the use of the ABCI supercomputer.
|