Summary of Research Achievements
We found that 3D parallelism (data + pipeline + model parallelism) has become the standard approach for training large-scale deep learning models on large datasets. We proposed methods to speed up this training process:
+ To reduce I/O time, we use local shuffling (IPDPS22a) together with pair-wise data exchange (CCGRID23, Best Paper Candidate; HPCAsia24) and model exchange (CANDAR23, Best Paper Award) to maintain model accuracy.
+ To reduce computing time, we eliminate non-important samples during training (NeurIPS23); see the sketch after this list.
+ To reduce communication time, we co-design the network architecture and the collective communication (IPDPS22b, HPCAsia23, JPDC23, CCGRID24).
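As a rough illustration of the sample-elimination idea, the minimal sketch below skips samples once their loss drops below a fixed threshold. The toy model, dataset, and the threshold value are assumptions for illustration only; they are not the criterion used in the cited NeurIPS23 work.

```python
# Hypothetical sketch: drop "non-important" (already well-learned, low-loss)
# samples from later epochs. Model, data, and threshold are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(512, 16)                 # toy dataset
y = torch.randint(0, 2, (512,))
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss(reduction="none")   # per-sample losses

active = torch.ones(len(X), dtype=torch.bool)     # all samples start active
for epoch in range(5):
    idx = active.nonzero(as_tuple=True)[0]
    for start in range(0, len(idx), 64):
        batch = idx[start:start + 64]
        losses = loss_fn(model(X[batch]), y[batch])
        opt.zero_grad()
        losses.mean().backward()
        opt.step()
        # Mark low-loss samples as non-important so later epochs skip them.
        active[batch[losses.detach() < 0.05]] = False
    print(f"epoch {epoch}: {int(active.sum())} active samples remain")
```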
We also address the memory capacity limitation by splitting the large model into multiple smaller parts and assembling it only at the end (TNSM23).
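A minimal sketch of this idea follows: the model's parameters are kept as several small shards (each within a node's memory budget) and the full state dict is reconstructed only at the end. The round-robin sharding shown here is an assumption for illustration, not the partitioning scheme of the cited TNSM23 work.

```python
# Hypothetical sketch: keep a large model as parameter shards and assemble
# the full model only at the end. The sharding scheme is illustrative.
import torch
import torch.nn as nn

full_model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10))
num_shards = 4

# Split the state dict into shards, each holding a subset of the parameters.
shards = [{} for _ in range(num_shards)]
for i, (name, tensor) in enumerate(full_model.state_dict().items()):
    shards[i % num_shards][name] = tensor.clone()   # each shard stays small

# ... in practice, each shard would live on a different node ...

# Assemble the full model only at the end by merging all shards.
merged = {}
for shard in shards:
    merged.update(shard)
assembled = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10))
assembled.load_state_dict(merged)
print("reassembled parameters:", sum(p.numel() for p in assembled.parameters()))
```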