Scalable Hybrid-parallelism Design for Mega-Size Deep Learning Model

Research Project

Project/Area Number	21K17751
Research Category	Grant-in-Aid for Early-Career Scientists
Allocation Type	Multi-year Fund
Review Section	Basic Section 60090:High performance computing-related
Research Institution	National Institute of Advanced Industrial Science and Technology
Principal Investigator	Nguyen TRUONG 国立研究開発法人産業技術総合研究所, 情報・人間工学領域, 研究員 (60835346)
Project Period (FY)	2021-04-01 – 2024-03-31
Project Status	Completed (Fiscal Year 2023)
Budget Amount *help	¥4,680,000 (Direct Cost: ¥3,600,000、Indirect Cost: ¥1,080,000) Fiscal Year 2023: ¥1,430,000 (Direct Cost: ¥1,100,000、Indirect Cost: ¥330,000) Fiscal Year 2022: ¥1,820,000 (Direct Cost: ¥1,400,000、Indirect Cost: ¥420,000) Fiscal Year 2021: ¥1,430,000 (Direct Cost: ¥1,100,000、Indirect Cost: ¥330,000)
Keywords	Distributed Training / Large Model / Large dataset / Large scale system / Deep Learning / Large Scale / Distributed Computing / Non-IID / Large-scale / Distributed computing / Hybrid parallelism
Outline of Research at the Start	This proposal try to find techniques that help to speed-up the training/inference process of Distributed Deep Learning. The proposed research project includes several research topics: (1) Hybrid-parallelism design:(1.1) Study the limitation of different parallelism strategies and (1.2) find novel fine-grained hybrid parallelism strategies for each type of specific applications (2) Method to reduce communication time via (2.1) optimizing the communication mechanism for each type of network architecture in supercomputers and (2.2)study the method to reduce network contention.
Outline of Final Research Achievements	We deal with memory capacity limitation when training a large model by separating the model into multiple smaller parts (published a Q1 journal-TNSM23). We also found that 3D parallelism (data+pipeline+tensor) becomes standard in training large-scale Deep Learning with large datasets. We proposed the methods to speed up this training process. To reduce the I/O time, we use local shuffling along with a pair-wise data exchanging and a model exchanging to maintain the accuracy of the model. We published 3 papers (IPDPS22a, CCGRID23, CANDAR23), a poster (HPCAsia24), and achieved 2 best paper awards. To reduce the computing time, we eliminate to process the non-important samples during the training (published at a A* conference - Neurips23). We reduce the communication time by co-design network architecture and collective communication. We published 2 rank A paper (IPDPS22b, CCGRID24), a Q1 journal (JPDC23) and a poster (HPCAsia23).
Academic Significance and Societal Importance of the Research Achievements	Our research helps to support the research and development of big models. It brings a groundbreaking new solution with the requirements of the urgent AI, e.g.,ChatGPT. It can be ultimately contributing to the advancement of AI models, particularly foundational models, in the context of social 5.0.

Report

(4 results)

2023 Annual Research Report Final Research Report ( PDF )
2022 Research-status Report
2021 Research-status Report

Research Products
(10 results)

All 2024 2023 2022

All Journal Article (8 results) (of which Int'l Joint Research: 8 results, Peer Reviewed: 6 results, Open Access: 2 results) Presentation (2 results) (of which Int'l Joint Research: 2 results)

[Journal Article] KAKURENBO: Adaptively Hiding Samples in Deep Neural Network Training2024
- Author(s)
  Truong Thao Nguyen, Balazs Gerofi, Edgar Josafat Martinez-Noriega, Francois Trahay, and Mohamed Wahib
- Journal Title
  
  37th Conference on Neural Information Processing Systems (NeurIPS 2023)
  
  Volume: - Pages: 1-23
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] FedDCT: Federated Learning of Large Convolutional Neural Networks on Resource-Constrained Devices Using Divide and Collaborative Training2024
- Author(s)
  Nguyen Quan、Pham Hieu H.、Wong Kok-Seng、Le Nguyen Phi、Nguyen Truong Thao、Do Minh N.
- Journal Title
  
  IEEE Transactions on Network and Service Management
  
  Volume: 21 Issue: 1 Pages: 418-436
- DOI
  10.1109/tnsm.2023.3314066
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] A Bandwidth-Optimal All-to-All Communication in Two-Dimensional Fully Connected Network2024
- Author(s)
  Kien Trung Pham, Thao Nguyen Truong and Michihiro Koibuchi
- Journal Title
  
  24th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing
  
  Volume: - Pages: 1-7
- DOI
  10.1109/ccgrid59990.2024.00010
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] SEM: A Simple Yet Efficient Model-agnostic Local Training Mechanism to Tackle Data Sparsity and Scarcity in Federated Learning2023
- Author(s)
  Pham Quang Ha、Nguyen Nang Hung、Nguyen Thanh Hung、Pham Huy Hieu、Nguyen Phi Le、Nguyen Truong Thao
- Journal Title
  
  Eleventh International Symposium on Computing and Networking (CANDAR)
  
  Volume: - Pages: 120-126
- DOI
  10.1109/candar60563.2023.00023
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Effective Switchless Inter-FPGA Memory Networks2023
- Author(s)
  Truong Thao Nguyen, Kien Trung Pham, Hiroshi Yamaguchi, Yutaka Urino, Michihiro Koibuchi
- Journal Title
  
  Journal of Parallel and Distributed Computing
  
  Volume: -
- Related Report
  2022 Research-status Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] CADIS: Handling Cluster-skewed Non-IID Data in Federated Learning with Clustered Aggregation and Knowledge Distilled Regularization2023
- Author(s)
  Nang Hung Nguyen, Duc Long Nguyen, Trong Bang Nguyen, Thanh-Hung Nguyen, Hieu Pham, Truong Thao Nguyen, Phi Le Nguyen
- Journal Title
  
  23rd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing
  
  Volume: - Pages: 249-261
- Related Report
  2022 Research-status Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Why Globally Re-shuffle? Revisiting Data Shuffling in Large Scale Deep Learning2022
- Author(s)
  Truong Thao Nguyen, Francois Trahay, Jens Domke, Aleksandr Drozd, Emil Vatai, Jianwei Liao, Mohamed Wahib, Balazs Gerofi
- Journal Title
  
  36th IEEE International Parallel & Distributed Processing Symposium
  
  Volume: 0 Pages: 1-12
- Related Report
  2021 Research-status Report
- Int'l Joint Research
[Journal Article] Scalable Low-Latency Inter-FPGA Networks2022
- Author(s)
  Kien Trung Pham, Truong Thao Nguyen, Hiroshi Yamaguchi, Yutaka Urino, Michihiro Koibuchi
- Journal Title
  
  36th IEEE International Parallel & Distributed Processing Symposium
  
  Volume: 0 Pages: 1-12
- Related Report
  2021 Research-status Report
- Int'l Joint Research
[Presentation] Efficient Sample Exchanging for Large-Scale Training Distributed Deep Learning with Local Sampling2024
- Author(s)
  Truong Thao Nguyen, Yusuke Tanimura
- Organizer
  International Conference on High Performance Computing in Asia-Pacific Region 2024
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Efficient Allreduce Algorithm for Large-Scale Deep Learning on Distributed Loop Networks2023
- Author(s)
  Truong Thao Nguyen, Peng Chen, Yusuke Tanimura
- Organizer
  International Conference on High Performance Computing in Asia-Pacific Region 2023
- Related Report
  2022 Research-status Report
- Int'l Joint Research

Scalable Hybrid-parallelism Design for Mega-Size Deep Learning Model

Principal Investigator

Nguyen TRUONG 国立研究開発法人産業技術総合研究所, 情報・人間工学領域, 研究員 (60835346)

¥4,680,000 (Direct Cost: ¥3,600,000、Indirect Cost: ¥1,080,000)

Report

Research Products

[Journal Article] KAKURENBO: Adaptively Hiding Samples in Deep Neural Network Training2024

Author(s)

Journal Title

Related Report

[Journal Article] FedDCT: Federated Learning of Large Convolutional Neural Networks on Resource-Constrained Devices Using Divide and Collaborative Training2024

Author(s)

Journal Title

DOI

Related Report

[Journal Article] A Bandwidth-Optimal All-to-All Communication in Two-Dimensional Fully Connected Network2024

Author(s)

Journal Title

DOI

Related Report

[Journal Article] SEM: A Simple Yet Efficient Model-agnostic Local Training Mechanism to Tackle Data Sparsity and Scarcity in Federated Learning2023

Author(s)

Journal Title

DOI

Related Report

[Journal Article] Effective Switchless Inter-FPGA Memory Networks2023

Author(s)

Journal Title

Related Report

[Journal Article] CADIS: Handling Cluster-skewed Non-IID Data in Federated Learning with Clustered Aggregation and Knowledge Distilled Regularization2023

Author(s)

Journal Title

Related Report

[Journal Article] Why Globally Re-shuffle? Revisiting Data Shuffling in Large Scale Deep Learning2022

Author(s)

Journal Title

Related Report

[Journal Article] Scalable Low-Latency Inter-FPGA Networks2022

Author(s)

Journal Title

Related Report

[Presentation] Efficient Sample Exchanging for Large-Scale Training Distributed Deep Learning with Local Sampling2024

Author(s)

Organizer

Related Report

[Presentation] Efficient Allreduce Algorithm for Large-Scale Deep Learning on Distributed Loop Networks2023

Author(s)

Organizer

Related Report