
2023 Fiscal Year Final Research Report

Scalable Hybrid-parallelism Design for Mega-Size Deep Learning Model

Research Project

Project/Area Number 21K17751
Research Category

Grant-in-Aid for Early-Career Scientists

Allocation Type Multi-year Fund
Review Section Basic Section 60090: High performance computing-related
Research Institution National Institute of Advanced Industrial Science and Technology

Principal Investigator

Nguyen TRUONG  National Institute of Advanced Industrial Science and Technology, Department of Information Technology and Human Factors, Researcher (60835346)

Project Period (FY) 2021-04-01 – 2024-03-31
Keywords Distributed Training / Large Model / Large dataset / Large scale system
Outline of Final Research Achievements

We addressed the memory-capacity limitation of training a large model by splitting the model into multiple smaller parts (published in a Q1 journal, TNSM23). We also found that 3D parallelism (data + pipeline + tensor) has become the standard for training large-scale deep learning models on large datasets, and we proposed methods to speed up this training process. To reduce I/O time, we use local shuffling together with pair-wise data exchange and model exchange to maintain model accuracy. We published 3 papers (IPDPS22a, CCGRID23, CANDAR23) and a poster (HPCAsia24), and received 2 best paper awards. To reduce computing time, we skip non-important samples during training (published at an A* conference, NeurIPS23). To reduce communication time, we co-design the network architecture and the collective communication. We published 2 rank-A papers (IPDPS22b, CCGRID24), a Q1 journal paper (JPDC23), and a poster (HPCAsia23).
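The local-shuffling idea above can be illustrated with a toy sketch. This is not the published implementation (which runs over MPI on a real cluster); it only shows, under simplified assumptions, how each worker shuffles its own shard and how randomly paired workers swap a fraction of samples so that sample placement drifts toward a global shuffle over epochs without global I/O. All names here (`partition`, `pairwise_exchange`, the 50% swap fraction) are illustrative choices, not the authors' parameters.

```python
import random

def partition(dataset, num_workers):
    """Split the dataset into contiguous per-worker shards (local storage)."""
    shard = len(dataset) // num_workers
    return [dataset[i * shard:(i + 1) * shard] for i in range(num_workers)]

def local_shuffle(shards, rng):
    """Each worker shuffles only its own shard -- no global data movement."""
    for s in shards:
        rng.shuffle(s)

def pairwise_exchange(shards, fraction, rng):
    """Randomly pair workers and swap a fraction of their samples; repeated
    over epochs this approximates a global shuffle at pair-wise cost."""
    workers = list(range(len(shards)))
    rng.shuffle(workers)
    for a, b in zip(workers[::2], workers[1::2]):
        k = int(len(shards[a]) * fraction)
        shards[a][:k], shards[b][:k] = shards[b][:k], shards[a][:k]

rng = random.Random(0)
shards = partition(list(range(16)), num_workers=4)
for epoch in range(3):
    local_shuffle(shards, rng)
    pairwise_exchange(shards, fraction=0.5, rng=rng)

# Every sample is still present exactly once across all shards.
assert sorted(x for s in shards for x in s) == list(range(16))
```

In the actual system a model (or model-parameter) exchange accompanies the data exchange to keep accuracy comparable to fully shuffled training.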

Free Research Field

High performance computing

Academic Significance and Societal Importance of the Research Achievements

Our research supports the research and development of large models. It offers a new solution to the urgent demands of modern AI, e.g., ChatGPT. It can ultimately contribute to the advancement of AI models, particularly foundation models, in the context of Society 5.0.

Published: 2025-01-30  
