Summary of Research Achievements
We found that 3D parallelism (data + pipeline + model parallelism) has become the standard approach for training large-scale deep learning models on large datasets. We proposed methods to speed up this training process:
+ To reduce I/O time, we use local shuffling (IPDPS22a) together with pair-wise data exchange (CCGRID23, Best Paper Candidate; HPCAsia24) and model exchange (CANDAR23, Best Paper Award) to maintain model accuracy.
+ To reduce computing time, we eliminate non-important samples during training (NeurIPS23); see the sketch after this list.
+ To reduce communication time, we co-design the network architecture and the collective communication (IPDPS22b, HPCAsia23, JPDC23, CCGRID24).
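As a rough illustration of the sample-elimination idea, the minimal sketch below skips samples once their loss drops below a fixed threshold. The toy model, dataset, and the threshold value are assumptions for illustration only; they are not the criterion used in the cited NeurIPS23 work.

```python
# Hypothetical sketch: drop "non-important" (already well-learned, low-loss)
# samples from later epochs. Model, data, and threshold are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(512, 16)                 # toy dataset
y = torch.randint(0, 2, (512,))
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss(reduction="none")   # per-sample losses

active = torch.ones(len(X), dtype=torch.bool)     # all samples start active
for epoch in range(5):
    idx = active.nonzero(as_tuple=True)[0]
    for start in range(0, len(idx), 64):
        batch = idx[start:start + 64]
        losses = loss_fn(model(X[batch]), y[batch])
        opt.zero_grad()
        losses.mean().backward()
        opt.step()
        # Mark low-loss samples as non-important so later epochs skip them.
        active[batch[losses.detach() < 0.05]] = False
    print(f"epoch {epoch}: {int(active.sum())} active samples remain")
```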
We also address the memory capacity limitation by splitting the large model into multiple smaller parts and assembling it only at the end (TNSM23).
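A minimal sketch of this idea follows: the model's parameters are kept as several small shards (each within a node's memory budget) and the full state dict is reconstructed only at the end. The round-robin sharding shown here is an assumption for illustration, not the partitioning scheme of the cited TNSM23 work.

```python
# Hypothetical sketch: keep a large model as parameter shards and assemble
# the full model only at the end. The sharding scheme is illustrative.
import torch
import torch.nn as nn

full_model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10))
num_shards = 4

# Split the state dict into shards, each holding a subset of the parameters.
shards = [{} for _ in range(num_shards)]
for i, (name, tensor) in enumerate(full_model.state_dict().items()):
    shards[i % num_shards][name] = tensor.clone()   # each shard stays small

# ... in practice, each shard would live on a different node ...

# Assemble the full model only at the end by merging all shards.
merged = {}
for shard in shards:
    merged.update(shard)
assembled = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10))
assembled.load_state_dict(merged)
print("reassembled parameters:", sum(p.numel() for p in assembled.parameters()))
```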