2023 Fiscal Year Final Research Report
Interconnection Networks accelerating Large-Scale Distributed Deep Learning with In-Network Computing
Project/Area Number |
20K19788
|
Research Category |
Grant-in-Aid for Early-Career Scientists
|
Allocation Type | Multi-year Fund |
Review Section |
Basic Section 60060:Information network-related
|
Research Institution | National Institute of Informatics (2021-2023) Japan Advanced Institute of Science and Technology (2020) |
Principal Investigator |
Kawano Ryuta National Institute of Informatics, Information Systems Architecture Science Research Division, Project Assistant Professor (90855751)
|
Project Period (FY) |
2020-04-01 – 2024-03-31
|
Keywords | Interconnection Networks / Large-Scale Distributed Deep Learning / Big Data / In-Network Computing / Data Centers |
Outline of Final Research Achievements |
For large-scale AI to reach practical use, applying deep learning to big data is an urgent issue. A promising solution is a system in which computing mechanisms called domain-specific architectures are distributed within data centers. In conventional data center networks, however, communication performance constraints such as packet latency make it difficult to speed up large-scale distributed deep learning. This research focuses on In-Network Computing, which performs intermediate processing of computations within the network. It addresses various issues toward the practical application of an inter-switch network that achieves low-latency, low-frequency communication while retaining the same high bandwidth and scalability as conventional networks.
|
Free Research Field |
Computer Architecture
|
Academic Significance and Societal Importance of the Research Achievements |
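The interconnection system proposed in this research adopts Random Topology, one of the ultra-low-latency network technologies developed for high-performance computing systems, for use in data centers, and thereby optimizes the system for In-Network Computing, a technology that has become increasingly widespread in recent years. In other words, it overturns the conventional system design approach of using tree-based physical and logical topologies for the inter-switch network and for the distributed deep learning mapping algorithms built on it, while guaranteeing bandwidth and scalability equal to or better than those of conventional systems and maintaining compatibility with existing systems. These research results were released as OSS to promote practical application of the proposed system and to meet societal demand.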
|