2023 Fiscal Year Final Research Report

Interconnection Networks accelerating Large-Scale Distributed Deep Learning with In-Network Computing

Project/Area Number	20K19788
Research Category	Grant-in-Aid for Early-Career Scientists
Allocation Type	Multi-year Fund
Review Section	Basic Section 60060:Information network-related
Research Institution	National Institute of Informatics (2021-2023), Japan Advanced Institute of Science and Technology (2020)
Principal Investigator	Kawano Ryuta, National Institute of Informatics, Information Systems Architecture Science Research Division, Project Assistant Professor (90855751)
Project Period (FY)	2020-04-01 – 2024-03-31
Keywords	Interconnection networks / Large-scale distributed deep learning / Big data / In-Network Computing / Data centers
Outline of Final Research Achievements	Applying deep learning to big data is an urgent issue for the practical application of large-scale AI. A promising solution is a system in which computing mechanisms called domain-specific architectures are distributed within data centers. In conventional data center networks, however, communication performance constraints such as packet latency make it difficult to speed up large-scale distributed deep learning. This research focuses on In-Network Computing, which performs intermediate processing of calculations within the network. It addresses the issues that stand in the way of the practical application of an inter-switch network that achieves low-latency, low-frequency communication while retaining the high bandwidth and scalability of conventional networks.
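Free Research Field	Computer architecture
Academic Significance and Societal Importance of the Research Achievements	The interconnection system proposed in this research adopts Random Topology, one of the ultra-low-latency network technologies developed for high-performance computing systems, for use in data centers, and thereby optimizes the network for In-Network Computing, a technology that has become increasingly widespread in recent years. Specifically, the work overturns the conventional system design approach of using tree-based physical and logical topologies for the inter-switch network and for the distributed deep learning mapping algorithms built on it, while guaranteeing bandwidth and scalability equal to or better than those of conventional systems and maintaining compatibility with existing systems. These research results were released as OSS to promote the practical application of the proposed system and to meet societal demand.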