A study of server management technology for sustaining a large scale distributed neural network

Research Project

Project/Area Number	20K19791
Research Category	Grant-in-Aid for Early-Career Scientists
Allocation Type	Multi-year Fund
Review Section	Basic Section 60060:Information network-related
Research Institution	Kindai University
Principal Investigator	Mizutani Kimihiro, Kindai University, Faculty of Informatics, Associate Professor (40845939)
Project Period (FY)	2020-04-01 – 2024-03-31
Project Status	Completed (Fiscal Year 2023)
Budget Amount	¥3,120,000 (Direct Cost: ¥2,400,000, Indirect Cost: ¥720,000); Fiscal Year 2022: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000); Fiscal Year 2021: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000); Fiscal Year 2020: ¥910,000 (Direct Cost: ¥700,000, Indirect Cost: ¥210,000)
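Keywords	wide-area distributed computing / distributed learning / distributed neural network / network management / information network / overlay network / structured overlay network / P2P / server cooperation / deep learning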
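Outline of Research at the Start	This research aims to create a distributed server cooperation technology that improves the scalability of learning while managing a large-scale neural network autonomously and persistently across a very large number of servers. Specifically, it aims to establish a method that autonomously decides which server each computation task of the neural network is assigned to, according to the structure of the network, and a method for delegating and restoring computation results among servers when servers are added or fail.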
Outline of Final Research Achievements	In this study, we aimed to build an execution platform for distributed neural networks by developing its core technologies. First, we used structured overlay network technology to restore the distributed platform quickly after failures. The strength of this method lies in estimating the union of failed nodes and quickly propagating the failure information. This reduces unnecessary propagation of failure information and shortens the platform's Mean Time to Repair (MTTR). Second, we integrated distributed federated learning techniques into the platform to manage learning nodes at scale, and proposed a scalable tree architecture for node management that balances learning efficiency with high fault tolerance. Third, we developed schemes for estimating and controlling traffic data within the platform. We expect these technologies, in combination, to support a robust, fault-tolerant platform for managing distributed neural networks in the future.
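Academic Significance and Societal Importance of the Research Achievements	In building an execution platform for autonomous distributed neural networks, this research proposed a server cooperation technology that supports the persistent execution of training and inference, and a method for managing training status. For server cooperation, we used structured overlay technology to create a method that speeds up the handling of server failures within the platform. For managing training status, we developed a technology that lets training and inference run smoothly at the same time on a federated learning framework. We also created technologies for controlling and analyzing the data generated within the distributed execution platform. These technologies make an important contribution to the research field and are expected to serve as a foundation for further research and practical application.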

Report

(5 results)

2023 Annual Research Report
Final Research Report (PDF)
2022 Research-status Report
2021 Research-status Report
2020 Research-status Report

Research Products
(10 results)

Journal Article (5 results) (of which Peer Reviewed: 5 results, Open Access: 5 results) Presentation (5 results) (of which Int'l Joint Research: 3 results)

[Journal Article] A Comprehensive Evaluation of Generating a Mobile Traffic Data Scheme without a Coarse-Grained Process Using CSR-GAN (2022)
- Author(s)
  Tokunaga Tomoki, Mizutani Kimihiro
- Journal Title
  Sensors
  Volume: 22 Issue: 5 Pages: 1930-1930
- DOI
  10.3390/s22051930
- Related Report
  2021 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Effective TCP Flow Management Based on Hierarchical Feedback Learning in Complex Data Center Network (2022)
- Author(s)
  Mizutani Kimihiro
- Journal Title
  Sensors
  Volume: 22 Issue: 2 Pages: 611-611
- DOI
  10.3390/s22020611
- Related Report
  2021 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] A novel distributed deep learning training scheme based on distributed skip mesh list (2021)
- Author(s)
  Suzuki Masaya, Mizutani Kimihiro
- Journal Title
  IEICE Communications Express
  Volume: 10 Issue: 8 Pages: 463-468
- DOI
  10.1587/comex.2021ETL0023
- NAID
  130008070802
- ISSN
  2187-0136
- Year and Date
  2021-08-01
- Related Report
  2021 Research-status Report 2020 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] A scheme of estimating mobile traffic data without coarse-grained process using conditional SR-GAN (2021)
- Author(s)
  Tokunaga Tomoki, Mizutani Kimihiro
- Journal Title
  IEICE Communications Express
  Volume: 10 Issue: 8 Pages: 441-446
- DOI
  10.1587/comex.2021ETL0017
- NAID
  130008070791
- ISSN
  2187-0136
- Year and Date
  2021-08-01
- Related Report
  2021 Research-status Report 2020 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Stateless Node Failure Information Propagation Scheme for Stable Overlay Networks (2021)
- Author(s)
  Mizutani Kimihiro
- Journal Title
  IEEE Access
  Volume: 9 Pages: 88737-88745
- DOI
  10.1109/access.2021.3090028
- Related Report
  2021 Research-status Report
- Peer Reviewed / Open Access
[Presentation] An Efficient Approach for Training Time Minimization in Distributed Split Neural Network (2023)
- Author(s)
  Eigo Yamamoto and Kimihiro Mizutani
- Organizer
  IEEE GCCE
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Accurate Mobile Traffic Generation Scheme without Coarse-grained Data Using Conditional SR-GAN (2020)
- Author(s)
  Tomoki Tokunaga, Kimihiro Mizutani
- Organizer
  ICETC 2020
- Related Report
  2020 Research-status Report
- Int'l Joint Research
[Presentation] A Novel Distributed Deep Learning Training Scheme Based on Distributed Skip Mesh List (2020)
- Author(s)
  Masaya Suzuki, Kimihiro Mizutani
- Organizer
  ICETC 2020
- Related Report
  2020 Research-status Report
- Int'l Joint Research
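[Presentation] Compression and Restoration of Mobile Traffic Data Using Conditional SR-GAN (2020)
- Author(s)
  Tomoki Tokunaga, Kimihiro Mizutani
- Organizer
  Kansai-section Joint Convention of Institutes of Electrical Engineering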
- Related Report
  2020 Research-status Report
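[Presentation] A Persistent Management Method for Large-Scale Neural Networks Using Distributed Skip Mesh List (2020)
- Author(s)
  Masaya Suzuki, Kimihiro Mizutani
- Organizer
  Kansai-section Joint Convention of Institutes of Electrical Engineering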
- Related Report
  2020 Research-status Report
