2020 Fiscal Year Research-status Report
Cluster-fault-tolerant routing methods in tori
Project/Area Number |
19K11887
|
Research Institution | Kanagawa University |
Principal Investigator |
Antoine Bossard (ボサール アントワーヌ), Kanagawa University, Faculty of Science, Associate Professor (20645882)
|
Co-Investigator(Kenkyū-buntansha) |
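Keiichi Kaneko (金子 敬一), Tokyo University of Agriculture and Technology, Graduate School of Engineering (Institute of Engineering), Professor (20194904)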
|
Project Period (FY) |
2019-04-01 – 2022-03-31
|
Keywords | routing / fault tolerance / cluster / torus |
Outline of Annual Research Achievements |
With the aim of increasing the dependability of massively parallel systems, we have been investigating the cluster-fault-tolerant routing problem in the torus topology. The interconnection networks of modern supercomputers are often based on this topology; for instance, it is the topology of the interconnect of Fugaku, the supercomputer developed by Fujitsu and RIKEN that ranked first in the world as of November 2020. One objective of this research is to propose a routing algorithm that selects a fault-free path in an n-dimensional k-ary torus that may contain faulty clusters of diameter at most one. We have compiled our findings on this problem in several research papers; most notably, one of our articles has been published in the journal Sensors.
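The problem setting above can be illustrated with a minimal Python sketch. It is not the deterministic algorithm developed in this project; a plain breadth-first search is used only as a generic stand-in to show what a fault-free path in a k-ary n-dimensional torus looks like. The function names `neighbors` and `fault_free_path` are hypothetical, and a faulty cluster of diameter at most one is modeled here, as an assumption, as a pair of adjacent faulty nodes.

```python
from collections import deque


def neighbors(node, k):
    """Yield the neighbors of a node in a k-ary torus (2n of them when k >= 3)."""
    for dim in range(len(node)):
        for step in (1, -1):
            coords = list(node)
            # Wrap-around edges are what distinguish a torus from a mesh.
            coords[dim] = (coords[dim] + step) % k
            yield tuple(coords)


def fault_free_path(src, dst, k, faulty):
    """Return a shortest path from src to dst avoiding faulty nodes, or None."""
    if src in faulty or dst in faulty:
        return None
    parent = {src: None}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            # Walk the parent links back to the source.
            path = []
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for nxt in neighbors(cur, k):
            if nxt not in faulty and nxt not in parent:
                parent[nxt] = cur
                queue.append(nxt)
    return None


# Illustrative instance: 2-dimensional 5-ary torus with one faulty cluster
# made of two adjacent nodes (diameter one) blocking the direct route.
faulty = {(1, 0), (1, 1)}
path = fault_free_path((0, 0), (2, 0), 5, faulty)
print(path)  # [(0, 0), (4, 0), (3, 0), (2, 0)]
```

In this instance the search uses the wrap-around edge of the torus to bypass the faulty cluster, at the cost of one extra hop compared with the fault-free distance of two.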
|
Current Status of Research Progress |
1: Research has progressed more than originally planned.
Reason
Solving this routing problem in a deterministic manner was difficult, so we had originally planned to spend several years of this three-year research project on it. However, we were able to describe an algorithm that solves the cluster-fault-tolerant routing problem in an n-dimensional k-ary torus during the second year of the project. The project is therefore "progressing more smoothly than initially planned". This has given us the opportunity to extend our work on increasing the dependability of massively parallel systems by considering the decycling problem in a torus network as well as efficient programming methods.
|
Strategy for Future Research Activity |
Having succeeded in describing a solution to the cluster-fault-tolerant routing problem in a torus, we will now focus on further increasing the dependability of systems that rely on the torus topology, such as the supercomputer Fugaku. Concretely, we would like to consider methods to reduce the complexities of the routing algorithm already described, namely its time complexity and the lengths of the paths it selects. In addition, we will continue working on the decycling problem in a torus, as it is meaningful for parallel processing.
|
Causes of Carryover |
As in the previous year, the COVID-19 pandemic prevented us from traveling to attend any conferences, including domestic ones. Travel expenses were therefore significantly lower than we had projected. We hope to attend conferences during the 2021 fiscal year.
|