2019 Fiscal Year Research-status Report
Cluster-fault-tolerant routing methods in tori
Project/Area Number |
19K11887
|
Research Institution | Kanagawa University |
Principal Investigator |
Bossard Antoine, Kanagawa University, Faculty of Science, Associate Professor (20645882)
|
Co-Investigator(Kenkyū-buntansha) |
Kaneko Keiichi, Tokyo University of Agriculture and Technology, Graduate School (Institute) of Engineering, Professor (20194904)
|
Project Period (FY) |
2019-04-01 – 2022-03-31
|
Keywords | routing / fault tolerance / cluster |
Outline of Annual Research Achievements |
The members of this research project have been working on a point-to-point routing algorithm for torus networks that tolerates faulty clusters. This research is motivated by the adoption, by the supercomputing industry both domestic and foreign, of interconnection networks based on the torus topology. More precisely, in an n-dimensional k-ary torus, the algorithm should be able to select a fault-free path when there are at most 2n-1 faulty clusters, each of diameter at most 1. We have built in particular on previous work on disjoint-path routing in tori and other networks, such as the set-to-set and pairwise disjoint-path routing problems. We have summarized our current findings in a research paper submitted to the 26th International European Conference on Parallel and Distributed Computing (Euro-Par 2020); it is now under review.
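To make the problem setting concrete, the following is a minimal sketch of it in Python. It is not the routing algorithm proposed in this project: a plain breadth-first search is swapped in as a baseline, and the function names (`neighbors`, `is_cluster_of_diameter_at_most_1`, `fault_free_path`) and the sample fault configuration are assumptions made purely for illustration. Nodes of the n-dimensional k-ary torus are modeled as n-tuples over 0..k-1, and a faulty cluster as a set of nodes that are pairwise equal or adjacent.

```python
from collections import deque


def neighbors(node, k):
    """Yield the 2n neighbors of `node` in an n-dimensional k-ary torus (k >= 3 assumed)."""
    for i, x in enumerate(node):
        for d in (1, -1):
            yield node[:i] + ((x + d) % k,) + node[i + 1:]


def is_cluster_of_diameter_at_most_1(nodes, k):
    """True if every two nodes of the cluster are identical or adjacent."""
    nodes = list(nodes)
    return all(u == v or v in set(neighbors(u, k)) for u in nodes for v in nodes)


def fault_free_path(src, dst, k, faulty):
    """Baseline BFS (illustrative only): a shortest path avoiding `faulty`, or None."""
    if src in faulty or dst in faulty:
        return None
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in neighbors(u, k):
            if v not in faulty and v not in parent:
                parent[v] = u
                queue.append(v)
    return None


# Hypothetical instance: n = 2, k = 5, with 2n - 1 = 3 faulty clusters.
n, k = 2, 5
clusters = [{(1, 0), (2, 0)}, {(0, 1)}, {(1, 2), (1, 3)}]
assert len(clusters) <= 2 * n - 1
assert all(is_cluster_of_diameter_at_most_1(c, k) for c in clusters)
faulty = set().union(*clusters)
path = fault_free_path((0, 0), (2, 2), k, faulty)
print(path)
```

A breadth-first search visits up to all k^n nodes, which is precisely the cost a dedicated routing algorithm aims to avoid; the sketch is only useful as a reference against which such an algorithm could be checked on small instances.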
|
Current Status of Research Progress |
1: Research has progressed more than originally planned.
Reason
Given the difficulty of the problem, we had initially planned to spend several years in total (this is a 3-year research project) on completely describing a routing algorithm that tolerates faulty clusters in a torus network. We have made strong progress during the first year and a half of the project: we have produced a fairly advanced description of an algorithm that solves the considered routing problem in a torus network and, as mentioned above, have submitted a research paper to a major international conference. This leads us to believe that this research project is "progressing more smoothly than initially planned".
|
Strategy for Future Research Activity |
Now that significant progress has been made on the description of such a routing algorithm, the researchers will work on a software implementation in order to 1) eliminate remaining issues, if any, in the described algorithm, and 2) empirically measure the performance, and thus the applicability, of the algorithm. Then, once an estimate of the theoretical worst-case performance of the algorithm is ready, the empirically measured average performance will be compared with it, and any difference between the theoretical and experimental estimates will be discussed. As a result, the researchers hope to establish both the theoretical worst-case time complexity and the average time complexity of the proposed algorithm. Finally, by considering the topological properties of current torus-based supercomputers, such as the number of computing nodes involved, the researchers expect to formally demonstrate the applicability of their work.
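As an illustration of the kind of empirical measurement described above, the following Python sketch times a router on random source/destination pairs and compares the observed average and maximum hop counts with the torus diameter n * floor(k/2). The router used here, `dimension_order_route`, is a hypothetical stand-in that ignores faults entirely; it is not the algorithm of this project. The harness `measure` and all parameter values are likewise assumptions chosen for the example.

```python
import random
import time


def lee_distance(a, b, k):
    """Shortest-path distance between two nodes of a fault-free k-ary torus."""
    return sum(min((x - y) % k, (y - x) % k) for x, y in zip(a, b))


def dimension_order_route(src, dst, k):
    """Stand-in router (not fault-tolerant): fix one coordinate at a time,
    moving along the shorter direction of each ring."""
    path = [src]
    cur = list(src)
    for i in range(len(src)):
        forward = (dst[i] - cur[i]) % k
        step = 1 if forward <= k - forward else -1
        while cur[i] != dst[i]:
            cur[i] = (cur[i] + step) % k
            path.append(tuple(cur))
    return path


def measure(route, n, k, trials, seed=0):
    """Run `route` on random node pairs; report hop counts and mean wall time."""
    rng = random.Random(seed)
    pairs = [
        (tuple(rng.randrange(k) for _ in range(n)),
         tuple(rng.randrange(k) for _ in range(n)))
        for _ in range(trials)
    ]
    start = time.perf_counter()
    paths = [route(src, dst, k) for src, dst in pairs]
    elapsed = time.perf_counter() - start
    hops = [len(p) - 1 for p in paths]
    return {
        "avg_hops": sum(hops) / trials,
        "max_hops": max(hops),
        "seconds_per_route": elapsed / trials,
    }


n, k = 3, 8
stats = measure(dimension_order_route, n, k, trials=1000)
diameter = n * (k // 2)  # worst-case distance in a fault-free torus
print(stats, "diameter:", diameter)
```

Substituting an actual fault-tolerant router for `dimension_order_route`, and sampling fault configurations in addition to node pairs, would give the average-case figures that can then be set against a theoretical worst-case bound.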
|
Causes of Carryover |
Because of the novel coronavirus (COVID-19) pandemic, travel expenses have been reduced to a minimum. In addition, the paper that we submitted to a conference is still under review, so the budget allocated to the conference fee has been carried over to the next academic year.
|