2022 Fiscal Year Annual Research Report
Federated Learning Infrastructure for Collaborative Machine Learning on Heterogeneous Environments
Project/Area Number |
22J11908
|
Allocation Type | Single-year Grants |
Research Institution | Nara Institute of Science and Technology |
Principal Investigator |
Thonglek Kundjanasith Nara Institute of Science and Technology, Graduate School of Science and Technology, Research Fellow (DC2)
|
Project Period (FY) |
2022-04-22 – 2024-03-31
|
Keywords | Collaborative Development / Distributed Computing / Edge Machine Learning / Federated Learning / Privacy Preservation / Resource Heterogeneity |
Outline of Annual Research Achievements |
I proposed an infrastructure that lets individuals collaboratively develop machine learning models in their own environments, which are usually heterogeneous. It allows researchers to work together and potentially build better models than large companies can. The infrastructure applies federated learning to train models while preserving data privacy. It comprises three components that support efficient training across diverse storage, computing, and network resources. The first reduces the model size to fit the storage capacity of each environment. The second aggregates models trained on heterogeneous computing resources. The third sparsifies the model for exchange between the server and clients. I evaluated the infrastructure using state-of-the-art neural network models that detect COVID-19 cases from chest X-ray images; COVID-19 detection is one of the most popular machine learning applications involving privacy-sensitive data. In this evaluation, the ensemble model with heterogeneous structures, trained on six different hardware environments using the proposed infrastructure, achieved accuracy 5.39% higher than a single trained COVID-NET.
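The report does not specify the algorithms behind the aggregation and sparsification components. As a rough illustration only, the following minimal Python sketch shows two widely used generic techniques of this kind: size-weighted federated averaging of client weights, and top-k magnitude sparsification of an update before transmission. The function names `federated_average` and `top_k_sparsify`, the flat-list weight representation, and the choice of these two techniques are assumptions made for illustration, not the author's implementation.

```python
from typing import List, Sequence


def top_k_sparsify(update: Sequence[float], k: int) -> List[float]:
    """Keep the k largest-magnitude entries of an update and zero the rest.

    Illustrative assumption: the update is a flat list of floats.
    """
    if k <= 0:
        return [0.0] * len(update)
    # Indices of the k entries with the largest absolute value.
    ranked = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    keep = set(ranked[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(update)]


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average client weight vectors, weighted by local dataset size.

    Illustrative assumption: all clients share the same weight layout.
    """
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one dataset size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


if __name__ == "__main__":
    # Two hypothetical clients holding 1 and 3 local samples.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
    # Keep only the two largest-magnitude entries of a hypothetical update.
    print(top_k_sparsify([0.1, -2.0, 0.5, 3.0], 2))
```

In a setting like the one described, a client would sparsify its update before sending it over a constrained network link, and the server would combine whatever it receives with a weighted average. Handling clients whose models have different structures, as the report describes, would need more than this sketch shows.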
|
Current Status of Research Progress |
2: Research has progressed on the whole more than it was originally planned.
Reason
The project is progressing well. I have finished developing the proposed components, which address the technical challenges of efficiently training machine learning models on heterogeneous storage, computing, and network resources, and I integrated the three components into the proposed infrastructure on schedule. Moreover, I evaluated the infrastructure using state-of-the-art neural network models that detect COVID-19 cases from chest X-ray images; COVID-19 detection is one of the most popular machine learning applications involving privacy-sensitive data. In this evaluation, the ensemble model with heterogeneous structures, trained on six different hardware environments using the proposed infrastructure, achieved accuracy 5.39% higher than a single trained COVID-NET.
|
Strategy for Future Research Activity |
In the future, I will investigate the generality of the proposed infrastructure using a variety of machine learning applications with diverse model structures. I plan to evaluate the infrastructure on a large number of edge devices and then improve its resource utilization. I will also work on data security technology to strengthen the infrastructure's data protection mechanism. Additionally, I will publish the infrastructure as open-source software and make it available to international and domestic research communities, removing the barriers that data privacy limitations and existing resource constraints pose to the collaborative development of machine learning models.
|
Research Products
(3 results)