Federated Learning Infrastructure for Collaborative Machine Learning on Heterogeneous Environments
Project/Area Number | 22KJ2289 |
Project/Area Number (Other) | 22J11908 (2022) |
Research Category | Grant-in-Aid for JSPS Fellows |
Allocation Type | Multi-year Fund (2023) / Single-year Grants (2022) |
Section | Domestic |
Review Section | Basic Section 61030: Intelligent informatics-related |
Research Institution | Nara Institute of Science and Technology |
Principal Investigator | Thonglek Kundjanasith, Nara Institute of Science and Technology, Graduate School of Science and Technology, Research Fellow (DC2) |
Project Period (FY) | 2023-03-08 – 2024-03-31 |
Project Status | Granted (Fiscal Year 2023) |
Budget Amount |
¥1,700,000 (Direct Cost: ¥1,700,000)
Fiscal Year 2023: ¥800,000 (Direct Cost: ¥800,000)
Fiscal Year 2022: ¥900,000 (Direct Cost: ¥900,000)
|
Keywords | Collaborative Development / Distributed Computing / Edge Machine Learning / Federated Learning / Privacy Preservation / Resource Heterogeneity |
Outline of Research at the Start |
I propose LiberatAI, an infrastructure that lets researchers work together to develop machine learning models collaboratively. LiberatAI applies federated learning to train models while preserving data privacy, and it allows individuals to train models collaboratively on their own environments, which are usually heterogeneous. Three modules in LiberatAI support training a model across diverse storage, computing, and communication resources. LiberatAI was evaluated on models that detect COVID-19, one of the most popular applications involving privacy-sensitive data.
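To illustrate the federated learning step mentioned above, the following is a minimal sketch of sample-weighted parameter averaging in the style of federated averaging. It is a generic example and not LiberatAI's actual implementation, which this outline does not specify. The function name `federated_average` and the representation of model parameters as flat lists of floats are assumptions made for illustration only.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average flat parameter vectors, weighting each client by its sample count.

    Hypothetical helper: only model parameters are passed in, never raw data,
    which is the property federated learning relies on for privacy.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one sample count per client update")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all client updates must have the same length")

    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        share = size / total  # fraction of all training samples held by this client
        for i, value in enumerate(weights):
            averaged[i] += share * value
    return averaged


# Two hypothetical clients: the second holds three times as much data.
global_model = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

In this sketch the server sees only the parameter vectors and sample counts, so the clients' training data stays on their own machines.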
|
Outline of Annual Research Achievements |
I proposed an infrastructure that allows individuals to collaboratively develop machine learning models on their own environments, which are usually heterogeneous. It lets researchers work together and potentially build better models than big companies can. The infrastructure applies federated learning to train the models while preserving data privacy. It consists of three components that support efficient training on diverse storage, computing, and network resources. The first component reduces the model size to fit the storage capacity of each heterogeneous environment. The second aggregates the models trained on heterogeneous computing resources. The third sparsifies the model before it is exchanged between the server and the clients. The infrastructure was evaluated using state-of-the-art neural network models that detect COVID-19 cases from chest X-ray images; COVID-19 detection is one of the most popular machine learning applications involving privacy-sensitive data. In this evaluation, an ensemble of models with heterogeneous structures, trained on six different hardware environments through the proposed infrastructure, achieved accuracy 5.39% higher than that of a single trained COVID-NET.
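As an illustration of the third component, the sketch below shows top-k magnitude sparsification, one common way to shrink a model update before sending it over the network. This outline does not say which sparsification method the infrastructure uses, so top-k is a stand-in technique here, and the names `sparsify_top_k` and `densify` are hypothetical.

```python
from typing import Dict, List, Sequence


def sparsify_top_k(update: Sequence[float], k: int) -> Dict[int, float]:
    """Keep the k largest-magnitude entries as an {index: value} mapping."""
    if k < 0:
        raise ValueError("k must be non-negative")
    # Rank positions by absolute value, largest first.
    ranked = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    return {i: update[i] for i in sorted(ranked[:k])}


def densify(sparse: Dict[int, float], length: int) -> List[float]:
    """Rebuild a dense vector on the receiving side, filling dropped entries with 0."""
    dense = [0.0] * length
    for index, value in sparse.items():
        dense[index] = value
    return dense


# Hypothetical client update: only the two largest entries are transmitted.
update = [0.1, -2.0, 0.05, 1.5]
payload = sparsify_top_k(update, 2)
restored = densify(payload, len(update))
```

Only the retained index-value pairs need to be transmitted, so the payload shrinks as k decreases, at the cost of discarding the smaller entries.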
|
Current Status of Research Progress |
2: Research has progressed on the whole more than it was originally planned.
Reason
The project is progressing well. I have finished developing the proposed components, which address the technical challenges of efficiently training machine learning models on heterogeneous storage, computing, and network resources, and I integrated the three components into the proposed infrastructure on schedule. Moreover, the infrastructure was evaluated using state-of-the-art neural network models that detect COVID-19 cases from chest X-ray images; COVID-19 detection is one of the most popular machine learning applications involving privacy-sensitive data. In this evaluation, an ensemble of models with heterogeneous structures, trained on six different hardware environments through the proposed infrastructure, achieved accuracy 5.39% higher than that of a single trained COVID-NET.
|
Strategy for Future Research Activity |
In the future, I will investigate the generality of the proposed infrastructure using a variety of machine learning applications with diverse model structures. I plan to evaluate the infrastructure on a large number of edge devices and then improve its resource utilization. I will also work on data security technology to strengthen the infrastructure's data protection mechanism. Additionally, I will publish the infrastructure as open-source software and make it available to international and domestic research communities, in order to remove the barriers that data privacy limitations and existing resource constraints place on the collaborative development of machine learning models.
|
Report (1 result)
Research Products (3 results)