Machine Learning for Structure-Rich Data-Scarce Domains

Research Project

Project/Area Number	22K12150
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 61030: Intelligent informatics-related
Research Institution	Kyoto University
Principal Investigator	NGUYEN Canh Hao, Kyoto University, Institute for Chemical Research, Lecturer (90626889)
Project Period (FY)	2022-04-01 – 2025-03-31
Project Status	Granted (Fiscal Year 2023)
Budget Amount	¥4,160,000 (Direct Cost: ¥3,200,000, Indirect Cost: ¥960,000)
	Fiscal Year 2024: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
	Fiscal Year 2023: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
	Fiscal Year 2022: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
Keywords	Graph neural networks / Convex clustering / Machine learning / Structured data / Deep learning / Sparse learning
Outline of Research at the Start	This research project has three directions: (1) investigating original machine learning models for complicated structures, (2) designing novel structure discovery tools that incorporate domain knowledge, and (3) discovering new biomedical knowledge to be used by domain experts.
Outline of Annual Research Achievements	This year, we worked on data representations that stay faithful to the original features while exhibiting cluster structure. We investigated convex clustering as a way to obtain such a representation from a convex program, which can be solved efficiently and to global optimality. The key assumption is that the data follow a cluster structure, so we cluster the data with convex clustering. One advantage of convex clustering is that it is a convex program, which guarantees an optimal solution. Another is that it is a relaxation of both k-means and agglomerative clustering, so it may offer advantages of each. Our main work was to characterize analytically which clusters convex clustering recovers, and its pros and cons relative to the other two algorithms. We found that convex clustering can learn only convex clusters. In this respect it is similar to k-means and different from agglomerative clustering. We also found that the clusters can be bounded in balls, which makes them round-shaped, and that there are gaps between them. These properties show that convex clustering finds rather specific types of clusters and is rather inflexible compared to the other algorithms.
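For readers unfamiliar with the method, the following is a minimal sketch of convex clustering, not the project's own code. It assumes the commonly used objective 0.5 * ||X - U||_F^2 + gamma * sum over pairs i < j of ||u_i - u_j||_2 with uniform pair weights, and solves it with a standard scaled ADMM iteration. The function names, the parameters gamma, rho, n_iter, tol, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def pairwise_difference_matrix(n):
    """Return D such that the rows of D @ U are u_i - u_j for all pairs i < j."""
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    D = np.zeros((len(pairs), n))
    for row, (i, j) in enumerate(pairs):
        D[row, i], D[row, j] = 1.0, -1.0
    return D


def convex_clustering(X, gamma, rho=1.0, n_iter=500):
    """Approximately minimise
        0.5 * ||X - U||_F^2 + gamma * sum_{i<j} ||u_i - u_j||_2
    with scaled ADMM on the splitting D @ U = V (uniform weights assumed)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    D = pairwise_difference_matrix(n)
    A = np.eye(n) + rho * D.T @ D          # system matrix of the U-update
    V = np.zeros((D.shape[0], X.shape[1]))  # copies of the pairwise differences
    W = np.zeros_like(V)                    # scaled dual variable
    U = X.copy()
    for _ in range(n_iter):
        # U-update: a ridge-like linear system.
        U = np.linalg.solve(A, X + rho * D.T @ (V - W))
        # V-update: row-wise block soft-thresholding with threshold gamma / rho.
        Z = D @ U + W
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        scale = np.maximum(0.0, 1.0 - (gamma / rho) / np.maximum(norms, 1e-12))
        V = scale * Z
        # Dual update.
        W = W + D @ U - V
    return U


def fused_labels(U, tol=1e-3):
    """Give the same label to rows of U that lie within tol of each other."""
    labels = -np.ones(len(U), dtype=int)
    next_label = 0
    for i in range(len(U)):
        if labels[i] >= 0:
            continue
        close = np.linalg.norm(U - U[i], axis=1) < tol
        labels[close & (labels < 0)] = next_label
        next_label += 1
    return labels


# Toy data (hypothetical): two small, well-separated groups in the plane.
X = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5],
              [5.0, 5.0], [5.5, 5.0], [5.0, 5.5]])

for gamma in (0.0, 0.3, 5.0):
    U = convex_clustering(X, gamma)
    print(gamma, fused_labels(U))
```

With gamma = 0 every point keeps its own representative, and with a sufficiently large gamma all representatives fuse at the data mean. Intermediate values trace out a path of partitions between these two extremes, which is the sense in which the method relates to agglomerative clustering.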
Current Status of Research Progress	3: Progress in research has been slightly delayed. Reason: We are working on a particularly difficult problem, understanding the formulation of convex clustering, which has not been well studied before.
Strategy for Future Research Activity	We plan to continue looking for suitable data representations that combine the original features with additional information such as graphs, and that are guaranteed to extract more information than currently used methods.