Efficient Query Processing for Learning-based Data Management
Project/Area Number |
19K11979
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Multi-year Fund |
Section | General |
Review Section |
Basic Section 60080:Database-related
|
Research Institution | Osaka University |
Principal Investigator |
Xiao Chuan, Osaka University, Graduate School of Information Science and Technology, Associate Professor (10643900)
|
Project Period (FY) |
2019-04-01 – 2022-03-31
|
Project Status |
Completed (Fiscal Year 2021)
|
Budget Amount |
¥3,640,000 (Direct Cost: ¥2,800,000, Indirect Cost: ¥840,000)
Fiscal Year 2021: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Fiscal Year 2020: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Fiscal Year 2019: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
|
Keywords | query processing / ML for DB / high-dimensional data / similarity search / ML + DB / machine learning / database / data science |
Outline of Research at the Start |
With the boom in machine learning research, a recent trend in database research is to apply machine learning techniques to challenging database tasks such as entity matching. Existing attempts face two bottlenecks: inadequate query processing speed on large-scale datasets and difficulty generalizing across different applications. This project aims to address the fundamental problems of managing data with machine learning methods. The outcome of the research will have a strong impact by providing practical methods beyond what is currently available.
|
Outline of Final Research Achievements |
We addressed several fundamental problems of query processing for learning-based data management. We developed two solutions for efficiently processing queries on embedding vectors: the first works for binary high-dimensional vectors and efficiently returns answers for similarity search and join queries with Hamming distance constraints; the second handles approximate nearest neighbor search for real-valued high-dimensional vectors by utilizing hierarchical graph structures. We studied the processing of queries with learning-based predicates and developed methods that generate fast query plans through cardinality estimation. We performed system prototyping and evaluation, and released the source code of our software on GitHub. The outcome of this project provides practical methods for learning-based data management and contributes to the development of next-generation data management systems.
|
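To make the query types named in the achievements outline concrete, the following is a minimal sketch of similarity search and self-join under a Hamming distance constraint. It is a brute-force baseline for illustration only and is not the project's method; the function names, the packing of each binary vector into a Python integer, and the threshold name `tau` are all assumptions introduced here.

```python
from itertools import combinations
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary vectors packed as integers."""
    # XOR leaves a 1 bit wherever the two vectors differ.
    return bin(a ^ b).count("1")


def hamming_search(vectors: List[int], query: int, tau: int) -> List[int]:
    """Return indices of vectors within Hamming distance tau of the query."""
    return [i for i, v in enumerate(vectors) if hamming(v, query) <= tau]


def hamming_join(vectors: List[int], tau: int) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose Hamming distance is at most tau."""
    return [
        (i, j)
        for (i, u), (j, v) in combinations(enumerate(vectors), 2)
        if hamming(u, v) <= tau
    ]


# Hypothetical 4-bit example data.
data = [0b1010, 0b1000, 0b0101, 0b1011]
search_result = hamming_search(data, 0b1010, tau=1)
join_result = hamming_join(data, tau=1)
```

The search scans every vector and the join compares every pair, so the costs grow linearly and quadratically with the dataset size; that cost is the motivation for the efficient methods the outline refers to.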
Academic Significance and Societal Importance of the Research Achievements |
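The outcomes of this research provide practical methods for machine-learning-based data management and contribute to the development of next-generation data management systems. They advance state-of-the-art database technology and give strong impetus to technical development in related research fields such as machine learning, natural language processing, and computer vision, as well as in applications such as marketing and healthcare. They also help raise Japan's standing in computer science and promote collaboration with overseas research groups.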
|
Report
(4 results)
Research Products
(44 results)