Efficient Query Processing for Learning-based Data Management

Research Project

Project/Area Number	19K11979
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 60080: Database-related
Research Institution	Osaka University
Principal Investigator	Xiao Chuan, Osaka University, Graduate School of Information Science and Technology, Associate Professor (10643900)
Project Period (FY)	2019-04-01 – 2022-03-31
Project Status	Completed (Fiscal Year 2021)
Budget Amount	¥3,640,000 (Direct Cost: ¥2,800,000, Indirect Cost: ¥840,000); Fiscal Year 2021: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000); Fiscal Year 2020: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000); Fiscal Year 2019: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
Keywords	query processing / ML for DB / ML + DB / high-dimensional data / similarity search / machine learning / database / data science
Outline of Research at the Start	With the boom in machine learning research, a recent trend in database research is to apply machine learning techniques to challenging database tasks such as entity matching. Existing attempts face two bottlenecks: inadequate query processing speed on large-scale datasets and difficulty in generalizing across different applications. This project aims to address the fundamental problems of managing data with machine learning methods. The outcome of the research will have a strong impact by providing practical methods beyond what is currently available.
Outline of Final Research Achievements	We addressed several fundamental problems of query processing for learning-based data management. We developed two solutions for efficiently processing queries on embedding vectors: the first works on binary high-dimensional vectors and efficiently answers similarity search and join queries with Hamming distance constraints; the second handles approximate nearest neighbor search over real-valued high-dimensional vectors by utilizing hierarchical graph structures. We studied the processing of queries with learning-based predicates and developed methods that generate fast query plans through cardinality estimation. We performed system prototyping and evaluation, and released the source code of our software on GitHub. The outcome of this project provides practical methods for learning-based data management and contributes to the development of next-generation data management systems.
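Academic Significance and Societal Importance of the Research Achievements	The outcomes of this research provide practical methods for machine-learning-based data management and contribute to the development of next-generation data management systems. They advance state-of-the-art database technology and give strong impetus to technology development in related research fields such as machine learning, natural language processing, and computer vision, as well as in applications such as marketing and healthcare. They also help raise Japan's prestige in computer science and promote collaboration with overseas research groups.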

Report

(4 results)

2021 Annual Research Report
Final Research Report (PDF)
2020 Research-status Report
2019 Research-status Report

Research Products
(44 results)

Int'l Joint Research (6 results); Journal Article (10 results, of which Int'l Joint Research: 5 results, Peer Reviewed: 10 results, Open Access: 10 results); Presentation (23 results, of which Int'l Joint Research: 9 results); Remarks (5 results)

[Int'l Joint Research] University of Technology Sydney (Australia)
- Related Report
  2021 Annual Research Report
[Int'l Joint Research] The Hong Kong University of Science and Technology / Shenzhen University / Shenzhen Institute of Computing Sciences (China)
- Related Report
  2021 Annual Research Report
[Int'l Joint Research] University of New South Wales / University of Melbourne / University of Technology Sydney (Australia)
- Related Report
  2020 Research-status Report
[Int'l Joint Research] Shenzhen University (China)
- Related Report
  2020 Research-status Report
[Int'l Joint Research] University of New South Wales / University of Melbourne (Australia)
- Related Report
  2019 Research-status Report
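[Int'l Joint Research] The Hong Kong University of Science and Technology / Beijing Institute of Technology / Shenzhen Institute of Computing Sciences (China)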
- Related Report
  2019 Research-status Report
[Journal Article] HSGAN: Reducing mode collapse in GANs by the latent code distance of homogeneous samples (2022)
- Author(s)
  Simin Yu, Kuntian Zhang, Chuan Xiao, Joshua Zhexue Huang, Mark Junjie Li, Makoto Onizuka
- Journal Title
  Computer Vision and Image Understanding
  Volume: 214 Pages: 103314-103314
- DOI
  10.1016/j.cviu.2021.103314
- Related Report
  2021 Annual Research Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] HVS: Hierarchical Graph Structure Based on Voronoi Diagrams for Solving Approximate Nearest Neighbor Search (2021)
- Author(s)
  Kejing Lu, Mineichi Kudo, Chuan Xiao, Yoshiharu Ishikawa
- Journal Title
  Proceedings of the VLDB Endowment
  Volume: 15 Issue: 2 Pages: 246-258
- DOI
  10.14778/3489496.3489506
- Related Report
  2021 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Continuous Top-k Spatial-Keyword Search on Dynamic Objects (2021)
- Author(s)
  Yuyang Dong, Chuan Xiao, Hanxiong Chen, Jeffrey Xu Yu, Kunihiro Takeoka, Masafumi Oyamada, and Hiroyuki Kitagawa
- Journal Title
  The VLDB Journal
  Volume: 30 Issue: 2 Pages: 141-161
- DOI
  10.1007/s00778-020-00627-4
- Related Report
  2020 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Compressed Indexing for Trajectories Constrained in Road Networks (2020)
- Author(s)
  小出智士, 肖川, 石川佳治
- Journal Title
  IEICE Transactions on Information and Systems (Japanese Edition), D
  Volume: J103-D Issue: 5 Pages: 393-402
- DOI
  10.14923/transinfj.2019DET0001
- NAID
  130008110439
- ISSN
  1880-4535, 1881-0225
- Year and Date
  2020-05-01
- Related Report
  2019 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Similarity Query Processing for High-Dimensional Data (2020)
- Author(s)
  Jianbin Qin, Wei Wang, Chuan Xiao, and Ying Zhang
- Journal Title
  Proceedings of the VLDB Endowment
  Volume: 13 Issue: 12 Pages: 3437-3440
- DOI
  10.14778/3415478.3415564
- Related Report
  2020 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Fast Subtrajectory Similarity Search in Road Networks under Weighted Edit Distance Constraints (2020)
- Author(s)
  Satoshi Koide, Chuan Xiao, and Yoshiharu Ishikawa
- Journal Title
  Proceedings of the VLDB Endowment
  Volume: 13 Issue: 12 Pages: 2188-2201
- DOI
  10.14778/3407790.3407818
- Related Report
  2020 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Efficient Query Autocompletion with Edit Distance-based Error Tolerance (2020)
- Author(s)
  Jianbin Qin, Chuan Xiao, Sheng Hu, Jie Zhang, Wei Wang, Yoshiharu Ishikawa, Koji Tsuda, Kunihiko Sadakane
- Journal Title
  The VLDB Journal
  Volume: - Issue: 4 Pages: 919-943
- DOI
  10.1007/s00778-019-00595-4
- Related Report
  2019 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Generalizing the Pigeonhole Principle for Similarity Search in Hamming Space (2020)
- Author(s)
  Jianbin Qin, Chuan Xiao, Yaoshu Wang, Wei Wang, Xuemin Lin, Yoshiharu Ishikawa, Guoren Wang
- Journal Title
  IEEE Transactions on Knowledge and Data Engineering
  Volume: - Pages: 489-505
- DOI
  10.1109/tkde.2019.2899597
- Related Report
  2019 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Building Hierarchical Spatial Histograms for Exploratory Analysis in Array DBMS (2019)
- Author(s)
  Jing Zhao, Yoshiharu Ishikawa, Lei Chen, Chuan Xiao, Kento Sugiura
- Journal Title
  IEICE Transactions on Information and Systems
  Volume: E102.D Issue: 4 Pages: 788-799
- DOI
  10.1587/transinf.2018DAP0020
- NAID
  130007621888
- ISSN
  0916-8532, 1745-1361
- Year and Date
  2019-04-01
- Related Report
  2019 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Scope-aware Code Completion with Discriminative Modeling (2019)
- Author(s)
  Sheng Hu, Chuan Xiao, Yoshiharu Ishikawa
- Journal Title
  Journal of Information Processing
  Volume: 27 Issue: 0 Pages: 469-478
- DOI
  10.2197/ipsjjip.27.469
- NAID
  130007690191
- ISSN
  1882-6652
- Related Report
  2019 Research-status Report
- Peer Reviewed / Open Access
[Presentation] JupySim: Jupyter Notebook Similarity Search System (2022)
- Author(s)
  Misato Horiuchi, Yuya Sasaki, Chuan Xiao, Makoto Onizuka
- Organizer
  International Conference on Extending Database Technology (EDBT)
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Edit-Aware Molecular Graph Completion Using Deep Generative Models (2022)
- Author(s)
  胡晟, 瀧川一学, 肖川
- Organizer
  The 14th Forum on Data Engineering and Information Management (DEIM)
- Related Report
  2021 Annual Research Report
[Presentation] Accelerating Time-Series Data Search Using Learned Indexes (2022)
- Author(s)
  松本和人, 肖川, 鬼塚真
- Organizer
  The 14th Forum on Data Engineering and Information Management (DEIM)
- Related Report
  2021 Annual Research Report
[Presentation] Missing Value Imputation for Tabular Data Using Attention GAN (2022)
- Author(s)
  河越淳, 董于洋, 野澤拓磨, 肖川
- Organizer
  The 14th Forum on Data Engineering and Information Management (DEIM)
- Related Report
  2021 Annual Research Report
[Presentation] Join Order Optimization Using Intermediate Results of Join Cardinality Estimation (2022)
- Author(s)
  川本孝太朗, 伊藤竜一, 佐々木勇和, 肖川, 鬼塚真
- Organizer
  The 14th Forum on Data Engineering and Information Management (DEIM)
- Related Report
  2021 Annual Research Report
[Presentation] Distributed Transaction Control Based on Adaptive Two-Phase Locking in Integrated Databases (2022)
- Author(s)
  三宅康太, 佐々木勇和, 肖川, 鬼塚真
- Organizer
  The 14th Forum on Data Engineering and Information Management (DEIM)
- Related Report
  2021 Annual Research Report
[Presentation] A Personalized Federated Learning Method Using Automatic Tuning of Model Structure (2022)
- Author(s)
  松田光司, 佐々木勇和, 肖川, 鬼塚真
- Organizer
  The 14th Forum on Data Engineering and Information Management (DEIM)
- Related Report
  2021 Annual Research Report
[Presentation] Comprehensive Evaluation of Transaction Processing Performance Using Machine Learning (2022)
- Author(s)
  池田悠人, 三宅康太, 肖川, 鬼塚真
- Organizer
  The 14th Forum on Data Engineering and Information Management (DEIM)
- Related Report
  2021 Annual Research Report
[Presentation] High-Dimensional Similarity Query Processing for Data Science (2021)
- Author(s)
  Jianbin Qin, Wei Wang, Chuan Xiao, Ying Zhang, Yaoshu Wang
- Organizer
  ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD)
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] BTGAN: Training GAN with Balanced Triplet Loss and Two-Branch Architecture (2021)
- Author(s)
  Simin Yu, Kuntian Zhang, Chuan Xiao, Xianyu Bao, Joshua Zhexue Huang, Mark Junjie Li
- Organizer
  International Joint Conference on Neural Networks (IJCNN)
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Consistent and Flexible Selectivity Estimation for High-Dimensional Data (2021)
- Author(s)
  Yaoshu Wang, Chuan Xiao, Jianbin Qin, Rui Mao, Makoto Onizuka, Wei Wang, Rui Zhang, and Yoshiharu Ishikawa
- Organizer
  ACM SIGMOD International Conference on Management of Data (SIGMOD 2021)
- Related Report
  2020 Research-status Report
- Int'l Joint Research
[Presentation] Efficient Joinable Table Discovery in Data Lakes: A High-Dimensional Similarity-Based Approach (2021)
- Author(s)
  Yuyang Dong, Kunihiro Takeoka, Chuan Xiao, and Masafumi Oyamada
- Organizer
  IEEE International Conference on Data Engineering (ICDE 2021)
- Related Report
  2020 Research-status Report
- Int'l Joint Research
[Presentation] Fast and Stable Cardinality Estimation with Non-Autoregressive Models (2021)
- Author(s)
  伊藤竜一, 佐々木勇和, 肖川, 鬼塚真
- Organizer
  The 13th Forum on Data Engineering and Information Management (DEIM 2021)
- Related Report
  2020 Research-status Report
[Presentation] FedMe: A Federated Learning Method Based on Model Exchange (2021)
- Author(s)
  松田光司, 堀敬三, 佐々木勇和, 肖川, 鬼塚真
- Organizer
  The 13th Forum on Data Engineering and Information Management (DEIM 2021)
- Related Report
  2020 Research-status Report
[Presentation] A Fast Search Algorithm for Computational Notebook Similarity Search (2021)
- Author(s)
  堀内美聡, 山崎翔平, 佐々木勇和, 肖川, 鬼塚真
- Organizer
  The 13th Forum on Data Engineering and Information Management (DEIM 2021)
- Related Report
  2020 Research-status Report
[Presentation] Automatic Molecular Graph Completion Using Deep Generative Models (2021)
- Author(s)
  胡晟, 瀧川一学, 肖川
- Organizer
  The 13th Forum on Data Engineering and Information Management (DEIM 2021)
- Related Report
  2020 Research-status Report
[Presentation] Monotonic Cardinality Estimation of Similarity Selection: A Deep Learning Approach (2020)
- Author(s)
  Yaoshu Wang, Chuan Xiao, Jianbin Qin, Xin Cao, Yifang Sun, Wei Wang, and Makoto Onizuka
- Organizer
  ACM SIGMOD International Conference on Management of Data (SIGMOD 2020)
- Related Report
  2019 Research-status Report
- Int'l Joint Research
[Presentation] Distributed Transaction Control Using a Ticket-Based Method in a P2P Data Integration Architecture (2020)
- Author(s)
  三宅康太, 涌田悠佑, 佐々木勇和, 肖川, 鬼塚真
- Organizer
  The 12th Forum on Data Engineering and Information Management (DEIM 2020)
- Related Report
  2019 Research-status Report
[Presentation] A Scalable Method for Inferring Full Names of Abbreviations Based on Tries and GMM (2020)
- Author(s)
  高明敏, 肖川, 石川佳治
- Organizer
  The 12th Forum on Data Engineering and Information Management (DEIM 2020)
- Related Report
  2019 Research-status Report
[Presentation] A Unified Query Paradigm for Efficient Search of Diversified Trajectories (2020)
- Author(s)
  胡晟, 馬強, 肖川
- Organizer
  The 12th Forum on Data Engineering and Information Management (DEIM 2020)
- Related Report
  2019 Research-status Report
[Presentation] Distributed Transaction Management for P2P-based Update Propagation (2019)
- Author(s)
  Makoto Onizuka, Yusuke Wakuta, Yuya Sasaki, Chuan Xiao
- Organizer
  The 3rd Workshop on Software Foundations for Data Interoperability (SFDI 2019)
- Related Report
  2019 Research-status Report
- Int'l Joint Research
[Presentation] Autocompletion for Prefix-Abbreviated Input (2019)
- Author(s)
  Sheng Hu, Chuan Xiao, Jianbin Qin, Yoshiharu Ishikawa, Qiang Ma
- Organizer
  ACM SIGMOD International Conference on Management of Data (SIGMOD 2019)
- Related Report
  2019 Research-status Report
- Int'l Joint Research
[Presentation] Dynamic Set kNN Self-Join (2019)
- Author(s)
  Daichi Amagata, Takahiro Hara, Chuan Xiao
- Organizer
  The 35th IEEE International Conference on Data Engineering (ICDE 2019)
- Related Report
  2019 Research-status Report
- Int'l Joint Research
[Remarks] Onizuka Laboratory, Big Data Engineering Course, Osaka University
- URL
  http://www-bigdata.ist.osaka-u.ac.jp/ja/paper/
- Related Report
  2021 Annual Research Report, 2020 Research-status Report, 2019 Research-status Report
[Remarks] Database Laboratory (Ishikawa Laboratory), Graduate School of Informatics, Nagoya University
- URL
  https://www.db.is.i.nagoya-u.ac.jp/ja/research/publications/
- Related Report
  2021 Annual Research Report, 2020 Research-status Report, 2019 Research-status Report
[Remarks] Chuan Xiao's homepage
- URL
  https://sites.google.com/site/chuanxiao1983/publication
- Related Report
  2021 Annual Research Report
[Remarks] Chuan Xiao's DBLP page
- URL
  https://dblp.org/pid/57/4384-1.html
- Related Report
  2021 Annual Research Report
[Remarks] Chuan Xiao's homepage
- URL
  https://sites.google.com/site/chuanxiao1983/publication
- Related Report
  2020 Research-status Report, 2019 Research-status Report
