
Efficient Query Processing for Learning-based Data Management

Research Project

Project/Area Number 19K11979
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Multi-year Fund
Section General
Review Section Basic Section 60080: Database-related
Research Institution Osaka University

Principal Investigator

Xiao Chuan  Osaka University, Graduate School of Information Science and Technology, Associate Professor (10643900)

Project Period (FY) 2019-04-01 – 2022-03-31
Project Status Completed (Fiscal Year 2021)
Budget Amount
¥3,640,000 (Direct Cost: ¥2,800,000, Indirect Cost: ¥840,000)
Fiscal Year 2021: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Fiscal Year 2020: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Fiscal Year 2019: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
Keywords query processing / ML for DB / ML + DB / high-dimensional data / similarity search / machine learning / database / data science
Outline of Research at the Start

With the boom in machine learning research, a recent trend in database research is to apply machine learning techniques to challenging database tasks such as entity matching. Existing attempts face two bottlenecks: inadequate query processing speed on large-scale datasets and difficulty in generalizing across different applications. This project aims to address the fundamental problems of managing data with machine learning methods. The outcome of the research will have a strong impact by providing practical methods beyond what is currently available.

Outline of Final Research Achievements

We addressed several fundamental problems of query processing for learning-based data management. We developed two solutions for efficient processing of queries on embedding vectors: the first works for binary high-dimensional vectors and efficiently returns answers for similarity search and join queries with Hamming distance constraints; the second handles approximate nearest neighbor search for real-valued high-dimensional vectors by utilizing hierarchical graph structures. We studied the processing of queries with learning-based predicates and developed methods that generate fast query plans through cardinality estimation. We performed system prototyping and evaluation, and released the source code of our software on GitHub. The outcome of this project provides practical methods for learning-based data management and contributes to the development of next-generation data management systems.
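
To illustrate what a similarity search with a Hamming distance constraint involves, the following is a minimal sketch of the basic pigeonhole filter, not the project's own method (the related journal article listed under Research Products generalizes this principle). It assumes bit vectors stored as Python integers; the function names `hamming`, `split_parts`, `build_index`, and `search` are hypothetical. The idea: if two vectors differ in at most `tau` bits and each is cut into `tau + 1` parts, at least one part must match exactly, so exact-match lookups on parts yield a candidate set that is then verified.

```python
from collections import defaultdict


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two bit vectors stored as ints."""
    return bin(a ^ b).count("1")


def split_parts(x: int, n_bits: int, m: int) -> list[int]:
    """Cut an n_bits-wide vector into m contiguous parts, low bits first."""
    base, extra = divmod(n_bits, m)
    parts, shift = [], 0
    for i in range(m):
        width = base + (1 if i < extra else 0)
        parts.append((x >> shift) & ((1 << width) - 1))
        shift += width
    return parts


def build_index(vectors: list[int], n_bits: int, tau: int) -> list[dict]:
    """One hash table per part, mapping a part value to the ids containing it."""
    m = tau + 1  # pigeonhole: tau differing bits cannot touch all tau + 1 parts
    index = [defaultdict(list) for _ in range(m)]
    for vid, v in enumerate(vectors):
        for i, part in enumerate(split_parts(v, n_bits, m)):
            index[i][part].append(vid)
    return index


def search(query: int, vectors: list[int], index: list[dict],
           n_bits: int, tau: int) -> list[int]:
    """Return ids of all vectors within Hamming distance tau of the query."""
    m = len(index)
    if m != tau + 1:
        raise ValueError("index was built for a different threshold")
    # Filter step: collect every vector sharing at least one identical part.
    candidates = set()
    for i, part in enumerate(split_parts(query, n_bits, m)):
        candidates.update(index[i].get(part, ()))
    # Verification step: compute the exact distance for candidates only.
    return sorted(v for v in candidates if hamming(query, vectors[v]) <= tau)


# Toy data: four 8-bit vectors, threshold tau = 1.
vectors = [0b10110010, 0b10110011, 0b01001101, 0b11110010]
index = build_index(vectors, n_bits=8, tau=1)
result = search(0b10110010, vectors, index, n_bits=8, tau=1)
print(result)
```

The filter never misses a true answer, but it can admit false candidates, which is why the verification step is required. With a large threshold the parts become short and the candidate set grows, which is the weakness that more refined partitioning schemes aim to reduce.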

Academic Significance and Societal Importance of the Research Achievements

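The outcome of this research provides practical methods for machine-learning-based data management and contributes to the development of next-generation data management systems. It advances state-of-the-art database technology and gives strong impetus to technical development in related research fields such as machine learning, natural language processing, and computer vision, as well as in applications such as marketing and healthcare. It also helps raise Japan's prestige in computer science and promotes collaboration with overseas research groups.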

Report

(4 results)
  • 2021 Annual Research Report
  • Final Research Report (PDF)
  • 2020 Research-status Report
  • 2019 Research-status Report
  • Research Products

    (44 results)


All Int'l Joint Research (6 results) Journal Article (10 results) (of which Int'l Joint Research: 5 results,  Peer Reviewed: 10 results,  Open Access: 10 results) Presentation (23 results) (of which Int'l Joint Research: 9 results) Remarks (5 results)

  • [Int'l Joint Research] University of Technology Sydney (Australia)

    • Related Report
      2021 Annual Research Report
  • [Int'l Joint Research] The Hong Kong University of Science and Technology / Shenzhen University / Shenzhen Institute of Computing Sciences (China)

    • Related Report
      2021 Annual Research Report
  • [Int'l Joint Research] University of New South Wales / University of Melbourne / University of Technology Sydney (Australia)

    • Related Report
      2020 Research-status Report
  • [Int'l Joint Research] Shenzhen University (China)

    • Related Report
      2020 Research-status Report
  • [Int'l Joint Research] University of New South Wales / University of Melbourne (Australia)

    • Related Report
      2019 Research-status Report
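  • [Int'l Joint Research] The Hong Kong University of Science and Technology / Beijing Institute of Technology / Shenzhen Institute of Computing Sciences (China)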

    • Related Report
      2019 Research-status Report
  • [Journal Article] HSGAN: Reducing mode collapse in GANs by the latent code distance of homogeneous samples (2022)

    • Author(s)
      Simin Yu, Kuntian Zhang, Chuan Xiao, Joshua Zhexue Huang, Mark Junjie Li, Makoto Onizuka
    • Journal Title

      Computer Vision and Image Understanding

      Volume: 214 Pages: 103314-103314

    • DOI

      10.1016/j.cviu.2021.103314

    • Related Report
      2021 Annual Research Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] HVS: Hierarchical Graph Structure Based on Voronoi Diagrams for Solving Approximate Nearest Neighbor Search (2021)

    • Author(s)
      Kejing Lu, Mineichi Kudo, Chuan Xiao, Yoshiharu Ishikawa
    • Journal Title

      Proceedings of the VLDB Endowment

      Volume: 15 Issue: 2 Pages: 246-258

    • DOI

      10.14778/3489496.3489506

    • Related Report
      2021 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Continuous Top-k Spatial-Keyword Search on Dynamic Objects (2021)

    • Author(s)
      Yuyang Dong, Chuan Xiao, Hanxiong Chen, Jeffrey Xu Yu, Kunihiro Takeoka, Masafumi Oyamada, and Hiroyuki Kitagawa
    • Journal Title

      The VLDB Journal

      Volume: 30 Issue: 2 Pages: 141-161

    • DOI

      10.1007/s00778-020-00627-4

    • Related Report
      2020 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Compressed Indexing for Trajectories Constrained in Road Networks (2020)

    • Author(s)
      小出 智士, 肖 川, 石川 佳治
    • Journal Title

      電子情報通信学会論文誌D 情報・システム (IEICE Transactions on Information and Systems, Japanese Edition)

      Volume: J103-D Issue: 5 Pages: 393-402

    • DOI

      10.14923/transinfj.2019DET0001

    • NAID

      130008110439

    • ISSN
      1880-4535, 1881-0225
    • Year and Date
      2020-05-01
    • Related Report
      2019 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Similarity Query Processing for High-Dimensional Data (2020)

    • Author(s)
      Jianbin Qin, Wei Wang, Chuan Xiao, and Ying Zhang
    • Journal Title

      Proceedings of the VLDB Endowment

      Volume: 13 Issue: 12 Pages: 3437-3440

    • DOI

      10.14778/3415478.3415564

    • Related Report
      2020 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Fast Subtrajectory Similarity Search in Road Networks under Weighted Edit Distance Constraints (2020)

    • Author(s)
      Satoshi Koide, Chuan Xiao, and Yoshiharu Ishikawa
    • Journal Title

      Proceedings of the VLDB Endowment

      Volume: 13 Issue: 12 Pages: 2188-2201

    • DOI

      10.14778/3407790.3407818

    • Related Report
      2020 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Efficient Query Autocompletion with Edit Distance-based Error Tolerance (2020)

    • Author(s)
      Jianbin Qin, Chuan Xiao, Sheng Hu, Jie Zhang, Wei Wang, Yoshiharu Ishikawa, Koji Tsuda, Kunihiko Sadakane
    • Journal Title

      The VLDB Journal

      Volume: - Issue: 4 Pages: 919-943

    • DOI

      10.1007/s00778-019-00595-4

    • Related Report
      2019 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Generalizing the Pigeonhole Principle for Similarity Search in Hamming Space (2020)

    • Author(s)
      Jianbin Qin, Chuan Xiao, Yaoshu Wang, Wei Wang, Xuemin Lin, Yoshiharu Ishikawa, Guoren Wang
    • Journal Title

      IEEE Transactions on Knowledge and Data Engineering

      Volume: - Pages: 489-505

    • DOI

      10.1109/tkde.2019.2899597

    • Related Report
      2019 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Building Hierarchical Spatial Histograms for Exploratory Analysis in Array DBMS (2019)

    • Author(s)
      Jing Zhao, Yoshiharu Ishikawa, Lei Chen, Chuan Xiao, Kento Sugiura
    • Journal Title

      IEICE Transactions on Information and Systems

      Volume: E102.D Issue: 4 Pages: 788-799

    • DOI

      10.1587/transinf.2018DAP0020

    • NAID

      130007621888

    • ISSN
      0916-8532, 1745-1361
    • Year and Date
      2019-04-01
    • Related Report
      2019 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Scope-aware Code Completion with Discriminative Modeling (2019)

    • Author(s)
      Sheng Hu, Chuan Xiao, Yoshiharu Ishikawa
    • Journal Title

      Journal of Information Processing

      Volume: 27 Issue: 0 Pages: 469-478

    • DOI

      10.2197/ipsjjip.27.469

    • NAID

      130007690191

    • ISSN
      1882-6652
    • Related Report
      2019 Research-status Report
    • Peer Reviewed / Open Access
  • [Presentation] JupySim: Jupyter Notebook Similarity Search System (2022)

    • Author(s)
      Misato Horiuchi, Yuya Sasaki, Chuan Xiao, Makoto Onizuka
    • Organizer
      International Conference on Extending Database Technology (EDBT)
    • Related Report
      2021 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Edit-Aware Molecular Graph Completion Using Deep Generative Models (2022)

    • Author(s)
      胡晟, 瀧川一学, 肖川
    • Organizer
      The 14th Forum on Data Engineering and Information Management (DEIM)
    • Related Report
      2021 Annual Research Report
  • [Presentation] Accelerating Time-Series Data Search Using Learned Indexes (2022)

    • Author(s)
      松本和人, 肖川, 鬼塚真
    • Organizer
      The 14th Forum on Data Engineering and Information Management (DEIM)
    • Related Report
      2021 Annual Research Report
  • [Presentation] Missing Value Imputation for Tabular Data Using Attention GAN (2022)

    • Author(s)
      河越淳, 董于洋, 野澤拓磨, 肖川
    • Organizer
      The 14th Forum on Data Engineering and Information Management (DEIM)
    • Related Report
      2021 Annual Research Report
  • [Presentation] Join Order Optimization Using Intermediate Results of Join Cardinality Estimation (2022)

    • Author(s)
      川本孝太朗, 伊藤竜一, 佐々木勇和, 肖川, 鬼塚真
    • Organizer
      The 14th Forum on Data Engineering and Information Management (DEIM)
    • Related Report
      2021 Annual Research Report
  • [Presentation] Distributed Transaction Control Based on Adaptive Two-Phase Locking in Integrated Databases (2022)

    • Author(s)
      三宅康太, 佐々木勇和, 肖川, 鬼塚真
    • Organizer
      The 14th Forum on Data Engineering and Information Management (DEIM)
    • Related Report
      2021 Annual Research Report
  • [Presentation] A Personalized Federated Learning Method Using Automatic Tuning of Model Structures (2022)

    • Author(s)
      松田光司, 佐々木勇和, 肖川, 鬼塚真
    • Organizer
      The 14th Forum on Data Engineering and Information Management (DEIM)
    • Related Report
      2021 Annual Research Report
  • [Presentation] Comprehensive Evaluation of Transaction Processing Performance Using Machine Learning (2022)

    • Author(s)
      池田悠人, 三宅康太, 肖川, 鬼塚真
    • Organizer
      The 14th Forum on Data Engineering and Information Management (DEIM)
    • Related Report
      2021 Annual Research Report
  • [Presentation] High-Dimensional Similarity Query Processing for Data Science (2021)

    • Author(s)
      Jianbin Qin, Wei Wang, Chuan Xiao, Ying Zhang, Yaoshu Wang
    • Organizer
      ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD)
    • Related Report
      2021 Annual Research Report
    • Int'l Joint Research
  • [Presentation] BTGAN: Training GAN with Balanced Triplet Loss and Two-Branch Architecture (2021)

    • Author(s)
      Simin Yu, Kuntian Zhang, Chuan Xiao, Xianyu Bao, Joshua Zhexue Huang, Mark Junjie Li
    • Organizer
      International Joint Conference on Neural Networks (IJCNN)
    • Related Report
      2021 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Consistent and Flexible Selectivity Estimation for High-Dimensional Data (2021)

    • Author(s)
      Yaoshu Wang, Chuan Xiao, Jianbin Qin, Rui Mao, Makoto Onizuka, Wei Wang, Rui Zhang, and Yoshiharu Ishikawa
    • Organizer
      ACM SIGMOD International Conference on Management of Data (SIGMOD 2021)
    • Related Report
      2020 Research-status Report
    • Int'l Joint Research
  • [Presentation] Efficient Joinable Table Discovery in Data Lakes: A High-Dimensional Similarity-Based Approach (2021)

    • Author(s)
      Yuyang Dong, Kunihiro Takeoka, Chuan Xiao, and Masafumi Oyamada
    • Organizer
      IEEE International Conference on Data Engineering (ICDE 2021)
    • Related Report
      2020 Research-status Report
    • Int'l Joint Research
  • [Presentation] Fast and Stable Cardinality Estimation with Non-Autoregressive Models (2021)

    • Author(s)
      伊藤竜一, 佐々木勇和, 肖川, 鬼塚真
    • Organizer
      The 13th Forum on Data Engineering and Information Management (DEIM 2021)
    • Related Report
      2020 Research-status Report
  • [Presentation] FedMe: A Federated Learning Method Based on Model Exchange (2021)

    • Author(s)
      松田光司, 堀敬三, 佐々木勇和, 肖川, 鬼塚真
    • Organizer
      The 13th Forum on Data Engineering and Information Management (DEIM 2021)
    • Related Report
      2020 Research-status Report
  • [Presentation] A Fast Search Algorithm for Computational Notebook Similarity Search (2021)

    • Author(s)
      堀内美聡, 山崎翔平, 佐々木勇和, 肖川, 鬼塚真
    • Organizer
      The 13th Forum on Data Engineering and Information Management (DEIM 2021)
    • Related Report
      2020 Research-status Report
  • [Presentation] Automatic Molecular Graph Completion Using Deep Generative Models (2021)

    • Author(s)
      胡晟, 瀧川一学, 肖川
    • Organizer
      The 13th Forum on Data Engineering and Information Management (DEIM 2021)
    • Related Report
      2020 Research-status Report
  • [Presentation] Monotonic Cardinality Estimation of Similarity Selection: A Deep Learning Approach (2020)

    • Author(s)
      Yaoshu Wang, Chuan Xiao, Jianbin Qin, Xin Cao, Yifang Sun, Wei Wang, and Makoto Onizuka
    • Organizer
      ACM SIGMOD International Conference on Management of Data (SIGMOD 2020)
    • Related Report
      2019 Research-status Report
    • Int'l Joint Research
  • [Presentation] Distributed Transaction Control Using a Ticket-Based Method in a P2P Data Integration Architecture (2020)

    • Author(s)
      三宅 康太, 涌田 悠佑, 佐々木 勇和, 肖 川, 鬼塚 真
    • Organizer
      The 12th Forum on Data Engineering and Information Management (DEIM 2020)
    • Related Report
      2019 Research-status Report
  • [Presentation] A Scalable Method for Inferring Full Names of Abbreviations Based on Tries and GMM (2020)

    • Author(s)
      高 明敏, 肖 川, 石川 佳治
    • Organizer
      The 12th Forum on Data Engineering and Information Management (DEIM 2020)
    • Related Report
      2019 Research-status Report
  • [Presentation] A Unified Query Paradigm for Efficient Search of Diversified Trajectories (2020)

    • Author(s)
      胡 晟, 馬 強, 肖 川
    • Organizer
      The 12th Forum on Data Engineering and Information Management (DEIM 2020)
    • Related Report
      2019 Research-status Report
  • [Presentation] Distributed Transaction Management for P2P-based Update Propagation (2019)

    • Author(s)
      Makoto Onizuka, Yusuke Wakuta, Yuya Sasaki, Chuan Xiao
    • Organizer
      The 3rd Workshop on Software Foundations for Data Interoperability (SFDI 2019)
    • Related Report
      2019 Research-status Report
    • Int'l Joint Research
  • [Presentation] Autocompletion for Prefix-Abbreviated Input (2019)

    • Author(s)
      Sheng Hu, Chuan Xiao, Jianbin Qin, Yoshiharu Ishikawa, Qiang Ma
    • Organizer
      ACM SIGMOD International Conference on Management of Data (SIGMOD 2019)
    • Related Report
      2019 Research-status Report
    • Int'l Joint Research
  • [Presentation] Dynamic Set kNN Self-Join (2019)

    • Author(s)
      Daichi Amagata, Takahiro Hara, Chuan Xiao
    • Organizer
      The 35th IEEE International Conference on Data Engineering (ICDE 2019)
    • Related Report
      2019 Research-status Report
    • Int'l Joint Research
  • [Remarks] Osaka University, Big Data Engineering Laboratory (Onizuka Lab)

    • URL

      http://www-bigdata.ist.osaka-u.ac.jp/ja/paper/

    • Related Report
      2021 Annual Research Report 2020 Research-status Report 2019 Research-status Report
  • [Remarks] Nagoya University, Graduate School of Informatics, Database Laboratory (Ishikawa Lab)

    • URL

      https://www.db.is.i.nagoya-u.ac.jp/ja/research/publications/

    • Related Report
      2021 Annual Research Report 2020 Research-status Report 2019 Research-status Report
  • [Remarks] Chuan Xiao's homepage

    • URL

      https://sites.google.com/site/chuanxiao1983/publication

    • Related Report
      2021 Annual Research Report
  • [Remarks] Chuan Xiao's DBLP page

    • URL

      https://dblp.org/pid/57/4384-1.html

    • Related Report
      2021 Annual Research Report
  • [Remarks] Chuan Xiao's homepage

    • URL

      https://sites.google.com/site/chuanxiao1983/publication

    • Related Report
      2020 Research-status Report 2019 Research-status Report


Published: 2019-04-18   Modified: 2023-01-30  
