
2019 Fiscal Year Research-status Report

Efficient Query Processing for Learning-based Data Management

Research Project

Project/Area Number 19K11979
Research Institution: Osaka University

Principal Investigator

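Chuan Xiao, Osaka University, Graduate School of Information Science and Technology, Specially Appointed Associate Professor (Full-time) (10643900)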

Project Period (FY) 2019-04-01 – 2022-03-31
Keywords: query processing / ML + DB
Outline of Annual Research Achievements

There were two major achievements in FY2019. First, we developed efficient query processing methods for embeddings, focusing on the dense high-dimensional data widely used in important real-world applications. Building on discoveries from our pilot studies, we reached a solution that works for binary high-dimensional vectors and efficiently answers similarity search and join queries with Hamming distance constraints. In our experiments, query processing was 4 to 10 times faster than existing solutions. We published these results in IEEE Transactions on Knowledge and Data Engineering (TKDE).

Second, we began studying efficient blocking techniques for queries with learning-based predicates, which can be used for entity matching. A blocking rule is a conjunction of similarity predicates generated through active learning. To apply these rules efficiently, we modeled the task as a query optimization problem and developed a deep learning-based method that generates fast query plans through cardinality estimation. The proposed approach is up to one order of magnitude faster than the traditional method, which uses sampling for cardinality estimation. This study has been accepted as a full research paper at the ACM SIGMOD International Conference on Management of Data (SIGMOD) 2020.

In addition, we reported a series of discoveries related to this project in premier database journals and conferences such as the VLDB Journal and the IEEE International Conference on Data Engineering (ICDE) 2019.
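As an illustration of similarity search under a Hamming distance constraint, the following is a minimal sketch of the basic pigeonhole filter: if two binary vectors differ in at most tau bits and each is cut into tau + 1 chunks, at least one chunk must match exactly. This is the textbook baseline, not the generalized method reported in the TKDE paper, and the names `PigeonholeIndex`, `split`, and `hamming` are hypothetical, chosen only for this example.

```python
from collections import defaultdict
from typing import Iterable, List


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two bit vectors stored as ints."""
    return bin(a ^ b).count("1")


def split(vec: int, dim: int, parts: int) -> List[int]:
    """Cut a dim-bit vector into `parts` contiguous chunks."""
    width = -(-dim // parts)  # ceiling division, so the chunks cover all bits
    mask = (1 << width) - 1
    return [(vec >> (i * width)) & mask for i in range(parts)]


class PigeonholeIndex:
    """Hypothetical index for Hamming search with a fixed threshold tau."""

    def __init__(self, vectors: Iterable[int], dim: int, tau: int) -> None:
        self.vectors = list(vectors)
        self.dim = dim
        self.tau = tau
        self.parts = tau + 1  # tau errors cannot touch all tau + 1 chunks
        # One hash table per chunk position: chunk value -> vector ids.
        self.tables = [defaultdict(list) for _ in range(self.parts)]
        for vid, vec in enumerate(self.vectors):
            for p, chunk in enumerate(split(vec, dim, self.parts)):
                self.tables[p][chunk].append(vid)

    def search(self, query: int) -> List[int]:
        """Return ids of all vectors within Hamming distance tau of query."""
        candidates = set()
        for p, chunk in enumerate(split(query, self.dim, self.parts)):
            candidates.update(self.tables[p].get(chunk, ()))
        # Verification step: the filter may admit false positives.
        return sorted(
            vid for vid in candidates
            if hamming(self.vectors[vid], query) <= self.tau
        )


data = [0b00000000, 0b00000111, 0b11110000, 0b00001111]
index = PigeonholeIndex(data, dim=8, tau=2)
print(index.search(0b00000001))  # prints [0, 1]
```

The filter never misses a true answer, so the verification step only has to discard false positives. In the demo, the fourth vector shares a chunk with the query but is rejected at distance 3.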

Current Status of Research Progress

1: Research has progressed more than originally planned.

Reason

Our plan for FY2019 was to finish Task 1 by developing efficient query processing methods for embeddings. We reached this goal with a solution that works for binary high-dimensional vectors and efficiently answers similarity search and join queries with Hamming distance constraints. We published these results in IEEE Transactions on Knowledge and Data Engineering (TKDE), a premier journal in the database area. In addition, we made an initial attempt at Task 2, the development of generic blocking techniques for queries with learning-based predicates, which was originally planned as a target for FY2020. We modeled this task as a query optimization problem and developed a deep learning-based method that generates fast query plans through cardinality estimation. This study has been accepted as a full research paper at the ACM SIGMOD International Conference on Management of Data (SIGMOD) 2020, a top-tier conference in the database area. We also published several works in premier database journals and conferences such as the VLDB Journal and the IEEE International Conference on Data Engineering (ICDE) 2019. Based on these achievements, we believe that the project is progressing more smoothly than initially planned.
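To make the query-optimization view of blocking rules concrete, here is a minimal sketch that orders the predicates of a conjunctive rule by estimated selectivity, so that the most selective predicate is evaluated first. It uses sampling-based estimation, which is the traditional baseline mentioned in this report, not the deep learning-based estimator of the SIGMOD 2020 paper. The record fields, the predicates, and the function names `plan` and `apply_rule` are assumptions made for illustration, and the sketch ignores per-predicate evaluation cost.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Record = dict
Pair = Tuple[Record, Record]
Predicate = Callable[[Record, Record], bool]


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def estimate_selectivity(pred: Predicate, pairs: Sequence[Pair],
                         sample_size: int, rng: random.Random) -> float:
    """Fraction of sampled pairs that satisfy the predicate."""
    sample = rng.sample(list(pairs), min(sample_size, len(pairs)))
    if not sample:
        return 0.0
    return sum(pred(a, b) for a, b in sample) / len(sample)


def plan(predicates: Dict[str, Predicate], pairs: Sequence[Pair],
         sample_size: int = 100, seed: int = 0) -> List[str]:
    """Order predicate names from most to least selective (estimated)."""
    rng = random.Random(seed)
    return sorted(
        predicates,
        key=lambda name: estimate_selectivity(
            predicates[name], pairs, sample_size, rng),
    )


def apply_rule(order: List[str], predicates: Dict[str, Predicate],
               pairs: Sequence[Pair]) -> List[Pair]:
    """Keep pairs satisfying every predicate; all() stops at the first failure."""
    return [(a, b) for a, b in pairs
            if all(predicates[name](a, b) for name in order)]


# Hypothetical blocking rule: similar titles AND same publication year.
rule: Dict[str, Predicate] = {
    "same_year": lambda a, b: a["year"] == b["year"],
    "title_sim": lambda a, b: jaccard(a["title"], b["title"]) >= 0.5,
}
```

Because the rule is a conjunction, any evaluation order returns the same pairs. The order only changes how many predicate evaluations are spent on pairs that end up being rejected, which is why the accuracy of the cardinality estimate matters for plan quality.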

Strategy for Future Research Activity

In FY2020, we will report our discoveries and give a tutorial on similarity query processing for high-dimensional data at the International Conference on Very Large Data Bases (VLDB) 2020, a top-tier conference in the database area. We will continue our investigation of Task 2 and develop generic blocking techniques for queries with learning-based predicates; we plan to submit this work to VLDB 2021. Another ongoing effort extends the method developed for Task 1 so that it handles not only binary vectors with Hamming distance constraints but also real-valued vectors with Euclidean distance or cosine similarity constraints. We plan to submit this work to a top-tier database conference (SIGMOD 2021 or VLDB 2021). In addition, we will prepare for Task 3, which covers system prototyping and evaluation. At the end of FY2020, we will start implementing a prototype system that integrates all the methods proposed during this research period. The system will be designed to run on Apache Spark or Amazon Web Services for distributed query processing on very large datasets.

Causes of Carryover

In FY2019, the funding was mainly used for registering for and attending academic conferences to report our discoveries. Due to the COVID-19 outbreak, the Forum on Data Engineering and Information Management (DEIM) 2020 was not held at its planned venue (Bandaiatami, Fukushima) and instead took place online in March. The PI was therefore unable to attend the forum onsite, which left 85,243 yen, budgeted for travel expenses, unused. The PI requests that this amount be carried forward to FY2020, during which it may be spent on conference registration, journal publication, and equipment purchases.

  • Research Products

    (17 results)


Int'l Joint Research (2 results), Journal Article (5 results; of which Int'l Joint Research: 3 results, Peer Reviewed: 5 results, Open Access: 5 results), Presentation (7 results; of which Int'l Joint Research: 4 results), Remarks (3 results)

  • [Int'l Joint Research] University of New South Wales / University of Melbourne (Australia)

    • Country Name
      AUSTRALIA
    • Counterpart Institution
      University of New South Wales / University of Melbourne
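  • [Int'l Joint Research] Hong Kong University of Science and Technology / Beijing Institute of Technology / Shenzhen Institute of Computing Sciences (China)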

    • Country Name
      CHINA
    • Counterpart Institution
      Hong Kong University of Science and Technology / Beijing Institute of Technology / Shenzhen Institute of Computing Sciences
    • # of Other Institutions
      1
  • [Journal Article] Efficient Query Autocompletion with Edit Distance-based Error Tolerance (2020)

    • Author(s)
      Jianbin Qin, Chuan Xiao, Sheng Hu, Jie Zhang, Wei Wang, Yoshiharu Ishikawa, Koji Tsuda, Kunihiko Sadakane
    • Journal Title

      The VLDB Journal

      Volume: - Pages: -

    • DOI

      doi.org/10.1007/s00778-019-00595-4

    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Generalizing the Pigeonhole Principle for Similarity Search in Hamming Space (2020)

    • Author(s)
      Jianbin Qin, Chuan Xiao, Yaoshu Wang, Wei Wang, Xuemin Lin, Yoshiharu Ishikawa, Guoren Wang
    • Journal Title

      IEEE Transactions on Knowledge and Data Engineering

      Volume: - Pages: -

    • DOI

      10.1109/TKDE.2019.2899597

    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Compressed Index for Trajectory Data on Road Networks (道路ネットワーク上の軌跡データに対する圧縮索引) (2020)

    • Author(s)
      Satoshi Koide (小出 智士), Chuan Xiao, Yoshiharu Ishikawa
    • Journal Title

      IEICE Transactions on Information and Systems (Japanese Edition) D

      Volume: J103-D Pages: 393-402

    • DOI

      10.14923/transinfj.2019DET0001

    • Peer Reviewed / Open Access
  • [Journal Article] Scope-aware Code Completion with Discriminative Modeling (2019)

    • Author(s)
      Sheng Hu, Chuan Xiao, Yoshiharu Ishikawa
    • Journal Title

      IPSJ Journal of Information Processing

      Volume: 27 Pages: 469-478

    • DOI

      10.2197/ipsjjip.27.469

    • Peer Reviewed / Open Access
  • [Journal Article] Building Hierarchical Spatial Histograms for Exploratory Analysis in Array DBMS (2019)

    • Author(s)
      Jing Zhao, Yoshiharu Ishikawa, Lei Chen, Chuan Xiao, Kento Sugiura
    • Journal Title

      IEICE Transactions on Information and Systems

      Volume: E102-D Pages: 788-799

    • DOI

      10.1587/transinf.2018DAP0020

    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Presentation] Monotonic Cardinality Estimation of Similarity Selection: A Deep Learning Approach (2020)

    • Author(s)
      Yaoshu Wang, Chuan Xiao, Jianbin Qin, Xin Cao, Yifang Sun, Wei Wang, and Makoto Onizuka
    • Organizer
      ACM SIGMOD International Conference on Management of Data (SIGMOD 2020)
    • Int'l Joint Research
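  • [Presentation] Distributed Transaction Control Using a Ticket-Based Method in a P2P Data Integration Architecture (P2P型データ統合アーキテクチャにおけるチケットベース手法を用いた分散トランザクション制御) (2020)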

    • Author(s)
      Kota Miyake (三宅 康太), Yusuke Wakuta, Yuya Sasaki, Chuan Xiao, Makoto Onizuka
    • Organizer
      The 12th Forum on Data Engineering and Information Management (DEIM 2020)
  • [Presentation] A Scalable Method for Inferring the Full Names of Abbreviations Based on Tries and GMM (トライ木及びGMMに基づく略語のフルネームのスケーラブルな推測手法) (2020)

    • Author(s)
      高 明敏, Chuan Xiao, Yoshiharu Ishikawa
    • Organizer
      The 12th Forum on Data Engineering and Information Management (DEIM 2020)
  • [Presentation] A Unified Query Paradigm for Efficient Retrieval of Diversified Trajectories (多様化軌跡を効率検索するための統合クエリパラダイム) (2020)

    • Author(s)
      Sheng Hu, Qiang Ma, Chuan Xiao
    • Organizer
      The 12th Forum on Data Engineering and Information Management (DEIM 2020)
  • [Presentation] Distributed Transaction Management for P2P-based Update Propagation (2019)

    • Author(s)
      Makoto Onizuka, Yusuke Wakuta, Yuya Sasaki, Chuan Xiao
    • Organizer
      The 3rd Workshop on Software Foundations for Data Interoperability (SFDI 2019)
    • Int'l Joint Research
  • [Presentation] Autocompletion for Prefix-Abbreviated Input (2019)

    • Author(s)
      Sheng Hu, Chuan Xiao, Jianbin Qin, Yoshiharu Ishikawa, Qiang Ma
    • Organizer
      ACM SIGMOD International Conference on Management of Data (SIGMOD 2019)
    • Int'l Joint Research
  • [Presentation] Dynamic Set kNN Self-Join (2019)

    • Author(s)
      Daichi Amagata, Takahiro Hara, Chuan Xiao
    • Organizer
      The 35th IEEE International Conference on Data Engineering (ICDE 2019)
    • Int'l Joint Research
  • [Remarks] Osaka University, Big Data Engineering Laboratory, Onizuka Lab

    • URL

      http://www-bigdata.ist.osaka-u.ac.jp/ja/paper/

  • [Remarks] Nagoya University, Graduate School of Informatics, Database Laboratory (Ishikawa Lab)

    • URL

      https://www.db.is.i.nagoya-u.ac.jp/ja/research/publications/

  • [Remarks] Chuan Xiao's homepage

    • URL

      https://sites.google.com/site/chuanxiao1983/publication


Published: 2021-01-27  
